title: Tokenisation is NP-Complete

publish date:

2024-12-19

authors:

Philip Whittington et al.

paper id:

2412.15210v1

abstract:

In this work, we prove the NP-completeness of two variants of tokenisation, defined as the problem of compressing a dataset to at most $\delta$ symbols by either finding a vocabulary directly (direct tokenisation), or selecting a sequence of merge operations (bottom-up tokenisation).
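The two decision problems become concrete once you write code that checks a candidate solution. Below is a minimal Python sketch that computes the compressed size of a dataset for a given merge sequence (bottom-up) or a given vocabulary (direct), and compares it against $\delta$. A polynomial-time check like this is the usual argument that a problem lies in NP. Several details are assumptions, not taken from the abstract, and the paper's formal definitions may differ:

- there is a budget `k` on the number of merges or vocabulary entries;
- single characters are always usable;
- merges are applied left to right without overlap, as in BPE;
- direct tokenisation uses an optimal segmentation found by dynamic programming.

All function names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

def apply_merge(tokens: List[str], pair: Tuple[str, str]) -> List[str]:
    """Merge non-overlapping occurrences of `pair`, scanning left to right (BPE-style assumption)."""
    left, right = pair
    merged: List[str] = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
            merged.append(left + right)
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

def bottom_up_size(dataset: Sequence[str], merges: Sequence[Tuple[str, str]]) -> int:
    """Total token count after applying the merges, in order, to every string."""
    total = 0
    for text in dataset:
        tokens = list(text)  # start from single characters
        for pair in merges:
            tokens = apply_merge(tokens, pair)
        total += len(tokens)
    return total

def direct_size(dataset: Sequence[str], vocab: Set[str]) -> int:
    """Fewest tokens needed to segment each string using `vocab` plus single characters."""
    total = 0
    for text in dataset:
        n = len(text)
        best = [0] + [n + 1] * n  # best[i] = fewest tokens covering text[:i]
        for i in range(1, n + 1):
            for j in range(i):
                piece = text[j:i]
                if len(piece) == 1 or piece in vocab:
                    best[i] = min(best[i], best[j] + 1)
        total += best[n]
    return total

def verify_bottom_up(dataset, merges, k: int, delta: int) -> bool:
    # Hypothetical decision check: at most k merges, compressed size at most delta.
    return len(merges) <= k and bottom_up_size(dataset, merges) <= delta

def verify_direct(dataset, vocab, k: int, delta: int) -> bool:
    # Hypothetical decision check: at most k extra tokens, compressed size at most delta.
    return len(vocab) <= k and direct_size(dataset, vocab) <= delta

if __name__ == "__main__":
    data = ["abab", "abc"]
    print(bottom_up_size(data, [("a", "b"), ("ab", "ab")]))  # 3
    print(direct_size(data, {"ab"}))  # 4
```

Both checks run in time polynomial in the dataset size and the number of merges or vocabulary entries. The hardness the paper proves lies in *finding* a merge sequence or vocabulary that meets the bound $\delta$.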

Compiled and edited by: wanghaisheng. Last updated: December 23, 2024