License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ESA.2020.15
URN: urn:nbn:de:0030-drops-128819
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2020/12881/


Bentley, Jason W. ; Gibney, Daniel ; Thankachan, Sharma V.

On the Complexity of BWT-Runs Minimization via Alphabet Reordering



Abstract

The Burrows-Wheeler Transform (BWT) has been an essential tool in text compression and indexing. First introduced in 1994, it went on to provide the backbone for the first encoding of the classic suffix tree data structure in space close to the entropy-based lower bound. Within the last decade, its role has been further enhanced by the development of compact suffix trees in space proportional to r, the number of runs in the BWT. While r would superficially appear to be only a measure of space complexity, it appears increasingly often in the time complexity of new algorithms as well. This makes minimizing r increasingly important. Interestingly, unlike other popular measures of compression, the parameter r is sensitive to the lexicographic ordering given to the text's alphabet. Despite several past attempts to exploit this fact, finding a provably efficient algorithm that computes, or approximates, an alphabet ordering minimizing r has remained an open problem for years.
We help to explain this lack of progress by presenting the first set of results on the computational complexity of minimizing BWT-runs via alphabet reordering. We prove that the decision version of this problem is NP-complete and cannot be solved in time poly(n) ⋅ 2^{o(σ)} unless the Exponential Time Hypothesis fails, where σ is the size of the alphabet and n is the length of the text. Moreover, we show that the optimization variant is APX-hard. In doing so, we relate two previously disparate topics: the optimal traveling salesperson path of a graph and the number of runs in the BWT of a text. In addition, by relating recent results in the field of dictionary compression, we illustrate that an arbitrary alphabet ordering provides an O(log² n)-approximation. Lastly, we provide an optimal linear-time algorithm for a more restricted problem of finding an optimal ordering on a subset of symbols (occurring only once) under ordering constraints.
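
To make the dependence of r on the alphabet ordering concrete, the following is a minimal sketch, not the authors' method. It computes the BWT of a text under a given symbol ordering, counts the runs, and finds a minimizing ordering by trying all σ! permutations, which is only feasible for very small alphabets. The function names `bwt`, `count_runs`, and `best_order` are hypothetical, and the sketch assumes a sentinel `$` that is smaller than every symbol, is not itself reordered, and does not occur in the text.

```python
from itertools import permutations


def bwt(text, order):
    """BWT of text + sentinel, comparing symbols by their position in `order`.

    Assumption: the sentinel '$' is smaller than every symbol and does not
    occur in `text`; `order` lists every distinct symbol of `text` once.
    """
    rank = {c: i + 1 for i, c in enumerate(order)}
    codes = [rank[c] for c in text] + [0]  # 0 encodes the sentinel
    # The sentinel is unique, so sorting suffixes equals sorting rotations.
    # This naive sort is quadratic or worse; it is for illustration only.
    suffix_array = sorted(range(len(codes)), key=lambda i: codes[i:])
    symbols = list(text) + ["$"]
    # The BWT symbol is the one preceding each suffix (cyclically).
    return "".join(symbols[i - 1] for i in suffix_array)


def count_runs(s):
    """Number of maximal blocks of equal adjacent symbols in s."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


def best_order(text):
    """Exhaustive search over all orderings; returns (r, ordering)."""
    alphabet = sorted(set(text))
    return min(
        (count_runs(bwt(text, p)), "".join(p)) for p in permutations(alphabet)
    )


if __name__ == "__main__":
    text = "mississippi"
    default = "".join(sorted(set(text)))
    print("default order:", default, "r =", count_runs(bwt(text, default)))
    print("best (r, order):", best_order(text))
```

For example, `bwt("banana", "abn")` yields `annb$aa`, which has 5 runs. The exhaustive search takes time proportional to σ! times the cost of one BWT construction, so it serves only to illustrate the problem on toy inputs.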

BibTeX - Entry

@InProceedings{bentley_et_al:LIPIcs:2020:12881,
  author =	{Jason W. Bentley and Daniel Gibney and Sharma V. Thankachan},
  title =	{{On the Complexity of BWT-Runs Minimization via Alphabet Reordering}},
  booktitle =	{28th Annual European Symposium on Algorithms (ESA 2020)},
  pages =	{15:1--15:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-162-7},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{173},
  editor =	{Fabrizio Grandoni and Grzegorz Herman and Peter Sanders},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2020/12881},
  URN =		{urn:nbn:de:0030-drops-128819},
  doi =		{10.4230/LIPIcs.ESA.2020.15},
  annote =	{Keywords: BWT, NP-hardness, APX-hardness}
}

Keywords: BWT, NP-hardness, APX-hardness
Collection: 28th Annual European Symposium on Algorithms (ESA 2020)
Issue Date: 2020
Date of publication: 26.08.2020

