License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ICALP.2022.99
URN: urn:nbn:de:0030-drops-164403
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2022/16440/
Nishimoto, Takaaki ;
Kanda, Shunsuke ;
Tabei, Yasuo
An Optimal-Time RLBWT Construction in BWT-Runs Bounded Space
Abstract
The compression of highly repetitive strings (i.e., strings with many repetitions) has been a central research topic in string processing, and quite a few compression methods for these strings have been proposed thus far. Among them, an efficient compression format gathering increasing attention is the run-length Burrows-Wheeler transform (RLBWT), which is a run-length encoded BWT as a reversible permutation of an input string on the lexicographical order of suffixes. State-of-the-art construction algorithms of RLBWT have a serious issue with respect to (i) non-optimal computation time or (ii) a working space that is linearly proportional to the length of an input string. In this paper, we present r-comp, the first optimal-time construction algorithm of RLBWT in BWT-runs bounded space. That is, the computational complexity of r-comp is O(n + r log r) time and O(r log n) bits of working space for the length n of an input string and the number r of equal-letter runs in BWT. The computation time is optimal (i.e., O(n)) for strings with the property r = O(n/log n), which holds for most highly repetitive strings. Experiments using a real-world dataset of highly repetitive strings show the effectiveness of r-comp with respect to computation time and space.
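As a rough illustration of the format the abstract describes, the sketch below builds a BWT by sorting all rotations of the input and then run-length encodes it. This is not r-comp: the naive construction shown here uses space linear in n and superlinear time, which is precisely the cost the paper aims to avoid. The function names `bwt` and `rlbwt` and the `$` sentinel are assumptions made for this example only.

```python
from itertools import groupby


def bwt(text: str, sentinel: str = "$") -> str:
    """Naive BWT: sort all rotations of text + sentinel, take the last column.

    Illustration only; assumes the sentinel sorts before every character
    of the input and does not occur in it.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def rlbwt(text: str) -> list[tuple[str, int]]:
    """Run-length encode the BWT as (character, run length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(bwt(text))]


runs = rlbwt("banana")
print(bwt("banana"))  # annb$aa
print(runs)           # [('a', 1), ('n', 2), ('b', 1), ('$', 1), ('a', 2)]
print(len(runs))      # r = 5 equal-letter runs
```

Here the number of pairs returned by `rlbwt` corresponds to r in the abstract, and the length of the input plus the sentinel corresponds to n.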
BibTeX - Entry
@InProceedings{nishimoto_et_al:LIPIcs.ICALP.2022.99,
author = {Nishimoto, Takaaki and Kanda, Shunsuke and Tabei, Yasuo},
title = {{An Optimal-Time RLBWT Construction in BWT-Runs Bounded Space}},
booktitle = {49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)},
pages = {99:1--99:20},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-235-8},
ISSN = {1868-8969},
year = {2022},
volume = {229},
editor = {Boja\'{n}czyk, Miko{\l}aj and Merelli, Emanuela and Woodruff, David P.},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2022/16440},
URN = {urn:nbn:de:0030-drops-164403},
doi = {10.4230/LIPIcs.ICALP.2022.99},
annote = {Keywords: lossless data compression, Burrows-Wheeler transform, highly repetitive text collections}
}
Keywords: lossless data compression, Burrows-Wheeler transform, highly repetitive text collections
Collection: 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)
Issue Date: 2022
Date of publication: 28.06.2022
Supplementary Material: Software (Source Code): https://github.com/kampersanda/rcomp