License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.CPM.2022.27
URN: urn:nbn:de:0030-drops-161545
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2022/16154/
Akagi, Tooru ;
Okabe, Kouta ;
Mieno, Takuya ;
Nakashima, Yuto ;
Inenaga, Shunsuke
Minimal Absent Words on Run-Length Encoded Strings
Abstract
A string w is called a minimal absent word for another string T if w does not occur (as a substring) in T and all proper substrings of w occur in T. State-of-the-art data structures for reporting the set MAW(T) of MAWs from a given string T of length n require O(n) space, can be built in O(n) time, and can report all MAWs in O(|MAW(T)|) time upon a query. This paper initiates the problem of computing MAWs from a compressed representation of a string. In particular, we focus on the most basic compressed representation of a string, run-length encoding (RLE), which represents each maximal run of the same characters a by a^p where p is the length of the run. Let m be the RLE-size of string T. After categorizing the MAWs into five disjoint sets ℳ₁, ℳ₂, ℳ₃, ℳ₄, ℳ₅ using RLE, we present matching upper and lower bounds for the number of MAWs in ℳ_i for i = 1,2,4,5 in terms of RLE-size m, except for ℳ₃ whose size is unbounded by m. We then present a compact O(m)-space data structure that can report all MAWs in optimal O(|MAW(T)|) time.
BibTeX - Entry
@InProceedings{akagi_et_al:LIPIcs.CPM.2022.27,
author = {Akagi, Tooru and Okabe, Kouta and Mieno, Takuya and Nakashima, Yuto and Inenaga, Shunsuke},
title = {{Minimal Absent Words on Run-Length Encoded Strings}},
booktitle = {33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
pages = {27:1--27:17},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-234-1},
ISSN = {1868-8969},
year = {2022},
volume = {223},
editor = {Bannai, Hideo and Holub, Jan},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2022/16154},
URN = {urn:nbn:de:0030-drops-161545},
doi = {10.4230/LIPIcs.CPM.2022.27},
annote = {Keywords: string algorithms, combinatorics on words, minimal absent words, run-length encoding}
}
Keywords: |
|
string algorithms, combinatorics on words, minimal absent words, run-length encoding |
Collection: |
|
33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022) |
Issue Date: |
|
2022 |
Date of publication: |
|
22.06.2022 |