License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ESA.2023.89
URN: urn:nbn:de:0030-drops-187428
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2023/18742/
Radoszewski, Jakub
Linear Time Construction of Cover Suffix Tree and Applications
Abstract
The Cover Suffix Tree (CST) of a string T is the suffix tree of T with additional explicit nodes corresponding to halves of square substrings of T. In the CST an explicit node corresponding to a substring C of T is annotated with two numbers: the number of non-overlapping consecutive occurrences of C and the total number of positions in T that are covered by occurrences of C in T. Kociumaka et al. (Algorithmica, 2015) have shown how to compute the CST of a length-n string in ?(n log n) time. We give an algorithm that computes the same data structure in ?(n) time assuming that T is over an integer alphabet and discuss its implications.
A string C is a cover of text T if occurrences of C in T cover all positions of T; C is a seed of T if occurrences and overhangs (i.e., prefix-suffix occurrences) of C in T cover all positions of T. An α-partial cover (α-partial seed) of text T is a string C whose occurrences in T (occurrences and overhangs in T, respectively) cover at least α positions of T. Kociumaka et al. (Algorithmica, 2015; Theor. Comput. Sci., 2018) have shown that knowing the CST of a length-n string T, one can compute a linear-sized representation of all seeds of T as well as all shortest α-partial covers and seeds in T for a given α in ?(n) time. Thus our result implies linear-time algorithms computing these notions of quasiperiodicity. The resulting algorithm computing seeds is substantially different from the previous one (Kociumaka et al., SODA 2012, ACM Trans. Algorithms, 2020); in particular, it is non-recursive. Kociumaka et al. (Algorithmica, 2015) proposed an ?(n log n)-time algorithm for computing a shortest α-partial cover for each α = 1,…,n; we improve this complexity to ?(n).
Our results are based on a new combinatorial characterization of consecutive overlapping occurrences of a substring S of T in terms of the set of runs (see Kolpakov and Kucherov, FOCS 1999) in T. This new insight also leads to an ?(n)-sized index for reporting overlapping consecutive occurrences of a given pattern P of length m in the optimal ?(m+output) time, where output is the number of occurrences reported. In comparison, a general index for reporting bounded-gap consecutive occurrences of Navarro and Thankachan (Theor. Comput. Sci., 2016) uses ?(n log n) space.
BibTeX - Entry
@InProceedings{radoszewski:LIPIcs.ESA.2023.89,
author = {Radoszewski, Jakub},
title = {{Linear Time Construction of Cover Suffix Tree and Applications}},
booktitle = {31st Annual European Symposium on Algorithms (ESA 2023)},
pages = {89:1--89:17},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-295-2},
ISSN = {1868-8969},
year = {2023},
volume = {274},
editor = {G{\o}rtz, Inge Li and Farach-Colton, Martin and Puglisi, Simon J. and Herman, Grzegorz},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2023/18742},
URN = {urn:nbn:de:0030-drops-187428},
doi = {10.4230/LIPIcs.ESA.2023.89},
annote = {Keywords: cover (quasiperiod), seed, suffix tree, run (maximal repetition)}
}
Keywords: |
|
cover (quasiperiod), seed, suffix tree, run (maximal repetition) |
Collection: |
|
31st Annual European Symposium on Algorithms (ESA 2023) |
Issue Date: |
|
2023 |
Date of publication: |
|
30.08.2023 |