License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ESA.2023.89
URN: urn:nbn:de:0030-drops-187428
Go to the corresponding LIPIcs Volume Portal

Radoszewski, Jakub

Linear Time Construction of Cover Suffix Tree and Applications

LIPIcs-ESA-2023-89.pdf (0.8 MB)


The Cover Suffix Tree (CST) of a string T is the suffix tree of T with additional explicit nodes corresponding to halves of square substrings of T. In the CST an explicit node corresponding to a substring C of T is annotated with two numbers: the number of non-overlapping consecutive occurrences of C and the total number of positions in T that are covered by occurrences of C in T. Kociumaka et al. (Algorithmica, 2015) have shown how to compute the CST of a length-n string in ?(n log n) time. We give an algorithm that computes the same data structure in ?(n) time assuming that T is over an integer alphabet and discuss its implications.
A string C is a cover of text T if occurrences of C in T cover all positions of T; C is a seed of T if occurrences and overhangs (i.e., prefix-suffix occurrences) of C in T cover all positions of T. An α-partial cover (α-partial seed) of text T is a string C whose occurrences in T (occurrences and overhangs in T, respectively) cover at least α positions of T. Kociumaka et al. (Algorithmica, 2015; Theor. Comput. Sci., 2018) have shown that knowing the CST of a length-n string T, one can compute a linear-sized representation of all seeds of T as well as all shortest α-partial covers and seeds in T for a given α in ?(n) time. Thus our result implies linear-time algorithms computing these notions of quasiperiodicity. The resulting algorithm computing seeds is substantially different from the previous one (Kociumaka et al., SODA 2012, ACM Trans. Algorithms, 2020); in particular, it is non-recursive. Kociumaka et al. (Algorithmica, 2015) proposed an ?(n log n)-time algorithm for computing a shortest α-partial cover for each α = 1,…,n; we improve this complexity to ?(n).
Our results are based on a new combinatorial characterization of consecutive overlapping occurrences of a substring S of T in terms of the set of runs (see Kolpakov and Kucherov, FOCS 1999) in T. This new insight also leads to an ?(n)-sized index for reporting overlapping consecutive occurrences of a given pattern P of length m in the optimal ?(m+output) time, where output is the number of occurrences reported. In comparison, a general index for reporting bounded-gap consecutive occurrences of Navarro and Thankachan (Theor. Comput. Sci., 2016) uses ?(n log n) space.

BibTeX - Entry

  author =	{Radoszewski, Jakub},
  title =	{{Linear Time Construction of Cover Suffix Tree and Applications}},
  booktitle =	{31st Annual European Symposium on Algorithms (ESA 2023)},
  pages =	{89:1--89:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-295-2},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{274},
  editor =	{G{\o}rtz, Inge Li and Farach-Colton, Martin and Puglisi, Simon J. and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{},
  URN =		{urn:nbn:de:0030-drops-187428},
  doi =		{10.4230/LIPIcs.ESA.2023.89},
  annote =	{Keywords: cover (quasiperiod), seed, suffix tree, run (maximal repetition)}

Keywords: cover (quasiperiod), seed, suffix tree, run (maximal repetition)
Collection: 31st Annual European Symposium on Algorithms (ESA 2023)
Issue Date: 2023
Date of publication: 30.08.2023

DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI