License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.CPM.2023.24
URN: urn:nbn:de:0030-drops-179787
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2023/17978/
Go to the corresponding LIPIcs Volume Portal


Navarro, Gonzalo

Computing MEMs on Repetitive Text Collections

pdf-format:
LIPIcs-CPM-2023-24.pdf (0.7 MB)


Abstract

We consider the problem of computing the Maximal Exact Matches (MEMs) of a given pattern P[1..m] on a large repetitive text collection T[1..n], which is represented as a (hopefully much smaller) run-length context-free grammar of size g_{rl}. We show that the problem can be solved in time O(m² log^ε n), for any constant ε > 0, on a data structure of size O(g_{rl}). Further, on a locally consistent grammar of size O(δ log n/δ), the time decreases to O(m log m(log m + log^ε n)). The value δ is a function of the substring complexity of T and Ω(δ log n/δ) is a tight lower bound on the compressibility of repetitive texts T, so our structure has optimal size in terms of n and δ.

BibTeX - Entry

@InProceedings{navarro:LIPIcs.CPM.2023.24,
  author =	{Navarro, Gonzalo},
  title =	{{Computing MEMs on Repetitive Text Collections}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{24:1--24:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2023/17978},
  URN =		{urn:nbn:de:0030-drops-179787},
  doi =		{10.4230/LIPIcs.CPM.2023.24},
  annote =	{Keywords: grammar-based indices, maximal exact matches, locally consistent grammars, substring complexity}
}

Keywords: grammar-based indices, maximal exact matches, locally consistent grammars, substring complexity
Collection: 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)
Issue Date: 2023
Date of publication: 21.06.2023


DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI