License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.CPM.2021.10
URN: urn:nbn:de:0030-drops-139615
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2021/13961/
Go to the corresponding LIPIcs Volume Portal


Bille, Philip ; Gørtz, Inge Li ; Pedersen, Max Rishøj ; Steiner, Teresa Anna

Gapped Indexing for Consecutive Occurrences

pdf-format:
LIPIcs-CPM-2021-10.pdf (1.0 MB)


Abstract

The classic string indexing problem is to preprocess a string S into a compact data structure that supports efficient pattern matching queries. Typical queries include existential queries (decide if the pattern occurs in S), reporting queries (return all positions where the pattern occurs), and counting queries (return the number of occurrences of the pattern). In this paper we consider a variant of string indexing, where the goal is to compactly represent the string such that given two patterns P₁ and P₂ and a gap range [α, β] we can quickly find the consecutive occurrences of P₁ and P₂ with distance in [α, β], i.e., pairs of subsequent occurrences with distance within the range. We present data structures that use Õ(n) space and query time Õ(|P₁|+|P₂|+n^{2/3}) for existence and counting and Õ(|P₁|+|P₂|+n^{2/3}occ^{1/3}) for reporting. We complement this with a conditional lower bound based on the set intersection problem showing that any solution using Õ(n) space must use Ω̃(|P₁| + |P₂| + √n) query time. To obtain our results we develop new techniques and ideas of independent interest including a new suffix tree decomposition and hardness of a variant of the set intersection problem.

BibTeX - Entry

@InProceedings{bille_et_al:LIPIcs.CPM.2021.10,
  author =	{Bille, Philip and G{\o}rtz, Inge Li and Pedersen, Max Rish{\o}j and Steiner, Teresa Anna},
  title =	{{Gapped Indexing for Consecutive Occurrences}},
  booktitle =	{32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)},
  pages =	{10:1--10:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-186-3},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{191},
  editor =	{Gawrychowski, Pawe{\l} and Starikovskaya, Tatiana},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2021/13961},
  URN =		{urn:nbn:de:0030-drops-139615},
  doi =		{10.4230/LIPIcs.CPM.2021.10},
  annote =	{Keywords: String indexing, two patterns, consecutive occurrences, conditional lower bound}
}

Keywords: String indexing, two patterns, consecutive occurrences, conditional lower bound
Collection: 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)
Issue Date: 2021
Date of publication: 30.06.2021


DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI