License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.CPM.2021.10
URN: urn:nbn:de:0030-drops-139615
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2021/13961/
Bille, Philip ;
Gørtz, Inge Li ;
Pedersen, Max Rishøj ;
Steiner, Teresa Anna
Gapped Indexing for Consecutive Occurrences
Abstract
The classic string indexing problem is to preprocess a string S into a compact data structure that supports efficient pattern matching queries. Typical queries include existential queries (decide if the pattern occurs in S), reporting queries (return all positions where the pattern occurs), and counting queries (return the number of occurrences of the pattern). In this paper we consider a variant of string indexing, where the goal is to compactly represent the string such that given two patterns P₁ and P₂ and a gap range [α, β] we can quickly find the consecutive occurrences of P₁ and P₂ with distance in [α, β], i.e., pairs of subsequent occurrences with distance within the range. We present data structures that use Õ(n) space and query time Õ(|P₁|+|P₂|+n^{2/3}) for existence and counting and Õ(|P₁|+|P₂|+n^{2/3}occ^{1/3}) for reporting. We complement this with a conditional lower bound based on the set intersection problem showing that any solution using Õ(n) space must use Ω̃(|P₁| + |P₂| + √n) query time. To obtain our results we develop new techniques and ideas of independent interest including a new suffix tree decomposition and hardness of a variant of the set intersection problem.
BibTeX - Entry
@InProceedings{bille_et_al:LIPIcs.CPM.2021.10,
author = {Bille, Philip and G{\o}rtz, Inge Li and Pedersen, Max Rish{\o}j and Steiner, Teresa Anna},
title = {{Gapped Indexing for Consecutive Occurrences}},
booktitle = {32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)},
pages = {10:1--10:19},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-186-3},
ISSN = {1868-8969},
year = {2021},
volume = {191},
editor = {Gawrychowski, Pawe{\l} and Starikovskaya, Tatiana},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2021/13961},
URN = {urn:nbn:de:0030-drops-139615},
doi = {10.4230/LIPIcs.CPM.2021.10},
annote = {Keywords: String indexing, two patterns, consecutive occurrences, conditional lower bound}
}
Keywords: |
|
String indexing, two patterns, consecutive occurrences, conditional lower bound |
Collection: |
|
32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021) |
Issue Date: |
|
2021 |
Date of publication: |
|
30.06.2021 |