License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.FSTTCS.2020.14
URN: urn:nbn:de:0030-drops-132558
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2020/13255/
Go to the corresponding LIPIcs Volume Portal


Bille, Philip ; Gørtz, Inge Li ; Pedersen, Max Rishøj ; Rotenberg, Eva ; Steiner, Teresa Anna

String Indexing for Top-k Close Consecutive Occurrences

pdf-format:
LIPIcs-FSTTCS-2020-14.pdf (0.5 MB)


Abstract

The classic string indexing problem is to preprocess a string S into a compact data structure that supports efficient subsequent pattern matching queries, that is, given a pattern string P, report all occurrences of P within S. In this paper, we study a basic and natural extension of string indexing called the string indexing for top-k close consecutive occurrences problem (Sitcco). Here, a consecutive occurrence is a pair (i,j), i < j, such that P occurs at positions i and j in S and there is no occurrence of P between i and j, and their distance is defined as j-i. Given a pattern P and a parameter k, the goal is to report the top-k consecutive occurrences of P in S of minimal distance. The challenge is to compactly represent S while supporting queries in time close to the length of P and k. We give two time-space trade-offs for the problem. Let n be the length of S, m the length of P, and ε ∈ (0,1]. Our first result achieves O(nlog n) space and optimal query time of O(m+k), and our second result achieves linear space and query time O(m+k^{1+ε}). Along the way, we develop several techniques of independent interest, including a new translation of the problem into a line segment intersection problem and a new recursive clustering technique for trees.

BibTeX - Entry

@InProceedings{bille_et_al:LIPIcs:2020:13255,
  author =	{Philip Bille and Inge Li G{\o}rtz and Max Rish{\o}j Pedersen and Eva Rotenberg and Teresa Anna Steiner},
  title =	{{String Indexing for Top-k Close Consecutive Occurrences}},
  booktitle =	{40th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2020)},
  pages =	{14:1--14:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-174-0},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{182},
  editor =	{Nitin Saxena and Sunil Simon},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2020/13255},
  URN =		{urn:nbn:de:0030-drops-132558},
  doi =		{10.4230/LIPIcs.FSTTCS.2020.14},
  annote =	{Keywords: String indexing, pattern matching, consecutive occurrences}
}

Keywords: String indexing, pattern matching, consecutive occurrences
Collection: 40th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2020)
Issue Date: 2020
Date of publication: 04.12.2020


DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI