The beta version of DROPS 2 is now publicly available! Check this page out at DROPS 2 now!



License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ISAAC.2019.22
URN: urn:nbn:de:0030-drops-115182
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2019/11518/
Go to the corresponding LIPIcs Volume Portal


Charalampopoulos, Panagiotis ; Kociumaka, Tomasz ; Mohamed, Manal ; Radoszewski, Jakub ; Rytter, Wojciech ; Walen, Tomasz

Internal Dictionary Matching

pdf-format:
LIPIcs-ISAAC-2019-22.pdf (0.6 MB)


Abstract

We introduce data structures answering queries concerning the occurrences of patterns from a given dictionary D in fragments of a given string T of length n. The dictionary is internal in the sense that each pattern in D is given as a fragment of T. This way, D takes space proportional to the number of patterns d=|D| rather than their total length, which could be Theta(n * d).
In particular, we consider the following types of queries: reporting and counting all occurrences of patterns from D in a fragment T[i..j] (operations Report(i,j) and Count(i,j) below, as well as operation Exists(i,j) that returns true iff Count(i,j)>0) and reporting distinct patterns from D that occur in T[i..j] (operation ReportDistinct(i,j)). We show how to construct, in O((n+d) log^{O(1)} n) time, a data structure that answers each of these queries in time O(log^{O(1)} n+|output|) - see the table below for specific time and space complexities.
Query | Preprocessing time | Space | Query time
Exists(i,j) | O(n+d) | O(n) | O(1)
Report(i,j) | O(n+d) | O(n+d) | O(1+|output|)
ReportDistinct(i,j) | O(n log n+d) | O(n+d) | O(log n+|output|)
Count(i,j) | O({n log n}/{log log n} + d log^{3/2} n) | O(n+d log n) | O({log^2n}/{log log n})
The case of counting patterns is much more involved and needs a combination of a locally consistent parsing with orthogonal range searching. Reporting distinct patterns, on the other hand, uses the structure of maximal repetitions in strings. Finally, we provide tight - up to subpolynomial factors - upper and lower bounds for the case of a dynamic dictionary.

BibTeX - Entry

@InProceedings{charalampopoulos_et_al:LIPIcs:2019:11518,
  author =	{Panagiotis Charalampopoulos and Tomasz Kociumaka and Manal Mohamed and Jakub Radoszewski and Wojciech Rytter and Tomasz Walen},
  title =	{{Internal Dictionary Matching}},
  booktitle =	{30th International Symposium on Algorithms and Computation (ISAAC 2019)},
  pages =	{22:1--22:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-130-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{149},
  editor =	{Pinyan Lu and Guochuan Zhang},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2019/11518},
  URN =		{urn:nbn:de:0030-drops-115182},
  doi =		{10.4230/LIPIcs.ISAAC.2019.22},
  annote =	{Keywords: string algorithms, dictionary matching, internal pattern matching}
}

Keywords: string algorithms, dictionary matching, internal pattern matching
Collection: 30th International Symposium on Algorithms and Computation (ISAAC 2019)
Issue Date: 2019
Date of publication: 28.11.2019


DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI