License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.STACS.2020.18
URN: urn:nbn:de:0030-drops-118798
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2020/11879/
Go to the corresponding LIPIcs Volume Portal


Dudek, Bartłomiej ; Gawrychowski, Paweł ; Starikovskaya, Tatiana

Generalised Pattern Matching Revisited

pdf-format:
LIPIcs-STACS-2020-18.pdf (0.6 MB)


Abstract

In the problem of Generalised Pattern Matching (GPM) [STOC'94, Muthukrishnan and Palem], we are given a text T of length n over an alphabet Σ_T, a pattern P of length m over an alphabet Σ_P, and a matching relationship ⊆ Σ_T × Σ_P, and must return all substrings of T that match P (reporting) or the number of mismatches between each substring of T of length m and P (counting). In this work, we improve over all previously known algorithms for this problem:
- For ? being the maximum number of characters that match a fixed character, we show two new Monte Carlo algorithms, a reporting algorithm with time ?(? n log n log m) and a (1-ε)-approximation counting algorithm with time ?(ε^-1 ? n log n log m). We then derive a (1-ε)-approximation deterministic counting algorithm for GPM with ?(ε^-2 ? n log⁶ n) time.
- For ? being the number of pairs of matching characters, we demonstrate Monte Carlo algorithms for reporting and (1-ε)-approximate counting with running time ?(√? n log m √{log n}) and ?(√{ε^-1 ?} n log m √{log n}), respectively, as well as a (1-ε)-approximation deterministic algorithm for the counting variant of GPM with ?(ε^-1 √{?} n log^{7/2} n) time.
- Finally, for ℐ being the total number of disjoint intervals of characters that match the m characters of the pattern P, we show that both the reporting and the counting variants of GPM can be solved exactly and deterministically in ?(n√{ℐ log m} +n log n) time.
At the heart of our new deterministic upper bounds for ? and ? lies a faster construction of superimposed codes, which solves an open problem posed in [FOCS'97, Indyk] and can be of independent interest.
To conclude, we demonstrate first lower bounds for GPM. We start by showing that any deterministic or Monte Carlo algorithm for GPM must use Ω(?) time, and then proceed to show higher lower bounds for combinatorial algorithms. These bounds show that our algorithms are almost optimal, unless a radically new approach is developed.

BibTeX - Entry

@InProceedings{dudek_et_al:LIPIcs:2020:11879,
  author =	{Bartłomiej Dudek and Paweł Gawrychowski and Tatiana Starikovskaya},
  title =	{{Generalised Pattern Matching Revisited}},
  booktitle =	{37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020)},
  pages =	{18:1--18:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-140-5},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{154},
  editor =	{Christophe Paul and Markus Bl{\"a}ser},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2020/11879},
  URN =		{urn:nbn:de:0030-drops-118798},
  doi =		{10.4230/LIPIcs.STACS.2020.18},
  annote =	{Keywords: pattern matching, superimposed codes, conditional lower bounds}
}

Keywords: pattern matching, superimposed codes, conditional lower bounds
Collection: 37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020)
Issue Date: 2020
Date of publication: 04.03.2020


DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI