License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ICALP.2018.62
URN: urn:nbn:de:0030-drops-90669
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2018/9066/
Go to the corresponding LIPIcs Volume Portal


Gawrychowski, Pawel ; Uznanski, Przemyslaw

Towards Unified Approximate Pattern Matching for Hamming and L_1 Distance

pdf-format:
LIPIcs-ICALP-2018-62.pdf (0.5 MB)


Abstract

Computing the distance between a given pattern of length n and a text of length m is defined as calculating, for every m-substring of the text, the distance between the pattern and the substring. This naturally generalizes the standard notion of exact pattern matching to incorporate dissimilarity score. For both Hamming and L_{1} distance only relatively slow O~(n sqrt{m}) solutions are known for this generalization. This can be overcome by relaxing the question. For Hamming distance, the usual relaxation is to consider the k-bounded variant, where distances exceeding k are reported as infty, while for L_{1} distance asking for a (1 +/- epsilon)-approximation seems more natural. For k-bounded Hamming distance, Amir et al. [J. Algorithms 2004] showed an O~(n sqrt{k}) time algorithm, and Clifford et al. [SODA 2016] designed an O~((m+k^{2})* n/m) time solution. We provide a smooth time trade-off between these bounds by exhibiting an O~((m+k sqrt{m})* n/m) time algorithm. We complement the trade-off with a matching conditional lower bound, showing that a significantly faster combinatorial algorithm is not possible, unless the combinatorial matrix multiplication conjecture fails. We also exhibit a series of reductions that together allow us to achieve essentially the same complexity for k-bounded L_1 distance. Finally, for (1 +/- epsilon)-approximate L_1 distance, the running time of the best previously known algorithm of Lipsky and Porat [Algorithmica 2011] was O(epsilon^{-2} n). We improve this to O~(epsilon^{-1}n), thus essentially matching the complexity of the best known algorithm for (1 +/- epsilon)-approximate Hamming distance.

BibTeX - Entry

@InProceedings{gawrychowski_et_al:LIPIcs:2018:9066,
  author =	{Pawel Gawrychowski and Przemyslaw Uznanski},
  title =	{{Towards Unified Approximate Pattern Matching for Hamming and L_1 Distance}},
  booktitle =	{45th International Colloquium on Automata, Languages, and  Programming (ICALP 2018)},
  pages =	{62:1--62:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-076-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{107},
  editor =	{Ioannis Chatzigiannakis and Christos Kaklamanis and D{\'a}niel Marx and Donald Sannella},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2018/9066},
  URN =		{urn:nbn:de:0030-drops-90669},
  doi =		{10.4230/LIPIcs.ICALP.2018.62},
  annote =	{Keywords: approximate pattern matching, conditional lower bounds, L_1 distance, Hamming distance}
}

Keywords: approximate pattern matching, conditional lower bounds, L_1 distance, Hamming distance
Collection: 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)
Issue Date: 2018
Date of publication: 04.07.2018


DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI