License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.CPM.2022.18
URN: urn:nbn:de:0030-drops-161454
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2022/16145/
Go to the corresponding LIPIcs Volume Portal


Clifford, Raphaël ; Gawrychowski, Paweł ; Kociumaka, Tomasz ; Martin, Daniel P. ; Uznański, Przemysław

The Dynamic k-Mismatch Problem

pdf-format:
LIPIcs-CPM-2022-18.pdf (0.8 MB)


Abstract

The text-to-pattern Hamming distances problem asks to compute the Hamming distances between a given pattern of length m and all length-m substrings of a given text of length n ≥ m. We focus on the well-studied k-mismatch version of the problem, where a distance needs to be returned only if it does not exceed a threshold k. Moreover, we assume n ≤ 2m (in general, one can partition the text into overlapping blocks). In this work, we develop data structures for the dynamic version of the k-mismatch problem supporting two operations: An update performs a single-letter substitution in the pattern or the text, whereas a query, given an index i, returns the Hamming distance between the pattern and the text substring starting at position i, or reports that the distance exceeds k.
First, we describe a simple data structure with ?̃(1) update time and ?̃(k) query time. Through considerably more sophisticated techniques, we show that ?̃(k) update time and ?̃(1) query time is also achievable. These two solutions likely provide an essentially optimal trade-off for the dynamic k-mismatch problem with m^{Ω(1)} ≤ k ≤ √m: we prove that, in that case, conditioned on the 3SUM conjecture, one cannot simultaneously achieve k^{1-Ω(1)} time for all operations (updates and queries) after n^{?(1)}-time initialization. For k ≥ √m, the same lower bound excludes achieving m^{1/2-Ω(1)} time per operation. This is known to be essentially tight for constant-sized alphabets: already Clifford et al. (STACS 2018) achieved ?̃(√m) time per operation in that case, but their solution for large alphabets costs ?̃(m^{3/4}) time per operation. We improve and extend the latter result by developing a trade-off algorithm that, given a parameter 1 ≤ x ≤ k, achieves update time ?̃(m/k +√{mk/x}) and query time ?̃(x). In particular, for k ≥ √m, an appropriate choice of x yields ?̃(∛{mk}) time per operation, which is ?̃(m^{2/3}) when only the trivial threshold k = m is provided.

BibTeX - Entry

@InProceedings{clifford_et_al:LIPIcs.CPM.2022.18,
  author =	{Clifford, Rapha\"{e}l and Gawrychowski, Pawe{\l} and Kociumaka, Tomasz and Martin, Daniel P. and Uzna\'{n}ski, Przemys{\l}aw},
  title =	{{The Dynamic k-Mismatch Problem}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{18:1--18:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2022/16145},
  URN =		{urn:nbn:de:0030-drops-161454},
  doi =		{10.4230/LIPIcs.CPM.2022.18},
  annote =	{Keywords: Pattern matching, Hamming distance, dynamic algorithms}
}

Keywords: Pattern matching, Hamming distance, dynamic algorithms
Collection: 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)
Issue Date: 2022
Date of publication: 22.06.2022


DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI