License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ITCS.2023.58
URN: urn:nbn:de:0030-drops-175615
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2023/17561/
Go to the corresponding LIPIcs Volume Portal


Goldenberg, Elazar ; Kociumaka, Tomasz ; Krauthgamer, Robert ; Saha, Barna

An Algorithmic Bridge Between Hamming and Levenshtein Distances

pdf-format:
LIPIcs-ITCS-2023-58.pdf (1.0 MB)


Abstract

The edit distance between strings classically assigns unit cost to every character insertion, deletion, and substitution, whereas the Hamming distance only allows substitutions. In many real-life scenarios, insertions and deletions (abbreviated indels) appear frequently but significantly less so than substitutions. To model this, we consider substitutions being cheaper than indels, with cost 1/a for a parameter a ≥ 1. This basic variant, denoted ED_a, bridges classical edit distance (a = 1) with Hamming distance (a → ∞), leading to interesting algorithmic challenges: Does the time complexity of computing ED_a interpolate between that of Hamming distance (linear time) and edit distance (quadratic time)? What about approximating ED_a?
We first present a simple deterministic exact algorithm for ED_a and further prove that it is near-optimal assuming the Orthogonal Vectors Conjecture. Our main result is a randomized algorithm computing a (1+ε)-approximation of ED_a(X,Y), given strings X,Y of total length n and a bound k ≥ ED_a(X,Y). For simplicity, let us focus on k ≥ 1 and a constant ε > 0; then, our algorithm takes Õ(n/a + ak³) time. Unless a = Õ(1), in which case ED_a resembles the standard edit distance, and for the most interesting regime of small enough k, this running time is sublinear in n.
We also consider a very natural version that asks to find a (k_I, k_S)-alignment, i.e., an alignment with at most k_I indels and k_S substitutions. In this setting, we give an exact algorithm and, more importantly, an Õ((nk_I)/k_S + k_S k_I³)-time (1,1+ε)-bicriteria approximation algorithm. The latter solution is based on the techniques we develop for ED_a for a = Θ(k_S/k_I), and its running time is again sublinear in n whenever k_I ≪ k_S and the overall distance is small enough.
These bounds are in stark contrast to unit-cost edit distance, where state-of-the-art algorithms are far from achieving (1+ε)-approximation in sublinear time, even for a favorable choice of k.

BibTeX - Entry

@InProceedings{goldenberg_et_al:LIPIcs.ITCS.2023.58,
  author =	{Goldenberg, Elazar and Kociumaka, Tomasz and Krauthgamer, Robert and Saha, Barna},
  title =	{{An Algorithmic Bridge Between Hamming and Levenshtein Distances}},
  booktitle =	{14th Innovations in Theoretical Computer Science Conference (ITCS 2023)},
  pages =	{58:1--58:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-263-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{251},
  editor =	{Tauman Kalai, Yael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2023/17561},
  URN =		{urn:nbn:de:0030-drops-175615},
  doi =		{10.4230/LIPIcs.ITCS.2023.58},
  annote =	{Keywords: edit distance, Hamming distance, Longest Common Extension queries}
}

Keywords: edit distance, Hamming distance, Longest Common Extension queries
Collection: 14th Innovations in Theoretical Computer Science Conference (ITCS 2023)
Issue Date: 2023
Date of publication: 01.02.2023


DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI