License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ICALP.2018.34
URN: urn:nbn:de:0030-drops-90383
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2018/9038/
Go to the corresponding LIPIcs Volume Portal


Charikar, Moses ; Geri, Ofir ; Kim, Michael P. ; Kuszmaul, William

On Estimating Edit Distance: Alignment, Dimension Reduction, and Embeddings

pdf-format:
LIPIcs-ICALP-2018-34.pdf (0.5 MB)


Abstract

Edit distance is a fundamental measure of distance between strings and has been widely studied in computer science. While the problem of estimating edit distance has been studied extensively, the equally important question of actually producing an alignment (i.e., the sequence of edits) has received far less attention. Somewhat surprisingly, we show that any algorithm to estimate edit distance can be used in a black-box fashion to produce an approximate alignment of strings, with modest loss in approximation factor and small loss in run time. Plugging in the result of Andoni, Krauthgamer, and Onak, we obtain an alignment that is a (log n)^{O(1/epsilon^2)} approximation in time O~(n^{1 + epsilon}).
Closely related to the study of approximation algorithms is the study of metric embeddings for edit distance. We show that min-hash techniques can be useful in designing edit distance embeddings through three results: (1) An embedding from Ulam distance (edit distance over permutations) to Hamming space that matches the best known distortion of O(log n) and also implicitly encodes a sequence of edits between the strings; (2) In the case where the edit distance between the input strings is known to have an upper bound K, we show that embeddings of edit distance into Hamming space with distortion f(n) can be modified in a black-box fashion to give distortion O(f(poly(K))) for a class of periodic-free strings; (3) A randomized dimension-reduction map with contraction c and asymptotically optimal expected distortion O(c), improving on the previous O~(c^{1 + 2 / log log log n}) distortion result of Batu, Ergun, and Sahinalp.

BibTeX - Entry

@InProceedings{charikar_et_al:LIPIcs:2018:9038,
  author =	{Moses Charikar and Ofir Geri and Michael P. Kim and William Kuszmaul},
  title =	{{On Estimating Edit Distance: Alignment, Dimension Reduction, and Embeddings}},
  booktitle =	{45th International Colloquium on Automata, Languages, and  Programming (ICALP 2018)},
  pages =	{34:1--34:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-076-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{107},
  editor =	{Ioannis Chatzigiannakis and Christos Kaklamanis and D{\'a}niel Marx and Donald Sannella},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2018/9038},
  URN =		{urn:nbn:de:0030-drops-90383},
  doi =		{10.4230/LIPIcs.ICALP.2018.34},
  annote =	{Keywords: edit distance, alignment, approximation algorithms, embedding, dimension reduction}
}

Keywords: edit distance, alignment, approximation algorithms, embedding, dimension reduction
Collection: 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)
Issue Date: 2018
Date of publication: 04.07.2018


DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI