License: Creative Commons Attribution-NoDerivs 3.0 Unported license (CC BY-ND 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.STACS.2009.1804
URN: urn:nbn:de:0030-drops-18040
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2009/1804/
Go to the corresponding LIPIcs Volume Portal


Hermelin, Danny ; Landau, Gad M. ; Landau, Shir ; Weimann, Oren

A Unified Algorithm for Accelerating Edit-Distance Computation via Text-Compression

pdf-format:
09001.HermelinDanny.1804.pdf (0.2 MB)


Abstract

The edit distance problem is a classical fundamental problem in computer science in general, and in combinatorial pattern matching in particular. The standard dynamic-programming solution for this problem computes the edit-distance between a pair of strings of total length $O(N)$ in $O(N^2)$ time. To this date, this quadratic upper-bound has never been substantially improved for general strings. However, there are known techniques for breaking this bound in case the strings are known to compress well under a particular compression scheme. The basic idea is to first compress the strings, and then to compute the edit distance between the compressed strings.

As it turns out, practically all known $o(N^2)$ edit-distance algorithms work, in some sense, under the same paradigm described above. It is therefore natural to ask whether there is a single edit-distance algorithm that works for strings which are compressed under any compression scheme. A rephrasing of this question is to ask whether a single algorithm can exploit the compressibility properties of strings under any compression method, even if each string is compressed using a different compression. In this paper we set out to answer this question by using \emph{straight-line programs}. These provide a generic platform for representing many popular compression schemes including the LZ-family, Run-Length Encoding, Byte-Pair Encoding, and dictionary methods.

For two strings of total length $N$ having straight-line program representations of total size $n$, we present an algorithm running in $O(n^{1.4}N^{1.2})$ time for computing the edit-distance of these two strings under any rational scoring function, and an $O(n^{1.34}N^{1.34})$-time algorithm for arbitrary scoring functions. This improves on a recent algorithm of Tiskin that runs in $O(nN^{1.5})$ time, and works only for rational scoring functions.

BibTeX - Entry

@InProceedings{hermelin_et_al:LIPIcs:2009:1804,
  author =	{Danny Hermelin and Gad M. Landau and Shir Landau and Oren Weimann},
  title =	{{A Unified Algorithm for Accelerating Edit-Distance Computation via Text-Compression}},
  booktitle =	{26th International Symposium on Theoretical Aspects of Computer Science},
  pages =	{529--540},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-09-5},
  ISSN =	{1868-8969},
  year =	{2009},
  volume =	{3},
  editor =	{Susanne Albers and Jean-Yves Marion},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2009/1804},
  URN =		{urn:nbn:de:0030-drops-18040},
  doi =		{10.4230/LIPIcs.STACS.2009.1804},
  annote =	{Keywords: Edit distance, Straight-line Programs, Dynamic programming acceleration via compression, Combinatorial pattern matching}
}

Keywords: Edit distance, Straight-line Programs, Dynamic programming acceleration via compression, Combinatorial pattern matching
Collection: 26th International Symposium on Theoretical Aspects of Computer Science
Issue Date: 2009
Date of publication: 19.02.2009


DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI