License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.CPM.2018.13
URN: urn:nbn:de:0030-drops-86965
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2018/8696/
Brubach, Brian ;
Ghurye, Jay
A Succinct Four Russians Speedup for Edit Distance Computation and One-against-many Banded Alignment
Abstract
The classical Four Russians speedup for computing edit distance (a.k.a. Levenshtein distance), due to Masek and Paterson [Masek and Paterson, 1980], involves partitioning the dynamic programming table into k-by-k square blocks and generating a lookup table in O(psi^{2k} k^2 |Sigma|^{2k}) time and O(psi^{2k} k |Sigma|^{2k}) space for block size k, where psi depends on the cost function (for unit costs psi = 3) and |Sigma| is the size of the alphabet. We show that the O(psi^{2k} k^2) and O(psi^{2k} k) factors can be improved to O(k^2 lg{k}) time and O(k^2) space. Thus, we improve the time and space complexity of that aspect compared to Masek and Paterson [Masek and Paterson, 1980] and remove the dependence on psi.
We further show that for certain problems the O(|Sigma|^{2k}) factor can also be reduced. Using this technique, we show a new algorithm for the fundamental problem of one-against-many banded alignment. In particular, comparing one string of length m to n other strings of length m with maximum distance d can be performed in O(n m + m d^2 lg{d} + n d^3) time. When d is reasonably small, this approaches or meets the current best theoretic result of O(nm + n d^2) achieved by using the best known pairwise algorithm running in O(m + d^2) time [Myers, 1986][Ukkonen, 1985] while potentially being more practical. It also improves on the standard practical approach which requires O(n m d) time to iteratively run an O(md) time pairwise banded alignment algorithm.
Regarding pairwise comparison, we extend the classic result of Masek and Paterson [Masek and Paterson, 1980] which computes the edit distance between two strings in O(m^2/log{m}) time to remove the dependence on psi even when edits have arbitrary costs from a penalty matrix. Crochemore, Landau, and Ziv-Ukelson [Crochemore, 2003] achieved a similar result, also allowing for unrestricted scoring matrices, but with variable-sized blocks. In practical applications of the Four Russians speedup wherein space efficiency is important and smaller block sizes k are used (notably k < |Sigma|), Kim, Na, Park, and Sim [Kim et al., 2016] showed how to remove the dependence on the alphabet size for the unit cost version, generating a lookup table in O(3^{2k} (2k)! k^2) time and O(3^{2k} (2k)! k) space. Combining their work with our result yields an improvement to O((2k)! k^2 lg{k}) time and O((2k)! k^2) space.
BibTeX - Entry
@InProceedings{brubach_et_al:LIPIcs:2018:8696,
author = {Brian Brubach and Jay Ghurye},
title = {{A Succinct Four Russians Speedup for Edit Distance Computation and One-against-many Banded Alignment}},
booktitle = {Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
pages = {13:1--13:12},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-074-3},
ISSN = {1868-8969},
year = {2018},
volume = {105},
editor = {Gonzalo Navarro and David Sankoff and Binhai Zhu},
publisher = {Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2018/8696},
URN = {urn:nbn:de:0030-drops-86965},
doi = {10.4230/LIPIcs.CPM.2018.13},
annote = {Keywords: edit distance, banded alignment, one-against-many alignment, genomics, method of the Four Russians}
}
Keywords: |
|
edit distance, banded alignment, one-against-many alignment, genomics, method of the Four Russians |
Collection: |
|
Annual Symposium on Combinatorial Pattern Matching (CPM 2018) |
Issue Date: |
|
2018 |
Date of publication: |
|
18.05.2018 |