License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.CPM.2020.10
URN: urn:nbn:de:0030-drops-121359
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2020/12135/
Charalampopoulos, Panagiotis; Pissis, Solon P.; Radoszewski, Jakub; Waleń, Tomasz; Zuba, Wiktor
Unary Words Have the Smallest Levenshtein k-Neighbourhoods
Abstract
The edit distance (a.k.a. the Levenshtein distance) between two words is defined as the minimum number of insertions, deletions or substitutions of letters needed to transform one word into another. The Levenshtein k-neighbourhood of a word w is the set of words that are at edit distance at most k from w. This is perhaps the most important concept underlying BLAST, a widely-used tool for comparing biological sequences. A natural combinatorial question is to ask for upper and lower bounds on the size of this set. The answer to this question has important algorithmic implications as well. Myers notes that "such bounds would give a tighter characterisation of the running time of the algorithm" behind BLAST. We show that the size of the Levenshtein k-neighbourhood of any word of length n over an arbitrary alphabet is not smaller than the size of the Levenshtein k-neighbourhood of a unary word of length n, thus providing a tight lower bound on the size of the Levenshtein k-neighbourhood. We remark that this result was posed as a conjecture by Dufresne at WCTA 2019.
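The definitions in the abstract can be checked by hand on very small inputs. The following is a minimal brute-force sketch in Python, not the method of the paper: `edit_distance` is the textbook dynamic programme, and `neighbourhood_size` simply enumerates every candidate word over a fixed alphabet. Both function names and the choice of the binary alphabet "ab" are assumptions made for illustration.

```python
from itertools import product


def edit_distance(u: str, v: str) -> int:
    """Levenshtein distance between u and v (textbook dynamic programme)."""
    # prev[j] holds the distance between the processed prefix of u and v[:j].
    prev = list(range(len(v) + 1))
    for i, a in enumerate(u, start=1):
        curr = [i]  # distance from u[:i] to the empty word: i deletions
        for j, b in enumerate(v, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a from u
                curr[j - 1] + 1,         # insert b into u
                prev[j - 1] + (a != b),  # substitute a by b, or match
            ))
        prev = curr
    return prev[-1]


def neighbourhood_size(w: str, k: int, alphabet: str) -> int:
    """Number of words over `alphabet` at edit distance at most k from w.

    Exhaustive enumeration, so only usable for tiny w, k and alphabets.
    Words whose length differs from len(w) by more than k cannot be within
    distance k, which bounds the lengths that need to be enumerated.
    """
    count = 0
    for length in range(max(0, len(w) - k), len(w) + k + 1):
        for letters in product(alphabet, repeat=length):
            if edit_distance(w, "".join(letters)) <= k:
                count += 1
    return count


if __name__ == "__main__":
    n, k, alphabet = 3, 1, "ab"
    unary = neighbourhood_size("a" * n, k, alphabet)
    smallest = min(
        neighbourhood_size("".join(letters), k, alphabet)
        for letters in product(alphabet, repeat=n)
    )
    # The lower bound stated in the abstract predicts that these agree.
    print(unary, smallest)
```

For example, over the alphabet "ab" the 1-neighbourhood of the unary word "aa" contains 8 words, while that of "ab" contains 9, which agrees with the lower bound stated in the abstract. The enumeration grows exponentially in n + k, so the sketch is only a sanity check on toy instances.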
BibTeX - Entry
@InProceedings{charalampopoulos_et_al:LIPIcs:2020:12135,
author = {Panagiotis Charalampopoulos and Solon P. Pissis and Jakub Radoszewski and Tomasz Waleń and Wiktor Zuba},
title = {{Unary Words Have the Smallest Levenshtein k-Neighbourhoods}},
booktitle = {31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)},
pages = {10:1--10:12},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-149-8},
ISSN = {1868-8969},
year = {2020},
volume = {161},
editor = {Inge Li G{\o}rtz and Oren Weimann},
publisher = {Schloss Dagstuhl--Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2020/12135},
URN = {urn:nbn:de:0030-drops-121359},
doi = {10.4230/LIPIcs.CPM.2020.10},
annote = {Keywords: combinatorics on words, Levenshtein distance, edit distance}
}
Keywords: combinatorics on words, Levenshtein distance, edit distance
Collection: 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)
Issue Date: 2020
Date of publication: 09.06.2020