License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ICDT.2021.21
URN: urn:nbn:de:0030-drops-137299
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2021/13729/
McCauley, Samuel
Approximate Similarity Search Under Edit Distance Using Locality-Sensitive Hashing
Abstract
Edit distance similarity search, also called approximate pattern matching, is a fundamental problem with widespread database applications. The goal of the problem is to preprocess n strings of length d, to quickly answer queries q of the form: if there is a database string within edit distance r of q, return a database string within edit distance cr of q.
Previous approaches to this problem either rely on very large (superconstant) approximation ratios c, or very small search radii r. Outside of a narrow parameter range, these solutions are not competitive with trivially searching through all n strings.
In this work we give a simple and easy-to-implement hash function that can quickly answer queries for a wide range of parameters. Specifically, our strategy can answer queries in time Õ(d3^rn^{1/c}). The best known practical results require c ≫ r to achieve any correctness guarantee; meanwhile, the best known theoretical results are very involved and difficult to implement, and require query time that can be loosely bounded below by 24^r. Our results significantly broaden the range of parameters for which there exist nontrivial theoretical bounds, while retaining the practicality of a locality-sensitive hash function.
BibTeX - Entry
@InProceedings{mccauley:LIPIcs.ICDT.2021.21,
author = {McCauley, Samuel},
title = {{Approximate Similarity Search Under Edit Distance Using Locality-Sensitive Hashing}},
booktitle = {24th International Conference on Database Theory (ICDT 2021)},
pages = {21:1--21:22},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-179-5},
ISSN = {1868-8969},
year = {2021},
volume = {186},
editor = {Yi, Ke and Wei, Zhewei},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2021/13729},
URN = {urn:nbn:de:0030-drops-137299},
doi = {10.4230/LIPIcs.ICDT.2021.21},
annote = {Keywords: edit distance, approximate pattern matching, approximate nearest neighbor, similarity search, locality-sensitive hashing}
}
Keywords: |
|
edit distance, approximate pattern matching, approximate nearest neighbor, similarity search, locality-sensitive hashing |
Collection: |
|
24th International Conference on Database Theory (ICDT 2021) |
Issue Date: |
|
2021 |
Date of publication: |
|
11.03.2021 |