License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ICDT.2021.21
URN: urn:nbn:de:0030-drops-137299
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2021/13729/
Go to the corresponding LIPIcs Volume Portal


McCauley, Samuel

Approximate Similarity Search Under Edit Distance Using Locality-Sensitive Hashing

pdf-format:
LIPIcs-ICDT-2021-21.pdf (0.8 MB)


Abstract

Edit distance similarity search, also called approximate pattern matching, is a fundamental problem with widespread database applications. The goal of the problem is to preprocess n strings of length d, to quickly answer queries q of the form: if there is a database string within edit distance r of q, return a database string within edit distance cr of q.
Previous approaches to this problem either rely on very large (superconstant) approximation ratios c, or very small search radii r. Outside of a narrow parameter range, these solutions are not competitive with trivially searching through all n strings.
In this work we give a simple and easy-to-implement hash function that can quickly answer queries for a wide range of parameters. Specifically, our strategy can answer queries in time Õ(d3^rn^{1/c}). The best known practical results require c ≫ r to achieve any correctness guarantee; meanwhile, the best known theoretical results are very involved and difficult to implement, and require query time that can be loosely bounded below by 24^r. Our results significantly broaden the range of parameters for which there exist nontrivial theoretical bounds, while retaining the practicality of a locality-sensitive hash function.

BibTeX - Entry

@InProceedings{mccauley:LIPIcs.ICDT.2021.21,
  author =	{McCauley, Samuel},
  title =	{{Approximate Similarity Search Under Edit Distance Using Locality-Sensitive Hashing}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{21:1--21:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2021/13729},
  URN =		{urn:nbn:de:0030-drops-137299},
  doi =		{10.4230/LIPIcs.ICDT.2021.21},
  annote =	{Keywords: edit distance, approximate pattern matching, approximate nearest neighbor, similarity search, locality-sensitive hashing}
}

Keywords: edit distance, approximate pattern matching, approximate nearest neighbor, similarity search, locality-sensitive hashing
Collection: 24th International Conference on Database Theory (ICDT 2021)
Issue Date: 2021
Date of publication: 11.03.2021


DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI