License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ESA.2019.74
URN: urn:nbn:de:0030-drops-111951
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2019/11195/
Go to the corresponding LIPIcs Volume Portal


Pagh, Rasmus ; Stausholm, Nina Mesing ; Thorup, Mikkel

Hardness of Bichromatic Closest Pair with Jaccard Similarity

pdf-format:
LIPIcs-ESA-2019-74.pdf (0.5 MB)


Abstract

Consider collections A and B of red and blue sets, respectively. Bichromatic Closest Pair is the problem of finding a pair from A x B that has similarity higher than a given threshold according to some similarity measure. Our focus here is the classic Jaccard similarity |a cap b|/|a cup b| for (a,b) in A x B.
We consider the approximate version of the problem where we are given thresholds j_1 > j_2 and wish to return a pair from A x B that has Jaccard similarity higher than j_2 if there exists a pair in A x B with Jaccard similarity at least j_1. The classic locality sensitive hashing (LSH) algorithm of Indyk and Motwani (STOC '98), instantiated with the MinHash LSH function of Broder et al., solves this problem in Õ(n^(2-delta)) time if j_1 >= j_2^(1-delta). In particular, for delta=Omega(1), the approximation ratio j_1/j_2 = 1/j_2^delta increases polynomially in 1/j_2.
In this paper we give a corresponding hardness result. Assuming the Orthogonal Vectors Conjecture (OVC), we show that there cannot be a general solution that solves the Bichromatic Closest Pair problem in O(n^(2-Omega(1))) time for j_1/j_2 = 1/j_2^o(1). Specifically, assuming OVC, we prove that for any delta>0 there exists an epsilon>0 such that Bichromatic Closest Pair with Jaccard similarity requires time Omega(n^(2-delta)) for any choice of thresholds j_2 < j_1 < 1-delta, that satisfy j_1 <= j_2^(1-epsilon).

BibTeX - Entry

@InProceedings{pagh_et_al:LIPIcs:2019:11195,
  author =	{Rasmus Pagh and Nina Mesing Stausholm and Mikkel Thorup},
  title =	{{Hardness of Bichromatic Closest Pair with Jaccard Similarity}},
  booktitle =	{27th Annual European Symposium on Algorithms (ESA 2019)},
  pages =	{74:1--74:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-124-5},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{144},
  editor =	{Michael A. Bender and Ola Svensson and Grzegorz Herman},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2019/11195},
  URN =		{urn:nbn:de:0030-drops-111951},
  doi =		{10.4230/LIPIcs.ESA.2019.74},
  annote =	{Keywords: fine-grained complexity, set similarity search, bichromatic closest pair, jaccard similarity}
}

Keywords: fine-grained complexity, set similarity search, bichromatic closest pair, jaccard similarity
Collection: 27th Annual European Symposium on Algorithms (ESA 2019)
Issue Date: 2019
Date of publication: 06.09.2019


DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI