License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ESA.2022.67
URN: urn:nbn:de:0030-drops-170055
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2022/17005/


van der Horst, Thijs ; Löffler, Maarten ; Staals, Frank

Chromatic k-Nearest Neighbor Queries

pdf-format:
LIPIcs-ESA-2022-67.pdf (1 MB)


Abstract

Let P be a set of n colored points. We develop efficient data structures that store P and can answer chromatic k-nearest neighbor (k-NN) queries. Such a query consists of a query point q and a number k, and asks for the color that appears most frequently among the k points in P closest to q. Answering such queries efficiently is key to obtaining fast k-NN classifiers. Our main aim is to obtain query times that are independent of k while using near-linear space.
We show that this is possible using a combination of two data structures. The first data structure allows us to compute a region containing exactly the k-nearest neighbors of a query point q, and the second data structure can then report the most frequent color in such a region. This leads to linear space data structures with query times of O(n^{1/2} log n) for points in ℝ¹, and with query times varying between O(n^{2/3}log^{2/3} n) and O(n^{5/6} polylog n), depending on the distance measure used, for points in ℝ². These results can be extended to work in higher dimensions as well. Since the query times are still fairly large, we also consider approximations. If we are allowed to report a color that appears at least (1-ε)f^* times, where f^* is the frequency of the most frequent color, we obtain a query time of O(log n + log log_{1/(1-ε)} n) in ℝ¹ and expected query times ranging between Õ(n^{1/2}ε^{-3/2}) and Õ(n^{1/2}ε^{-5/2}) in ℝ² using near-linear space (ignoring polylogarithmic factors).
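
To make the query and the two-step decomposition concrete, the following is a minimal baseline sketch for points in ℝ¹. It is not the data structure from the paper. It relies only on the fact that, in sorted order, the k points closest to q form a contiguous interval: step one locates that interval by binary search, and step two counts colors inside it by brute force, so the query time here still depends on k. The names `build`, `knn_interval`, and `chromatic_knn_1d` are hypothetical and chosen for illustration.

```python
from collections import Counter


def build(points):
    """Sort (coordinate, color) pairs by coordinate (illustrative preprocessing)."""
    return sorted(points, key=lambda p: p[0])


def knn_interval(xs, q, k):
    """Return (lo, hi) such that xs[lo:hi] holds k points closest to q.

    xs must be sorted. Binary search over the left endpoint of a window
    of k consecutive points; distance ties are broken toward the left.
    """
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    lo, hi = 0, n - k
    while lo < hi:
        mid = (lo + hi) // 2
        # If the point just left of the window's right neighbor is farther
        # from q than that right neighbor, the window must shift right.
        if q - xs[mid] > xs[mid + k] - q:
            lo = mid + 1
        else:
            hi = mid
    return lo, lo + k


def chromatic_knn_1d(sorted_points, q, k):
    """Most frequent color among the k points closest to q (baseline, O(log n + k))."""
    xs = [x for x, _ in sorted_points]
    lo, hi = knn_interval(xs, q, k)
    # Brute-force mode computation over the interval; the paper replaces this
    # step with a dedicated structure to avoid the dependence on k.
    counts = Counter(color for _, color in sorted_points[lo:hi])
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    pts = build([(10, "red"), (1, "red"), (3, "blue"), (2, "blue"), (11, "red")])
    print(chromatic_knn_1d(pts, q=2.5, k=3))
    print(chromatic_knn_1d(pts, q=10, k=3))
```

In this sketch the first step plays the role of the "region containing exactly the k-nearest neighbors" and the second step plays the role of the most-frequent-color query on that region.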

BibTeX - Entry

@InProceedings{vanderhorst_et_al:LIPIcs.ESA.2022.67,
  author =	{van der Horst, Thijs and L\"{o}ffler, Maarten and Staals, Frank},
  title =	{{Chromatic k-Nearest Neighbor Queries}},
  booktitle =	{30th Annual European Symposium on Algorithms (ESA 2022)},
  pages =	{67:1--67:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-247-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{244},
  editor =	{Chechik, Shiri and Navarro, Gonzalo and Rotenberg, Eva and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2022/17005},
  URN =		{urn:nbn:de:0030-drops-170055},
  doi =		{10.4230/LIPIcs.ESA.2022.67},
  annote =	{Keywords: data structure, nearest neighbor, classification}
}

Keywords: data structure, nearest neighbor, classification
Collection: 30th Annual European Symposium on Algorithms (ESA 2022)
Issue Date: 2022
Date of publication: 01.09.2022

