DROPS - Document

License:

Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ESA.2022.84
URN: urn:nbn:de:0030-drops-170225
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2022/17022/

Schwiegelshohn, Chris ; Sheikh-Omar, Omar Ali

An Empirical Evaluation of k-Means Coresets

pdf-format:

LIPIcs-ESA-2022-84.pdf (0.9 MB)

Abstract

Coresets are among the most popular paradigms for summarizing data. In particular, there exist many high-performance coresets for clustering problems such as k-means, in both theory and practice. Curiously, no prior work compares the quality of the available k-means coresets.
In this paper we perform such an evaluation. No algorithm is currently known for measuring the distortion of a candidate coreset, and we provide some evidence as to why this might be computationally difficult. To complement this, we propose a benchmark for which we argue that computing coresets is challenging and which also allows for an easy (heuristic) evaluation of coresets. Using this benchmark and real-world data sets, we conduct an exhaustive evaluation of the most commonly used coreset algorithms from theory and practice.
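
As a rough illustration of what evaluating a candidate coreset involves, the sketch below compares the k-means cost of a full point set with the weighted cost of a candidate coreset for one fixed set of centers. It rests on several assumptions that are not stated in the abstract: distortion is taken to be the larger of the two cost ratios, the candidate coreset is a plain uniform sample (a placeholder, not one of the evaluated algorithms), and the names kmeans_cost and distortion are hypothetical. Checking a single set of centers only gives a lower bound on the worst-case distortion, which is why an evaluation of this kind is heuristic.

```python
import random


def kmeans_cost(points, weights, centers):
    """Weighted k-means cost: sum of weight * squared distance to the nearest center."""
    total = 0.0
    for p, w in zip(points, weights):
        nearest = min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
        total += w * nearest
    return total


def distortion(points, coreset, coreset_weights, centers):
    """Assumed definition: the larger of the two cost ratios for ONE fixed set of centers."""
    full = kmeans_cost(points, [1.0] * len(points), centers)
    approx = kmeans_cost(coreset, coreset_weights, centers)
    return max(full / approx, approx / full)


# Toy data: 1000 Gaussian points in the plane (illustrative only).
rng = random.Random(0)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(1000)]

# Placeholder candidate coreset: a uniform sample of m points, each weighted n / m.
m = 100
coreset = rng.sample(points, m)
coreset_weights = [len(points) / m] * m

# One arbitrary candidate solution; a real evaluation would try many.
centers = [(-1.0, 0.0), (1.0, 0.0)]
print(distortion(points, coreset, coreset_weights, centers))
```

A value close to 1 means the candidate coreset reproduces the cost of this particular solution well; it says nothing about solutions that were not tested.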

BibTeX - Entry

@InProceedings{schwiegelshohn_et_al:LIPIcs.ESA.2022.84,
  author =	{Schwiegelshohn, Chris and Sheikh-Omar, Omar Ali},
  title =	{{An Empirical Evaluation of k-Means Coresets}},
  booktitle =	{30th Annual European Symposium on Algorithms (ESA 2022)},
  pages =	{84:1--84:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-247-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{244},
  editor =	{Chechik, Shiri and Navarro, Gonzalo and Rotenberg, Eva and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2022/17022},
  URN =		{urn:nbn:de:0030-drops-170225},
  doi =		{10.4230/LIPIcs.ESA.2022.84},
  annote =	{Keywords: coresets, k-means coresets, evaluation, benchmark}
}

Keywords: coresets, k-means coresets, evaluation, benchmark

Collection: 30th Annual European Symposium on Algorithms (ESA 2022)

Issue Date: 2022

Date of publication: 01.09.2022

Supplementary Material: Software (Source Code): https://github.com/sheikhomar/eval-k-means-coresets archived at: https://archive.softwareheritage.org/swh:1:dir:53066aa034ea87cdf2fd2f5cb2077400aaf341c3

Published by LZI
