License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following:
DOI: 10.4230/LIPIcs.ESA.2022.84
URN: urn:nbn:de:0030-drops-170225
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2022/17022/
Schwiegelshohn, Chris; Sheikh-Omar, Omar Ali
An Empirical Evaluation of k-Means Coresets
Abstract
Coresets are among the most popular paradigms for summarizing data. In particular, there exist many high-performance coresets for clustering problems such as k-means, both in theory and in practice. Curiously, there is no prior work comparing the quality of the available k-means coresets.
In this paper we perform such an evaluation. No algorithm is currently known that measures the distortion of a candidate coreset, and we provide some evidence as to why this might be computationally difficult. To complement this, we propose a benchmark on which we argue that computing coresets is challenging, and which also allows an easy (heuristic) evaluation of coresets. Using this benchmark and real-world data sets, we conduct an exhaustive evaluation of the most commonly used coreset algorithms from theory and practice.
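The abstract refers to the distortion of a candidate coreset without defining it. As a minimal sketch, assuming the usual definition in which a weighted coreset (S, w) of a point set P has distortion equal to the maximum, over all center sets C, of max(cost(S, C) / cost(P, C), cost(P, C) / cost(S, C)), the Python below evaluates that ratio over a finite list of candidate solutions. Because the true distortion is a maximum over infinitely many center sets, the returned value is only a lower bound, which hints at why measuring distortion exactly is hard. The names `kmeans_cost`, `distortion`, and `uniform_coreset` are hypothetical helpers for illustration; they are not the benchmark or the coreset algorithms evaluated in the paper, and uniform sampling is used only as a simple stand-in candidate.

```python
import numpy as np


def kmeans_cost(points, centers, weights=None):
    """Weighted k-means cost: sum of w_i * squared distance to the nearest center."""
    # Pairwise squared Euclidean distances, shape (n, k).
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.min(axis=1)
    if weights is None:
        weights = np.ones(len(points))
    return float(np.dot(weights, nearest))


def distortion(points, coreset, coreset_weights, candidate_solutions):
    """Largest cost ratio observed over the given candidate center sets.

    The exact distortion is a maximum over *all* center sets, so taking the
    maximum over a finite list only yields a lower bound on it.
    """
    worst = 1.0
    for centers in candidate_solutions:
        full = kmeans_cost(points, centers)
        core = kmeans_cost(coreset, centers, coreset_weights)
        if full == 0.0 or core == 0.0:
            # Ratio undefined; treat any mismatch as unbounded distortion.
            if full != core:
                return float("inf")
            continue
        worst = max(worst, full / core, core / full)
    return worst


def uniform_coreset(points, m, rng):
    """Hypothetical baseline: m uniform samples without replacement, each weighted n/m."""
    idx = rng.choice(len(points), size=m, replace=False)
    return points[idx], np.full(m, len(points) / m)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P = rng.normal(size=(1000, 2))
    S, w = uniform_coreset(P, 100, rng)
    # Candidate solutions: k = 3 centers drawn from the data, 20 times.
    solutions = [P[rng.choice(len(P), size=3, replace=False)] for _ in range(20)]
    print(distortion(P, S, w, solutions))  # some value >= 1.0
```

A benchmark in the spirit of the abstract would choose the candidate solutions so that a poor coreset is likely to be exposed; here they are simply random, so a small reported value does not certify a good coreset.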
BibTeX Entry
@InProceedings{schwiegelshohn_et_al:LIPIcs.ESA.2022.84,
author = {Schwiegelshohn, Chris and Sheikh-Omar, Omar Ali},
title = {{An Empirical Evaluation of k-Means Coresets}},
booktitle = {30th Annual European Symposium on Algorithms (ESA 2022)},
pages = {84:1--84:17},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-247-1},
ISSN = {1868-8969},
year = {2022},
volume = {244},
editor = {Chechik, Shiri and Navarro, Gonzalo and Rotenberg, Eva and Herman, Grzegorz},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2022/17022},
URN = {urn:nbn:de:0030-drops-170225},
doi = {10.4230/LIPIcs.ESA.2022.84},
annote = {Keywords: coresets, k-means coresets, evaluation, benchmark}
}