License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ICALP.2021.23
URN: urn:nbn:de:0030-drops-140923
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2021/14092/


Bandyapadhyay, Sayan ; Fomin, Fedor V. ; Simonov, Kirill

On Coresets for Fair Clustering in Metric and Euclidean Spaces and Their Applications

pdf-format:
LIPIcs-ICALP-2021-23.pdf (0.8 MB)


Abstract

Fair clustering is a variant of constrained clustering where the goal is to partition a set of colored points so that the fraction of points of each color in every cluster is more or less equal to the fraction of points of this color in the dataset. This variant was recently introduced by Chierichetti et al. [NeurIPS 2017] and has become widely popular. This paper proposes a new construction of coresets for fair k-means and k-median clustering for Euclidean and general metrics based on random sampling. For the Euclidean space ℝ^d, we provide the first coresets whose size does not depend exponentially on the dimension d. The question of whether such constructions exist was asked by Schmidt, Schwiegelshohn, and Sohler [WAOA 2019] and Huang, Jiang, and Vishnoi [NeurIPS 2019]. For general metrics, our construction provides the first coreset for fair k-means and k-median.
New coresets appear to be a handy tool for designing better approximation and streaming algorithms for fair and other constrained clustering variants. In particular, we obtain
- the first fixed-parameter tractable (FPT) PTAS for fair k-means and k-median clustering in ℝ^d. The near-linear time of our PTAS improves over the previous scheme of Böhm, Fazzone, Leonardi, and Schwiegelshohn [ArXiv 2020] with running time n^{poly(k/ε)};
- FPT "true" constant-approximation for metric fair clustering. All previous algorithms for fair k-means and k-median in general metrics are bicriteria and violate the fairness constraints;
- FPT 3-approximation for lower-bounded k-median improving the best-known 3.736 factor of Bera, Chakrabarty, and Negahbani [ArXiv 2019];
- the first FPT constant-approximations for metric chromatic clustering and ℓ-Diversity clustering;
- near-linear time (in n) PTAS for capacitated and lower-bounded clustering improving over the PTAS of Bhattacharya, Jaiswal, and Kumar [TOCS 2018] with super-quadratic running time;
- a streaming (1+ε)-approximation for fair k-means and k-median of space complexity polynomial in k, d, ε and log n (the previous algorithms have space complexity exponential in either d or k).
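
The fairness requirement described in the abstract (each color's share in every cluster should be close to its share in the whole dataset) can be illustrated with a small check. This is only a sketch for intuition, not the paper's construction or its formal fairness constraints: the function names `color_fractions` and `max_fairness_deviation`, the toy data, and the use of the largest absolute gap as a deviation measure are all assumptions made for this example.

```python
from collections import Counter


def color_fractions(colors):
    """Return the share of each color in a list of color labels."""
    counts = Counter(colors)
    total = len(colors)
    return {color: n / total for color, n in counts.items()}


def max_fairness_deviation(colors, labels):
    """Largest gap between a color's share in any cluster and its dataset share.

    Hypothetical measure chosen for illustration: 0.0 means every cluster
    reproduces the dataset's color proportions exactly.
    """
    dataset_share = color_fractions(colors)

    # Group the point colors by the cluster each point was assigned to.
    clusters = {}
    for color, label in zip(colors, labels):
        clusters.setdefault(label, []).append(color)

    worst = 0.0
    for members in clusters.values():
        cluster_share = color_fractions(members)
        for color, share in dataset_share.items():
            gap = abs(cluster_share.get(color, 0.0) - share)
            worst = max(worst, gap)
    return worst


# Toy data (assumed): six points, half red and half blue.
colors = ["red", "blue", "red", "blue", "red", "blue"]
balanced_labels = [0, 0, 0, 0, 1, 1]    # both clusters are half red, half blue
segregated_labels = [0, 1, 0, 1, 0, 1]  # one all-red and one all-blue cluster

balanced_gap = max_fairness_deviation(colors, balanced_labels)
segregated_gap = max_fairness_deviation(colors, segregated_labels)
```

In this toy setting the balanced assignment has no deviation from the dataset proportions, while the segregated assignment deviates as much as a two-color, evenly split dataset allows.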

BibTeX - Entry

@InProceedings{bandyapadhyay_et_al:LIPIcs.ICALP.2021.23,
  author =	{Bandyapadhyay, Sayan and Fomin, Fedor V. and Simonov, Kirill},
  title =	{{On Coresets for Fair Clustering in Metric and Euclidean Spaces and Their Applications}},
  booktitle =	{48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)},
  pages =	{23:1--23:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-195-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{198},
  editor =	{Bansal, Nikhil and Merelli, Emanuela and Worrell, James},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2021/14092},
  URN =		{urn:nbn:de:0030-drops-140923},
  doi =		{10.4230/LIPIcs.ICALP.2021.23},
  annote =	{Keywords: fair clustering, coresets, approximation algorithms}
}

Keywords: fair clustering, coresets, approximation algorithms
Collection: 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)
Issue Date: 2021
Date of publication: 02.07.2021

