Abstract
Clustering is a fundamental tool in data mining and machine learning. It partitions points into groups (clusters) and may be used to make decisions for each point based on its group. However, this process may harm protected (minority) classes if the clustering algorithm does not adequately represent them in desirable clusters  especially if the data is already biased.
At NIPS 2017, Chierichetti et al. [Flavio Chierichetti et al., 2017] proposed a model for fair clustering requiring the representation in each cluster to (approximately) preserve the global fraction of each protected class. Restricting to two protected classes, they developed both a 4approximation for the fair kcenter problem and a O(t)approximation for the fair kmedian problem, where t is a parameter for the fairness model. For multiple protected classes, the best known result is a 14approximation for fair kcenter [Clemens Rösner and Melanie Schmidt, 2018].
We extend and improve the known results. Firstly, we give a 5approximation for the fair kcenter problem with multiple protected classes. Secondly, we propose a relaxed fairness notion under which we can give bicriteria constantfactor approximations for all of the classical clustering objectives kcenter, ksupplier, kmedian, kmeans and facility location. The latter approximations are achieved by a framework that takes an arbitrary existing unfair (integral) solution and a fair (fractional) LP solution and combines them into an essentially fair clustering with a weakly supervised rounding scheme. In this way, a fair clustering can be established belatedly, in a situation where the centers are already fixed.
BibTeX  Entry
@InProceedings{bercea_et_al:LIPIcs:2019:11233,
author = {Ioana O. Bercea and Martin Gro{\ss} and Samir Khuller and Aounon Kumar and Clemens R{\"o}sner and Daniel R. Schmidt and Melanie Schmidt},
title = {{On the Cost of Essentially Fair Clusterings}},
booktitle = {Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019)},
pages = {18:118:22},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {9783959771252},
ISSN = {18688969},
year = {2019},
volume = {145},
editor = {Dimitris Achlioptas and L{\'a}szl{\'o} A. V{\'e}gh},
publisher = {Schloss DagstuhlLeibnizZentrum fuer Informatik},
address = {Dagstuhl, Germany},
URL = {http://drops.dagstuhl.de/opus/volltexte/2019/11233},
URN = {urn:nbn:de:0030drops112337},
doi = {10.4230/LIPIcs.APPROXRANDOM.2019.18},
annote = {Keywords: approximation, clustering, fairness, LP rounding}
}
Keywords: 

approximation, clustering, fairness, LP rounding 
Collection: 

Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019) 
Issue Date: 

2019 
Date of publication: 

17.09.2019 