License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ICDT.2022.7
URN: urn:nbn:de:0030-drops-158812
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2022/15881/
Addanki, Raghavendra ;
McGregor, Andrew ;
Meliou, Alexandra ;
Moumoulidou, Zafeiria
Improved Approximation and Scalability for Fair Max-Min Diversification
Abstract
Given an n-point metric space ({?},d) where each point belongs to one of m = O(1) different categories or groups and a set of integers k₁, …, k_m, the fair Max-Min diversification problem is to select k_i points belonging to category i ∈ [m], such that the minimum pairwise distance between selected points is maximized. The problem was introduced by Moumoulidou et al. [ICDT 2021] and is motivated by the need to down-sample large data sets in various applications so that the derived sample achieves a balance over diversity, i.e., the minimum distance between a pair of selected points, and fairness, i.e., ensuring enough points of each category are included. We prove the following results:
1) We first consider general metric spaces. We present a randomized polynomial time algorithm that returns a factor 2-approximation to the diversity but only satisfies the fairness constraints in expectation. Building upon this result, we present a 6-approximation that is guaranteed to satisfy the fairness constraints up to a factor 1-ε for any constant ε. We also present a linear time algorithm returning an m+1 approximation with exact fairness. The best previous result was a 3m-1 approximation.
2) We then focus on Euclidean metrics. We first show that the problem can be solved exactly in one dimension. {For constant dimensions, categories and any constant ε > 0, we present a 1+ε approximation algorithm that runs in O(nk) + 2^{O(k)} time where k = k₁+…+k_m.} We can improve the running time to O(nk)+poly(k) at the expense of only picking (1-ε) k_i points from category i ∈ [m].
Finally, we present algorithms suitable to processing massive data sets including single-pass data stream algorithms and composable coresets for the distributed processing.
BibTeX - Entry
@InProceedings{addanki_et_al:LIPIcs.ICDT.2022.7,
author = {Addanki, Raghavendra and McGregor, Andrew and Meliou, Alexandra and Moumoulidou, Zafeiria},
title = {{Improved Approximation and Scalability for Fair Max-Min Diversification}},
booktitle = {25th International Conference on Database Theory (ICDT 2022)},
pages = {7:1--7:21},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-223-5},
ISSN = {1868-8969},
year = {2022},
volume = {220},
editor = {Olteanu, Dan and Vortmeier, Nils},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2022/15881},
URN = {urn:nbn:de:0030-drops-158812},
doi = {10.4230/LIPIcs.ICDT.2022.7},
annote = {Keywords: algorithmic fairness, diversity maximization, data selection, approximation algorithms}
}
Keywords: |
|
algorithmic fairness, diversity maximization, data selection, approximation algorithms |
Collection: |
|
25th International Conference on Database Theory (ICDT 2022) |
Issue Date: |
|
2022 |
Date of publication: |
|
19.03.2022 |
Supplementary Material: |
|
Audiovisual (Video of the Presentation): https://doi.org/10.5446/57487 |