License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ITCS.2023.31
URN: urn:nbn:de:0030-drops-175340
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2023/17534/
Chakraborty, Diptarka ;
Das, Debarati ;
Krauthgamer, Robert
Clustering Permutations: New Techniques with Streaming Applications
Abstract
We study the classical metric k-median clustering problem over a set of input rankings (i.e., permutations), which has myriad applications, from social-choice theory to web search and databases. A folklore algorithm provides a 2-approximate solution in polynomial time for all k = O(1), and works irrespective of the underlying distance measure, so long it is a metric; however, going below the 2-factor is a notorious challenge. We consider the Ulam distance, a variant of the well-known edit-distance metric, where strings are restricted to be permutations. For this metric, Chakraborty, Das, and Krauthgamer [SODA, 2021] provided a (2-δ)-approximation algorithm for k = 1, where δ≈ 2^{-40}.
Our primary contribution is a new algorithmic framework for clustering a set of permutations. Our first result is a 1.999-approximation algorithm for the metric k-median problem under the Ulam metric, that runs in time (k log (nd))^{O(k)} nd³ for an input consisting of n permutations over [d]. In fact, our framework is powerful enough to extend this result to the streaming model (where the n input permutations arrive one by one) using only polylogarithmic (in n) space. Additionally, we show that similar results can be obtained even in the presence of outliers, which is presumably a more difficult problem.
BibTeX - Entry
@InProceedings{chakraborty_et_al:LIPIcs.ITCS.2023.31,
author = {Chakraborty, Diptarka and Das, Debarati and Krauthgamer, Robert},
title = {{Clustering Permutations: New Techniques with Streaming Applications}},
booktitle = {14th Innovations in Theoretical Computer Science Conference (ITCS 2023)},
pages = {31:1--31:24},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-263-1},
ISSN = {1868-8969},
year = {2023},
volume = {251},
editor = {Tauman Kalai, Yael},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2023/17534},
URN = {urn:nbn:de:0030-drops-175340},
doi = {10.4230/LIPIcs.ITCS.2023.31},
annote = {Keywords: Clustering, Approximation Algorithms, Ulam Distance, Rank Aggregation, Streaming}
}
Keywords: |
|
Clustering, Approximation Algorithms, Ulam Distance, Rank Aggregation, Streaming |
Collection: |
|
14th Innovations in Theoretical Computer Science Conference (ITCS 2023) |
Issue Date: |
|
2023 |
Date of publication: |
|
01.02.2023 |