License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ITCS.2023.31
URN: urn:nbn:de:0030-drops-175340
Go to the corresponding LIPIcs Volume Portal

Chakraborty, Diptarka ; Das, Debarati ; Krauthgamer, Robert

Clustering Permutations: New Techniques with Streaming Applications

LIPIcs-ITCS-2023-31.pdf (0.8 MB)


We study the classical metric k-median clustering problem over a set of input rankings (i.e., permutations), which has myriad applications, from social-choice theory to web search and databases. A folklore algorithm provides a 2-approximate solution in polynomial time for all k = O(1), and works irrespective of the underlying distance measure, so long it is a metric; however, going below the 2-factor is a notorious challenge. We consider the Ulam distance, a variant of the well-known edit-distance metric, where strings are restricted to be permutations. For this metric, Chakraborty, Das, and Krauthgamer [SODA, 2021] provided a (2-δ)-approximation algorithm for k = 1, where δ≈ 2^{-40}.
Our primary contribution is a new algorithmic framework for clustering a set of permutations. Our first result is a 1.999-approximation algorithm for the metric k-median problem under the Ulam metric, that runs in time (k log (nd))^{O(k)} nd³ for an input consisting of n permutations over [d]. In fact, our framework is powerful enough to extend this result to the streaming model (where the n input permutations arrive one by one) using only polylogarithmic (in n) space. Additionally, we show that similar results can be obtained even in the presence of outliers, which is presumably a more difficult problem.

BibTeX - Entry

  author =	{Chakraborty, Diptarka and Das, Debarati and Krauthgamer, Robert},
  title =	{{Clustering Permutations: New Techniques with Streaming Applications}},
  booktitle =	{14th Innovations in Theoretical Computer Science Conference (ITCS 2023)},
  pages =	{31:1--31:24},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-263-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{251},
  editor =	{Tauman Kalai, Yael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{},
  URN =		{urn:nbn:de:0030-drops-175340},
  doi =		{10.4230/LIPIcs.ITCS.2023.31},
  annote =	{Keywords: Clustering, Approximation Algorithms, Ulam Distance, Rank Aggregation, Streaming}

Keywords: Clustering, Approximation Algorithms, Ulam Distance, Rank Aggregation, Streaming
Collection: 14th Innovations in Theoretical Computer Science Conference (ITCS 2023)
Issue Date: 2023
Date of publication: 01.02.2023

DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI