License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.SoCG.2016.14
URN: urn:nbn:de:0030-drops-59061
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2016/5906/
Go to the corresponding LIPIcs Volume Portal


Bandyapadhyay, Sayan ; Varadarajan, Kasturi

On Variants of k-means Clustering

pdf-format:
LIPIcs-SoCG-2016-14.pdf (0.5 MB)


Abstract

Clustering problems often arise in fields like data mining and machine learning. Clustering usually refers to the task of partitioning a collection of objects into groups with similar elements, with respect to a similarity (or dissimilarity) measure. Among the clustering problems, k-means clustering in particular has received much attention from researchers. Despite the fact that k-means is a well studied problem, its status in the plane is still open. In particular, it is unknown whether it admits a PTAS in the plane. The best known approximation bound achievable in polynomial time is 9+epsilon.

In this paper, we consider the following variant of k-means. Given a set C of points in R^d and a real f > 0, find a finite set F of points in R^d that minimizes the quantity f*|F|+sum_{p in C} min_{q in F} {||p-q||}^2. For any fixed dimension d, we design a PTAS for this problem that is based on local search. We also give a "bi-criterion" local search algorithm for k-means which uses (1+epsilon)k centers and yields a solution whose cost is at most (1+epsilon) times the cost of an optimal k-means solution. The algorithm runs in polynomial time for any fixed dimension.

The contribution of this paper is two-fold. On the one hand, we are able to handle the square of distances in an elegant manner, obtaining a near-optimal approximation bound. This leads us towards a better understanding of the k-means problem. On the other hand, our analysis of local search might also be useful for other geometric problems. This is important considering that little is known about the local search method for geometric approximation.

BibTeX - Entry

@InProceedings{bandyapadhyay_et_al:LIPIcs:2016:5906,
  author =	{Sayan Bandyapadhyay and Kasturi Varadarajan},
  title =	{{On Variants of k-means Clustering}},
  booktitle =	{32nd International Symposium on Computational Geometry (SoCG 2016)},
  pages =	{14:1--14:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-009-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{51},
  editor =	{S{\'a}ndor Fekete and Anna Lubiw},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2016/5906},
  URN =		{urn:nbn:de:0030-drops-59061},
  doi =		{10.4230/LIPIcs.SoCG.2016.14},
  annote =	{Keywords: k-means, Facility location, Local search, Geometric approximation}
}

Keywords: k-means, Facility location, Local search, Geometric approximation
Collection: 32nd International Symposium on Computational Geometry (SoCG 2016)
Issue Date: 2016
Date of publication: 10.06.2016


DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI