License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.APPROX/RANDOM.2021.18
URN: urn:nbn:de:0030-drops-147115
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2021/14711/


Arutyunova, Anna ; Großwendt, Anna ; Röglin, Heiko ; Schmidt, Melanie ; Wargalla, Julian

Upper and Lower Bounds for Complete Linkage in General Metric Spaces



Abstract

In a hierarchical clustering problem, the task is to compute a series of mutually compatible clusterings of a finite metric space (P,dist). Starting with the clustering in which every point forms its own cluster, one iteratively merges two clusters until only one cluster remains. Complete linkage is a well-known and popular algorithm for computing such clusterings: in every step it merges the two clusters whose union has the smallest radius (or diameter) among all currently possible merges. We prove that the radius (or diameter) of every k-clustering computed by complete linkage is at most a factor of O(k) (or O(k²)) worse than that of an optimal k-clustering minimizing the radius (or diameter). Furthermore, we give a negative answer to the question posed by Dasgupta and Long [Sanjoy Dasgupta and Philip M. Long, 2005], who show a lower bound of Ω(log(k)) and ask whether the approximation guarantee is in fact Θ(log(k)). We present instances on which complete linkage performs poorly, in the sense that the k-clustering it computes is off by a factor of Ω(k) from an optimal solution for both radius and diameter. We conclude that in general metric spaces complete linkage does not perform asymptotically better than single linkage, which merges the two clusters with the smallest inter-cluster distance and for which we prove an approximation guarantee of O(k).
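
The merge rule described in the abstract can be illustrated with a minimal Python sketch. It covers only the diameter variant, takes the metric as a symmetric distance matrix, and uses brute-force search for the best pair. The function names `diameter` and `complete_linkage` are hypothetical and chosen for illustration; this is not the authors' implementation, and ties are broken arbitrarily by iteration order.

```python
from itertools import combinations


def diameter(cluster, dist):
    """Largest pairwise distance inside a cluster (0 for a singleton)."""
    return max((dist[p][q] for p, q in combinations(cluster, 2)), default=0.0)


def complete_linkage(dist, k):
    """Greedily merge clusters until only k remain.

    dist is assumed to be a symmetric matrix of pairwise distances;
    points are identified by their index into that matrix.
    """
    clusters = [frozenset([i]) for i in range(len(dist))]
    while len(clusters) > k:
        # Choose the pair of clusters whose union has the smallest diameter.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: diameter(pair[0] | pair[1], dist),
        )
        # Replace the two chosen clusters by their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters
```

For example, for five points on a line at positions 0, 1, 5, 6 and 20 with absolute difference as the metric, calling `complete_linkage(dist, 2)` on the resulting matrix leaves the point at 20 on its own and groups the other four together. The radius variant would replace `diameter` by a function that minimizes, over candidate centers, the maximum distance to the cluster's points.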

BibTeX - Entry

@InProceedings{arutyunova_et_al:LIPIcs.APPROX/RANDOM.2021.18,
  author =	{Arutyunova, Anna and Gro{\ss}wendt, Anna and R\"{o}glin, Heiko and Schmidt, Melanie and Wargalla, Julian},
  title =	{{Upper and Lower Bounds for Complete Linkage in General Metric Spaces}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021)},
  pages =	{18:1--18:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-207-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{207},
  editor =	{Wootters, Mary and Sanit\`{a}, Laura},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2021/14711},
  URN =		{urn:nbn:de:0030-drops-147115},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2021.18},
  annote =	{Keywords: Hierarchical Clustering, Complete Linkage, agglomerative Clustering, k-Center}
}

Keywords: Hierarchical Clustering, Complete Linkage, agglomerative Clustering, k-Center
Collection: Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021)
Issue Date: 2021
Date of publication: 15.09.2021

