Abstract
In this paper, we introduce and study the RobustCorrelationClustering problem: given a graph G = (V,E) where every edge is either labeled + or  (denoting similar or dissimilar pairs of vertices), and a parameter m, the goal is to delete a set D of m vertices, and partition the remaining vertices V \ D into clusters to minimize the cost of the clustering, which is the sum of the number of + edges with endpoints in different clusters and the number of  edges with endpoints in the same cluster. This generalizes the classical CorrelationClustering problem which is the special case when m = 0. Correlation clustering is useful when we have (only) qualitative information about the similarity or dissimilarity of pairs of points, and RobustCorrelationClustering equips this model with the capability to handle noise in datasets.
In this work, we present a constantfactor bicriteria algorithm for RobustCorrelationClustering on complete graphs (where our solution is O(1)approximate w.r.t the cost while however discarding O(1) m points as outliers), and also complement this by showing that no finite approximation is possible if we do not violate the outlier budget. Our algorithm is very simple in that it first does a simple LPbased preprocessing to delete O(m) vertices, and subsequently runs a particular CorrelationClustering algorithm ACNAlg [Ailon et al., 2005] on the residual instance. We then consider general graphs, and show (O(log n), O(log^2 n)) bicriteria algorithms while also showing a hardness of alpha_MC on both the cost and the outlier violation, where alpha_MC is the lower bound for the MinimumMulticut problem.
BibTeX  Entry
@InProceedings{devvrit_et_al:LIPIcs:2019:11248,
author = { Devvrit and Ravishankar Krishnaswamy and Nived Rajaraman},
title = {{Robust Correlation Clustering}},
booktitle = {Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019)},
pages = {33:133:18},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {9783959771252},
ISSN = {18688969},
year = {2019},
volume = {145},
editor = {Dimitris Achlioptas and L{\'a}szl{\'o} A. V{\'e}gh},
publisher = {Schloss DagstuhlLeibnizZentrum fuer Informatik},
address = {Dagstuhl, Germany},
URL = {http://drops.dagstuhl.de/opus/volltexte/2019/11248},
URN = {urn:nbn:de:0030drops112483},
doi = {10.4230/LIPIcs.APPROXRANDOM.2019.33},
annote = {Keywords: Correlation Clustering, Outlier Detection, Clustering, Approximation Algorithms}
}
Keywords: 

Correlation Clustering, Outlier Detection, Clustering, Approximation Algorithms 
Collection: 

Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019) 
Issue Date: 

2019 
Date of publication: 

17.09.2019 