
A bi-criteria approximation algorithm for $k$ Means

Author: Makarychev Konstantin
Makarychev Yury
Sviridenko Maxim
Ward Justin
Publication date: 03/08/2015

We consider the classical $k$-means clustering problem in the setting of bi-criteria approximation, in which an algorithm is allowed to output $\beta k > k$ clusters, and must produce a clustering with cost at most $\alpha$ times the cost of the optimal set of $k$ clusters. We argue that this approach is natural in many settings, for which the exact number of clusters is a priori unknown, or unimportant up to a constant factor. We give new bi-criteria approximation algorithms, based on linear programming and local search, respectively, which attain a guarantee $\alpha(\beta)$ depending on the number $\beta k$ of clusters that may be opened. Our guarantee $\alpha(\beta)$ is always at most $9 + \epsilon$ and improves rapidly with $\beta$ (for example, $\alpha(2) < 2.59$ and $\alpha(3) < 1.4$). Moreover, our algorithms have only polynomial dependence on the dimension of the input data, and so are applicable in high-dimensional settings.
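The bi-criteria condition described in the abstract can be illustrated with a minimal sketch. It only evaluates the k-means cost of a given set of centers and checks the two criteria (at most βk centers, cost at most α times the optimal k-center cost); it does not implement the paper's LP-based or local-search algorithms. The function names `kmeans_cost` and `is_bicriteria`, and the assumption that the optimal cost `opt_cost` is supplied by the caller, are hypothetical choices made for illustration.

```python
from typing import Sequence

Point = Sequence[float]


def kmeans_cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """Sum, over all points, of the squared Euclidean distance to the nearest center."""
    total = 0.0
    for x in points:
        total += min(
            sum((xi - ci) ** 2 for xi, ci in zip(x, c))
            for c in centers
        )
    return total


def is_bicriteria(points: Sequence[Point], centers: Sequence[Point],
                  k: int, opt_cost: float, alpha: float, beta: float) -> bool:
    """Check both criteria: at most beta*k centers, cost at most alpha*opt_cost.

    opt_cost is assumed to be the cost of an optimal k-center solution,
    supplied by the caller (computing it is the hard part of the problem).
    """
    return (len(centers) <= beta * k
            and kmeans_cost(points, centers) <= alpha * opt_cost)


# Toy one-dimensional instance: two well-separated pairs of points.
points = [[0.0], [1.0], [10.0], [11.0]]
one_center = [[5.5]]            # the mean, which is optimal for k = 1
two_centers = [[0.5], [10.5]]   # one center per pair

opt_cost = kmeans_cost(points, one_center)
print(opt_cost, kmeans_cost(points, two_centers))
print(is_bicriteria(points, two_centers, k=1, opt_cost=opt_cost,
                    alpha=2.59, beta=2))
```

On this toy instance, opening βk = 2 centers instead of k = 1 drops the cost far below the optimal single-center cost, which is the kind of trade-off between cluster count and cost that the bi-criteria setting allows.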

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Robust Correlation Clustering

Author: Devvrit
Krishnaswamy Ravishankar
Rajaraman Nived
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019)
Publication date: 01/01/2019

In this paper, we introduce and study the Robust-Correlation-Clustering problem: given a graph G = (V,E) where every edge is labeled either + or - (denoting similar or dissimilar pairs of vertices), and a parameter m, the goal is to delete a set D of m vertices and partition the remaining vertices V \ D into clusters so as to minimize the cost of the clustering, which is the number of + edges with end-points in different clusters plus the number of - edges with end-points in the same cluster. This generalizes the classical Correlation-Clustering problem, which is the special case m = 0. Correlation clustering is useful when we have (only) qualitative information about the similarity or dissimilarity of pairs of points, and Robust-Correlation-Clustering equips this model with the capability to handle noise in datasets. In this work, we present a constant-factor bi-criteria algorithm for Robust-Correlation-Clustering on complete graphs (our solution is O(1)-approximate with respect to the cost, while discarding O(1)·m points as outliers), and complement this by showing that no finite approximation is possible if we do not violate the outlier budget. Our algorithm is very simple: it first performs an LP-based pre-processing step to delete O(m) vertices, and subsequently runs a particular Correlation-Clustering algorithm, ACNAlg [Ailon et al., 2005], on the residual instance. We then consider general graphs, and show (O(log n), O(log^2 n)) bi-criteria algorithms, while also showing a hardness of alpha_MC on both the cost and the outlier violation, where alpha_MC is the lower bound for the Minimum-Multicut problem.
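The objective defined in the abstract can be made concrete with a small sketch that evaluates the cost of a given solution: edges touching a deleted vertex are ignored, and every remaining + edge across clusters or - edge inside a cluster counts as one disagreement. This is only the cost function, not the LP pre-processing or ACNAlg; the name `robust_cc_cost` and the dictionary-based input encoding are assumptions made for illustration.

```python
from typing import Dict, Hashable, Set, Tuple

Vertex = Hashable


def robust_cc_cost(edges: Dict[Tuple[Vertex, Vertex], str],
                   cluster_of: Dict[Vertex, int],
                   deleted: Set[Vertex]) -> int:
    """Count disagreements among edges whose end-points both survive deletion.

    edges      maps an unordered pair (u, v) to its label, '+' or '-'
    cluster_of maps every non-deleted vertex to a cluster id
    deleted    is the set D of vertices removed as outliers
    """
    cost = 0
    for (u, v), sign in edges.items():
        if u in deleted or v in deleted:
            continue  # edges incident to outliers do not contribute
        same_cluster = cluster_of[u] == cluster_of[v]
        if sign == "+" and not same_cluster:
            cost += 1  # similar pair split across clusters
        elif sign == "-" and same_cluster:
            cost += 1  # dissimilar pair placed together
    return cost


# Toy complete graph on four vertices; 'd' behaves like a noisy vertex.
edges = {
    ("a", "b"): "+", ("a", "c"): "+", ("b", "c"): "+",
    ("a", "d"): "+", ("b", "d"): "-", ("c", "d"): "-",
}

# m = 0: the best this clustering can do is pay for the ('a', 'd') edge.
print(robust_cc_cost(edges, {"a": 0, "b": 0, "c": 0, "d": 1}, set()))

# m = 1: deleting 'd' leaves a clustering with no disagreements.
print(robust_cc_cost(edges, {"a": 0, "b": 0, "c": 0}, {"d"}))
```

In this toy instance, every clustering of all four vertices incurs at least one disagreement, whereas discarding the single vertex `d` brings the cost to zero, which illustrates why an outlier budget m can help with noisy data.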

Dagstuhl Research Online Publication Server