Constant Approximation for k-Median and k-Means with Outliers via Iterative Rounding
In this paper, we present a new iterative rounding framework for many
clustering problems. Using this, we obtain a constant-factor approximation
algorithm for k-median with outliers, greatly improving upon the large
implicit constant approximation ratio of Chen [Chen, SODA 2018]. For k-means
with outliers, we give the first constant-factor approximation for this
problem. The iterative algorithm framework is very versatile; we show how
it can be used to give approximation algorithms for the matroid median and
knapsack median problems, improving upon the previous best approximation
ratios of [Swamy, ACM Trans. Algorithms] and [Byrka et al, ESA 2015],
respectively.
The natural LP relaxation for the k-median/k-means with outliers problem
has an unbounded integrality gap. In spite of this negative result, our
iterative rounding framework shows that we can round an LP solution to an
almost-integral solution of small cost, in which we have at most two
fractionally open facilities. Thus, the LP integrality gap arises due to the
gap between almost-integral and fully-integral solutions. Then, using a
pre-processing procedure, we show how to convert an almost-integral solution to
a fully-integral solution losing only a constant-factor in the approximation
ratio. By further using a sparsification technique, the additive factor loss
incurred by the conversion can be reduced to any
A Constant Approximation for Colorful k-Center
In this paper, we consider the colorful k-center problem, which is a generalization of the well-known k-center problem. Here, we are given red and blue points in a metric space, and a coverage requirement for each color. The goal is to find the smallest radius rho, such that with k balls of radius rho, the desired number of points of each color can be covered. We obtain a constant approximation for this problem in the Euclidean plane. We obtain this result by combining a "pseudo-approximation" algorithm that works in any metric space, and an approximation algorithm that works for a special class of instances in the plane. The latter algorithm uses a novel connection to a certain matching problem in graphs
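The colorful k-center objective described above can be illustrated with a small feasibility check. This is only a sketch of the problem definition, not the paper's algorithm: the function name `covers_requirements`, the dictionary-based coverage requirements, and the toy planar instance are all assumptions made for illustration.

```python
from math import dist

def covers_requirements(points, colors, centers, rho, required):
    """Check whether balls of radius rho around `centers` cover at least
    required[color] points of each color (Euclidean distance)."""
    covered = {color: 0 for color in required}
    for point, color in zip(points, colors):
        # A point counts as covered if it lies in at least one ball.
        if any(dist(point, center) <= rho for center in centers):
            covered[color] = covered.get(color, 0) + 1
    return all(covered[color] >= need for color, need in required.items())

# Hypothetical instance: five points in the plane, k = 2 centers.
points = [(0, 0), (1, 0), (5, 0), (6, 0), (10, 10)]
colors = ["red", "blue", "red", "blue", "red"]
centers = [(0.5, 0), (5.5, 0)]
required = {"red": 2, "blue": 2}

print(covers_requirements(points, colors, centers, 0.5, required))
print(covers_requirements(points, colors, centers, 0.4, required))
```

The problem asks for the smallest rho for which some choice of k centers passes this check; the far-away red point may stay uncovered because only two red points are required.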
Structural Iterative Rounding for Generalized k-Median Problems
This paper considers approximation algorithms for generalized k-median problems. This class of problems can be informally described as k-median with a constant number of extra constraints, and includes k-median with outliers, and knapsack median. Our first contribution is a pseudo-approximation algorithm for generalized k-median that outputs a 6.387-approximate solution with a constant number of fractional variables. The algorithm is based on iteratively rounding linear programs, and the main technical innovation comes from understanding the rich structure of the resulting extreme points.
Using our pseudo-approximation algorithm, we give improved approximation algorithms for k-median with outliers and knapsack median. This involves combining our pseudo-approximation with pre- and post-processing steps to round a constant number of fractional variables at a small increase in cost. Our algorithms achieve approximation ratios 6.994 + ε and 6.387 + ε for k-median with outliers and knapsack median, respectively. These both improve on the best known approximations
Approximation algorithms for stochastic clustering
We consider stochastic settings for clustering, and develop provably-good
approximation algorithms for a number of these notions. These algorithms yield
better approximation ratios compared to the usual deterministic clustering
setting. Additionally, they offer a number of advantages including clustering
which is fairer and has better long-term behavior for each user. In particular,
they ensure that *every user* is guaranteed to get good service (on average).
We also complement some of these with impossibility results
Robust Correlation Clustering
In this paper, we introduce and study the Robust-Correlation-Clustering problem: given a graph G = (V,E) where every edge is either labeled + or - (denoting similar or dissimilar pairs of vertices), and a parameter m, the goal is to delete a set D of m vertices, and partition the remaining vertices V \ D into clusters to minimize the cost of the clustering, which is the sum of the number of + edges with end-points in different clusters and the number of - edges with end-points in the same cluster. This generalizes the classical Correlation-Clustering problem which is the special case when m = 0. Correlation clustering is useful when we have (only) qualitative information about the similarity or dissimilarity of pairs of points, and Robust-Correlation-Clustering equips this model with the capability to handle noise in datasets.
In this work, we present a constant-factor bi-criteria algorithm for Robust-Correlation-Clustering on complete graphs (our solution is O(1)-approximate with respect to the cost while discarding O(1)·m points as outliers), and complement this by showing that no finite approximation is possible if we do not violate the outlier budget. Our algorithm is very simple: it first runs an LP-based pre-processing step to delete O(m) vertices, and subsequently runs a particular Correlation-Clustering algorithm ACNAlg [Ailon et al., 2005] on the residual instance. We then consider general graphs, and show (O(log n), O(log^2 n)) bi-criteria algorithms while also showing a hardness of alpha_MC on both the cost and the outlier violation, where alpha_MC is the lower bound for the Minimum-Multicut problem
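The Robust-Correlation-Clustering cost defined above can be written out directly. The sketch below only evaluates the objective for a given deletion set and clustering; it is not the bi-criteria algorithm from the abstract. The function name `robust_cc_cost`, the edge-dictionary encoding, and the four-vertex instance are assumptions for illustration.

```python
def robust_cc_cost(edges, deleted, cluster_of):
    """Count disagreements among surviving vertices.

    edges:      dict mapping a vertex pair (u, v) to '+' or '-'
    deleted:    set D of removed vertices
    cluster_of: dict mapping each surviving vertex to a cluster id
    """
    cost = 0
    for (u, v), label in edges.items():
        if u in deleted or v in deleted:
            continue  # edges touching a deleted vertex cost nothing
        same_cluster = cluster_of[u] == cluster_of[v]
        if label == "+" and not same_cluster:
            cost += 1  # similar pair split across clusters
        elif label == "-" and same_cluster:
            cost += 1  # dissimilar pair placed together
    return cost

# Hypothetical instance: a '+' triangle a, b, c plus a noisy vertex d.
edges = {
    ("a", "b"): "+", ("a", "c"): "+", ("b", "c"): "+",
    ("a", "d"): "+", ("b", "d"): "-", ("c", "d"): "-",
}

# m = 0: the best we can do here is pay for the (a, d) edge.
print(robust_cc_cost(edges, set(), {"a": 0, "b": 0, "c": 0, "d": 1}))
# m = 1: deleting d leaves a clustering with no disagreements.
print(robust_cc_cost(edges, {"d"}, {"a": 0, "b": 0, "c": 0}))
```

In this toy instance the single vertex d is responsible for every disagreement, which is the kind of noise the outlier budget m is meant to absorb.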
On the Cost of Essentially Fair Clusterings
Clustering is a fundamental tool in data mining. It partitions points into
groups (clusters) and may be used to make decisions for each point based on its
group. However, this process may harm protected (minority) classes if the
clustering algorithm does not adequately represent them in desirable clusters
-- especially if the data is already biased.
At NIPS 2017, Chierichetti et al. proposed a model for fair clustering
requiring the representation in each cluster to (approximately) preserve the
global fraction of each protected class. Restricting to two protected classes,
they developed both a 4-approximation for the fair k-center problem and an
approximation for the fair k-median problem whose factor depends on a
parameter of the fairness model. For multiple protected classes, the best
known result is a 14-approximation for fair k-center.
We extend and improve the known results. Firstly, we give a 5-approximation
for the fair k-center problem with multiple protected classes. Secondly, we
propose a relaxed fairness notion under which we can give bicriteria
constant-factor approximations for all of the classical clustering objectives
k-center, k-supplier, k-median, k-means and facility location. The
latter approximations are achieved by a framework that takes an arbitrary
existing unfair (integral) solution and a fair (fractional) LP solution and
combines them into an essentially fair clustering with a weakly supervised
rounding scheme. In this way, a fair clustering can be established belatedly,
in a situation where the centers are already fixed
Ordered k-Median with Outliers
We study a natural generalization of the celebrated ordered k-median problem, named robust ordered k-median, also known as ordered k-median with outliers. We are given facilities ℱ and clients 𝒞 in a metric space (ℱ ∪ 𝒞, d), parameters k, m ∈ ℤ_+ and a non-increasing non-negative vector w ∈ ℝ_+^m. We seek to open k facilities F ⊆ ℱ and serve m clients C ⊆ 𝒞, inducing a service cost vector c = {d(j,F) : j ∈ C}; the goal is to minimize the ordered objective w^⊤ c^↓, where d(j,F) = min_{i ∈ F} d(j,i) is the minimum distance between client j and facilities in F, and c^↓ ∈ ℝ_+^m is the non-increasingly sorted version of c. Robust ordered k-median captures many interesting clustering problems recently studied in the literature, e.g., robust k-median, ordered k-median, etc.
We obtain the first polynomial-time constant-factor approximation algorithm for robust ordered k-median, achieving an approximation guarantee of 127. The main difficulty comes from the presence of outliers, which already causes an unbounded integrality gap in the natural LP relaxation for robust k-median. This appears to invalidate previous methods in approximating the highly non-linear ordered objective. To overcome this issue, we introduce a novel yet very simple reduction framework that enables linear analysis of the non-linear objective. We also devise the first constant-factor approximations for ordered matroid median and ordered knapsack median using the same framework, and the approximation factors are 19.8 and 41.6, respectively
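The robust ordered objective defined above can be evaluated for a fixed solution as follows. This is a sketch of the objective only, not of the reduction framework or the 127-approximation; the function name `robust_ordered_cost`, the nested-dictionary distance table, and the three-client instance are assumptions for illustration.

```python
def robust_ordered_cost(distance, open_facilities, served_clients, w):
    """Evaluate the ordered objective: sort service costs in non-increasing
    order and take the inner product with the weight vector w.

    distance[j][i] is the distance from client j to facility i.
    Clients not in `served_clients` are treated as outliers.
    """
    costs = sorted(
        (min(distance[j][i] for i in open_facilities) for j in served_clients),
        reverse=True,
    )
    if len(costs) != len(w):
        raise ValueError("w must have one weight per served client")
    return sum(weight * cost for weight, cost in zip(w, costs))

# Hypothetical instance: two open facilities, m = 2 served clients,
# client "c" is left out as an outlier.
distance = {
    "a": {"f1": 1, "f2": 4},
    "b": {"f1": 3, "f2": 2},
    "c": {"f1": 9, "f2": 8},
}
opened = ["f1", "f2"]
served = ["a", "b"]

print(robust_ordered_cost(distance, opened, served, [2, 1]))  # general weights
print(robust_ordered_cost(distance, opened, served, [1, 1]))  # sum of costs
print(robust_ordered_cost(distance, opened, served, [1, 0]))  # largest cost only
```

With all-ones weights the objective is the sum of service costs over the served clients, and with weights (1, 0, ..., 0) it is the largest service cost, which shows how the weight vector interpolates between median-style and center-style objectives.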