Approximating k-Median via Pseudo-Approximation
We present a novel approximation algorithm for k-median that achieves an
approximation guarantee of 1+\sqrt{3}+\epsilon, improving upon the decade-old
ratio of 3+\epsilon.
Our approach is based on two components, each of which, we believe, is of
independent interest.
First, we show that in order to give an \alpha-approximation algorithm for
k-median, it is sufficient to give a \emph{pseudo-approximation algorithm}
that finds an \alpha-approximate solution by opening k+O(1) facilities.
This is a rather surprising result as there exist instances for which opening
k+1 facilities may lead to a significantly smaller cost than if only k
facilities were opened.
Second, we give such a pseudo-approximation algorithm with
\alpha = 1+\sqrt{3}+\epsilon. Prior to our work, it was not even known whether
opening k+o(k) facilities would help improve the approximation ratio.
Comment: 18 pages
Fault Tolerant Clustering Revisited
In discrete k-center and k-median clustering, we are given a set of points P
in a metric space M, and the task is to output a set C \subseteq P, |C| = k,
such that the cost of clustering P using C is as small as possible. For
k-center, the cost is the furthest a point has to travel to its nearest center,
whereas for k-median, the cost is the sum of all point to nearest center
distances. In the fault-tolerant versions of these problems, we are given an
additional parameter 1 \leq \ell \leq k, such that when computing the cost
of clustering, points are assigned to their \ell-th nearest-neighbor in C,
instead of their nearest neighbor. We provide constant factor approximation
algorithms for these problems that are both conceptually simple and highly
practical from an implementation standpoint.
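The cost definitions in the abstract above can be made concrete with a minimal sketch. It assumes points in the Euclidean plane as the metric space; the function names are hypothetical and are not taken from the paper. Setting `ell=1` gives the ordinary k-center and k-median costs, while larger `ell` gives the fault-tolerant variants.

```python
import math

def ell_th_nearest_distance(p, centers, ell):
    """Distance from p to its ell-th nearest center (ell is 1-indexed)."""
    return sorted(math.dist(p, c) for c in centers)[ell - 1]

def k_center_cost(points, centers, ell=1):
    """Largest distance any point travels to its ell-th nearest center."""
    return max(ell_th_nearest_distance(p, centers, ell) for p in points)

def k_median_cost(points, centers, ell=1):
    """Sum over all points of the distance to the ell-th nearest center."""
    return sum(ell_th_nearest_distance(p, centers, ell) for p in points)

# Illustrative data only: three points on a line, two of them chosen as centers.
points = [(0, 0), (1, 0), (4, 0)]
centers = [(0, 0), (4, 0)]
```

With these sample points, the fault-tolerant costs (`ell=2`) are much larger than the ordinary ones, because every point is charged for its second-nearest center.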
Certified Algorithms: Worst-Case Analysis and Beyond
In this paper, we introduce the notion of a certified algorithm. Certified algorithms provide worst-case and beyond-worst-case performance guarantees. First, a γ-certified algorithm is also a γ-approximation algorithm: it finds a γ-approximation no matter what the input is. Second, it exactly solves γ-perturbation-resilient instances (γ-perturbation-resilient instances model real-life instances). Additionally, certified algorithms have a number of other desirable properties: they solve both maximization and minimization versions of a problem (e.g. Max Cut and Min Uncut), solve weakly perturbation-resilient instances, and solve optimization problems with hard constraints.
In the paper, we define certified algorithms, describe their properties, present a framework for designing certified algorithms, and provide examples of certified algorithms for Max Cut/Min Uncut, Minimum Multiway Cut, k-medians and k-means. We also present some negative results.
Fair Clustering Through Fairlets
We study the question of fair clustering under the {\em disparate impact}
doctrine, where each protected class must have approximately equal
representation in every cluster. We formulate the fair clustering problem under
both the k-center and the k-median objectives, and show that even with two
protected classes the problem is challenging, as the optimum solution can
violate common conventions---for instance a point may no longer be assigned to
its nearest cluster center! En route we introduce the concept of fairlets,
which are minimal sets that satisfy fair representation while approximately
preserving the clustering objective. We show that any fair clustering problem
can be decomposed into first finding good fairlets, and then using existing
machinery for traditional clustering algorithms. While finding good fairlets
can be NP-hard, we proceed to obtain efficient approximation algorithms based
on minimum cost flow. We empirically quantify the value of fair clustering on
real-world datasets with sensitive attributes.
Constant-Factor FPT Approximation for Capacitated k-Median
Capacitated k-median is one of the few outstanding optimization problems for which the existence of a polynomial time constant factor approximation algorithm remains an open problem. In a series of recent papers, algorithms producing solutions violating either the number of facilities or the capacity by a multiplicative factor were obtained. However, producing solutions without violations appears to be hard and potentially requires different algorithmic techniques. Notably, if parameterized by the number of facilities k, the problem is also W[2]-hard, making the existence of an exact FPT algorithm unlikely. In this work we provide an FPT-time constant factor approximation algorithm preserving both cardinality and capacity of the facilities. The algorithm runs in time 2^{O(k \log k)} n^{O(1)} and achieves an approximation ratio of 7+\epsilon.
The Hardness of Approximation of Euclidean k-means
The Euclidean k-means problem is a classical problem that has been
extensively studied in the theoretical computer science, machine learning and
the computational geometry communities. In this problem, we are given a set of
n points in Euclidean space R^d, and the goal is to choose k centers in R^d
so that the sum of squared distances of each point to its nearest center
is minimized. The best approximation algorithms for this problem include a
polynomial time constant factor approximation for general k and a
(1+\epsilon)-approximation which runs in time poly(n) 2^{O(k/\epsilon)}. At
the other extreme, the only known computational complexity result for this
problem is NP-hardness [ADHP'09]. The main difficulty in obtaining hardness
results stems from the Euclidean nature of the problem, and the fact that any
point in R^d can be a potential center. This gap in understanding left open
the intriguing possibility that the problem might admit a PTAS for all k, d.
In this paper we provide the first hardness of approximation for the
Euclidean k-means problem. Concretely, we show that there exists a constant
\epsilon > 0 such that it is NP-hard to approximate the k-means objective
to within a factor of (1+\epsilon). We show this via an efficient reduction
from the vertex cover problem on triangle-free graphs: given a triangle-free
graph, the goal is to choose the fewest number of vertices which are incident
on all the edges. Additionally, we give a proof that the current best hardness
results for vertex cover can be carried over to triangle-free graphs. To show
this we transform G, a known hard vertex cover instance, by taking a graph
product with a suitably chosen graph H, and showing that the size of the
(normalized) maximum independent set is almost exactly preserved in the product
graph using a spectral analysis, which might be of independent interest.
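As an illustration of the objective defined in the abstract above, the following minimal sketch evaluates the k-means cost (sum of squared distances to the nearest center) for a candidate set of centers. The function name and the sample data are hypothetical. Note that the centers need not be input points, which is the feature the abstract identifies as the main obstacle to hardness proofs.

```python
def kmeans_cost(points, centers):
    """Sum over points of the squared Euclidean distance to the nearest center."""
    def sq_dist(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

# Illustrative data only: two well-separated pairs of points on a line.
points = [(0, 0), (2, 0), (10, 0), (12, 0)]

# Centers placed at the midpoints of the pairs; neither is an input point.
free_centers = [(1, 0), (11, 0)]

# Centers restricted to input points, for comparison.
discrete_centers = [(0, 0), (10, 0)]
```

On this data the unrestricted centers give cost 4, while the centers restricted to input points give cost 8.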
Constant Factor Approximation for Capacitated k-Center with Outliers
The k-center problem is a classic facility location problem, where given an
edge-weighted graph G=(V,E) one is to find a subset of k vertices S,
such that each vertex in G is "close" to some vertex in S. The
approximation status of this basic problem is well understood, as a simple
2-approximation algorithm is known to be tight. Consequently different
extensions were studied.
In the capacitated version of the problem each vertex is assigned a capacity,
which is a strict upper bound on the number of clients a facility can serve,
when located at this vertex. A constant factor approximation for the
capacitated k-center was obtained last year by Cygan, Hajiaghayi and Khuller
[FOCS'12], which was recently improved to a 9-approximation by An, Bhaskara and
Svensson [arXiv'13].
In a different generalization of the problem some clients (denoted as
outliers) may be disregarded. Here we are additionally given an integer p and
the goal is to serve exactly p clients, which the algorithm is free to
choose. In 2001 Charikar et al. [SODA'01] presented a 3-approximation for the
k-center problem with outliers.
In this paper we consider a common generalization of the two extensions
previously studied separately, i.e. we work with the capacitated k-center
with outliers. We present the first constant factor approximation algorithm
with approximation ratio of 25 even for the case of non-uniform hard
capacities.
Comment: 15 pages, 3 figures, accepted to STACS 201
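The abstract above mentions a simple 2-approximation for the basic (uncapacitated, no outliers) k-center problem without naming it. One well-known algorithm with that guarantee is the farthest-first traversal. The sketch below is an illustration under assumptions, not the method of the paper: it uses Euclidean points rather than a general edge-weighted graph, assumes k is at most the number of points, and the function name is hypothetical.

```python
import math

def farthest_first(points, k):
    """Greedy k-center: repeatedly add the point farthest from the chosen centers.

    Returns the chosen centers and the resulting covering radius.
    """
    centers = [points[0]]  # arbitrary first center
    # d[j] = distance from points[j] to its nearest chosen center so far
    d = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        i = max(range(len(points)), key=d.__getitem__)
        centers.append(points[i])
        d = [min(d[j], math.dist(points[j], points[i])) for j in range(len(points))]
    return centers, max(d)

# Illustrative data only: two tight pairs of points far apart.
points = [(0, 0), (1, 0), (10, 0), (11, 0)]
centers, radius = farthest_first(points, 2)
```

Handling capacities or outliers, as in the paper, requires substantially more machinery than this greedy rule.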
Tight Analysis of a Multiple-Swap Heuristic for Budgeted Red-Blue Median
Budgeted Red-Blue Median is a generalization of classic k-Median in that
there are two sets of facilities, say \mathcal{R} and \mathcal{B}, that can
be used to serve clients located in some metric space. The goal is to open
k_r facilities in \mathcal{R} and k_b facilities in \mathcal{B} for
some given bounds k_r, k_b and connect each client to their nearest open
facility in a way that minimizes the total connection cost.
We extend work by Hajiaghayi, Khandekar, and Kortsarz [2012] and show that a
multiple-swap local search heuristic can be used to obtain a
(5+\epsilon)-approximation for Budgeted Red-Blue Median for any constant
\epsilon > 0. This is an improvement over their single swap analysis and
beats the previous best approximation guarantee of 8 by Swamy [2014].
We also present a matching lower bound showing that for every p \geq 1,
there are instances of Budgeted Red-Blue Median with local optimum solutions
for the p-swap heuristic whose cost is 5 + \Omega(1/p)
times the optimum solution cost. Thus, our analysis is tight up to the lower
order terms. In particular, for any \epsilon > 0 we show the single-swap
heuristic admits local optima whose cost can be as bad as 7-\epsilon times
the optimum solution cost.