On Variants of k-means Clustering
\textit{Clustering problems} often arise in fields such as data mining and
machine learning, where a collection of objects must be grouped into similar
groups with respect to a similarity (or dissimilarity) measure. Among
clustering problems, \textit{$k$-means} clustering in particular has received
much attention from researchers. Despite the fact that $k$-means is a very
well studied problem, its status in the plane is still open. In particular, it
is unknown whether it admits a PTAS in the plane. The best known approximation
bound achievable in polynomial time is $9+\eps$.
In this paper, we consider the following variant of $k$-means. Given a set
$P$ of $n$ points in $\mathbb{R}^d$ and a real $f > 0$, find a finite set $C$
of points in $\mathbb{R}^d$ that minimizes the quantity
$f\cdot|C| + \sum_{p\in P} d(p, C)^2$, where $d(p, C)$ denotes the distance
from $p$ to its nearest point in $C$. For any fixed dimension $d$, we design a
local search PTAS for this problem. We also give a "bi-criterion" local search
algorithm for $k$-means which uses $(1+\eps)k$ centers and yields a solution
whose cost is at most $(1+\eps)$ times the cost of an optimal $k$-means
solution. The algorithm runs in polynomial time for any fixed dimension.
The contribution of this paper is twofold. On the one hand, we are able to
handle the squares of distances in an elegant manner, which yields a
near-optimal approximation bound. This leads us towards a better understanding
of the $k$-means problem. On the other hand, our analysis of local search might
also be useful for other geometric problems. This is important considering that
very little is known about the local search method for geometric approximation.
Comment: 15 pages
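The basic improvement step behind such local search algorithms can be sketched as follows. This is a minimal illustration in the plane, not the paper's exact algorithm: the function names are ours, and restricting candidate centers to input points is a simplifying assumption (the paper's PTAS also swaps constant-size subsets of centers rather than single centers).

```python
import random

def kmeans_cost(points, centers):
    # Sum of squared distances from each point to its nearest center.
    return sum(min((px - cx) ** 2 + (py - cy) ** 2 for (cx, cy) in centers)
               for (px, py) in points)

def local_search_kmeans(points, k, seed=0):
    """Single-swap local search: start from k arbitrary input points as
    centers and repeatedly swap one center for a non-center point
    whenever the swap strictly lowers the k-means cost."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    improved = True
    while improved:
        improved = False
        cost = kmeans_cost(points, centers)
        for i in range(k):
            for p in points:
                if p in centers:
                    continue
                candidate = centers[:i] + [p] + centers[i + 1:]
                c = kmeans_cost(points, candidate)
                if c < cost:
                    centers, cost, improved = candidate, c, True
    return centers
```

On well-separated inputs this converges to one center per cluster; the analysis of how far such a local optimum can be from the global optimum is exactly what the multiswap generalizations sharpen.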
Center-based Clustering under Perturbation Stability
Clustering under most popular objective functions is NP-hard, even to
approximate well, and so unlikely to be efficiently solvable in the worst case.
Recently, Bilu and Linial \cite{Bilu09} suggested an approach aimed at
bypassing this computational barrier by using properties of instances one might
hope to hold in practice. In particular, they argue that instances in practice
should be stable to small perturbations in the metric space and give an
efficient algorithm for clustering instances of the Max-Cut problem that are
stable to perturbations of size $O(\sqrt{n})$. In addition, they conjecture
that instances stable to as little as $O(1)$ perturbations should be solvable
in polynomial time. In this paper we prove that this conjecture is true for
any center-based clustering objective (such as $k$-median, $k$-means, and
$k$-center). Specifically, we show we can efficiently find the optimal
clustering assuming only stability to factor-3 perturbations of the underlying
metric in spaces without Steiner points, and stability to factor $2+\sqrt{3}$
perturbations for general metrics. In particular, we show for such instances
that the popular Single-Linkage algorithm combined with dynamic programming
will find the optimal clustering. We also present NP-hardness results under a
weaker but related condition.
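The Single-Linkage step mentioned above can be sketched as a Kruskal-style merge over pairwise distances. This is a simplified illustration: the $O(n^2 \log n)$ edge enumeration and the function names are our own choices, and the dynamic-programming step that the paper combines with the linkage tree is omitted.

```python
def single_linkage(points, k, dist):
    """Kruskal-style single-linkage: repeatedly merge the two closest
    clusters (connected components) until exactly k remain."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by distance.
    edges = sorted((dist(points[i], points[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    components = n
    for d, i, j in edges:
        if components == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1
    # Group point indices by component root.
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

On a perturbation-stable instance, the intuition is that the optimal clusters appear as subtrees of the single-linkage hierarchy, which is what makes the subsequent dynamic program over the tree work.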
Coresets-Methods and History: A Theoreticians Design Pattern for Approximation and Streaming Algorithms
We present a technical survey on state-of-the-art approaches to data reduction and the coreset framework. These include geometric decompositions, gradient methods, random sampling, sketching, and random projections. We further outline their importance for the design of streaming algorithms and give a brief overview of lower-bounding techniques.
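The random-sampling approach to data reduction can be illustrated with a sketch in the spirit of lightweight importance-sampling coresets for $k$-means: sample points with probability proportional to a mix of the uniform distribution and the squared distance to the data mean, then reweight so that weighted sums are unbiased estimators of sums over the full point set. The mixing weights and function names below are illustrative assumptions, not a specific construction from the survey.

```python
import random

def lightweight_coreset(points, m, seed=0):
    """Sample m weighted points whose weighted k-means cost estimates
    the cost on the full point set."""
    rng = random.Random(seed)
    n = len(points)
    d = len(points[0])
    mean = [sum(p[i] for p in points) / n for i in range(d)]
    # Squared distance of each point to the data mean.
    sq = [sum((p[i] - mean[i]) ** 2 for i in range(d)) for p in points]
    total = sum(sq) or 1.0
    # Sampling distribution: half uniform, half proportional to sq.
    q = [0.5 / n + 0.5 * s / total for s in sq]
    idx = rng.choices(range(n), weights=q, k=m)
    # Weight 1/(m*q) makes each weighted sum an unbiased estimator
    # of the corresponding sum over all n points.
    return [(points[i], 1.0 / (m * q[i])) for i in idx]
```

The uniform component keeps every sampling probability bounded below, which bounds the weights and hence the estimator's variance; that trade-off is the core idea behind sensitivity-based sampling schemes.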
Exact Algorithms and Lower Bounds for Stable Instances of Euclidean k-Means
We investigate the complexity of solving stable or perturbation-resilient
instances of k-Means and k-Median clustering in fixed dimension Euclidean
metrics (or more generally doubling metrics). The notion of stable or
perturbation resilient instances was introduced by Bilu and Linial [2010] and
Awasthi et al. [2012]. In our context we say a k-Means instance is
\alpha-stable if there is a unique OPT solution which remains unchanged if
distances are (non-uniformly) stretched by a factor of at most \alpha. Stable
clustering instances have been studied to explain why heuristics such as
Lloyd's algorithm perform well in practice. In this work we show that for any
fixed \epsilon>0, (1+\epsilon)-stable instances of k-Means in doubling metrics
can be solved in polynomial time. More precisely we show a natural multiswap
local search algorithm in fact finds the OPT solution for (1+\epsilon)-stable
instances of k-Means and k-Median in a polynomial number of iterations. We
complement this result by showing that under a plausible PCP hypothesis this is
essentially tight: that when the dimension d is part of the input, there is a
fixed \epsilon_0>0 such that there is not even a PTAS for (1+\epsilon_0)-stable
k-Means in R^d unless NP=RP. To do this, we consider a robust property of CSPs;
call an instance stable if there is a unique optimum solution x^* and for any
other solution x', the number of unsatisfied clauses is proportional to the
Hamming distance between x^* and x'. Dinur et al. have already shown stable
QSAT is hard to approximate for some constant Q, our hypothesis is simply that
stable QSAT with bounded variable occurrence is also hard. Given this
hypothesis, we consider "stability-preserving" reductions to prove our hardness
for stable k-Means. Such reductions seem to be more fragile than standard
L-reductions and may be of further use to demonstrate that other stable
optimization problems are hard.
Comment: 29 pages