On Variants of k-means Clustering

Authors: Bandyapadhyay, Sayan; Varadarajan, Kasturi
Publication date: 09/12/2015

Clustering problems often arise in fields such as data mining and machine learning, where the goal is to group a collection of objects into similar groups with respect to a similarity (or dissimilarity) measure. Among clustering problems, $k$-means clustering in particular has received much attention from researchers. Although $k$-means is very well studied, its status in the plane remains open: in particular, it is unknown whether it admits a PTAS in the plane. The best known approximation bound in polynomial time is $9+\epsilon$. In this paper, we consider the following variant of $k$-means. Given a set $C$ of points in $\mathcal{R}^d$ and a real $f > 0$, find a finite set $F$ of points in $\mathcal{R}^d$ that minimizes the quantity $f \cdot |F| + \sum_{p \in C} \min_{q \in F} \|p-q\|^2$. For any fixed dimension $d$, we design a local search PTAS for this problem. We also give a "bi-criterion" local search algorithm for $k$-means which uses $(1+\epsilon)k$ centers and yields a solution whose cost is at most $(1+\epsilon)$ times the cost of an optimal $k$-means solution. The algorithm runs in polynomial time for any fixed dimension. The contribution of this paper is twofold. On the one hand, we are able to handle the square of distances in an elegant manner, which yields a near-optimal approximation bound. This leads us towards a better understanding of the $k$-means problem. On the other hand, our analysis of local search might also be useful for other geometric problems. This is important considering that very little is known about the local search method for geometric approximation. Comment: 15 pages
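The objective in this abstract, $f \cdot |F| + \sum_{p \in C} \min_{q \in F} \|p-q\|^2$, can be made concrete with a small Python sketch. This is not the paper's algorithm: it assumes candidate centers are restricted to the input points, and it tries only single add, remove, and swap moves, whereas a local search PTAS would typically exchange a number of centers that depends on $\epsilon$. The names `penalized_cost` and `local_search` are hypothetical.

```python
import numpy as np

def penalized_cost(points, centers, f):
    """Return f*|F| + sum over p in C of min over q in F of ||p - q||^2."""
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Pairwise squared distances, shape (n_points, n_centers).
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return f * len(centers) + d2.min(axis=1).sum()

def local_search(points, f):
    """Single-move local search; ASSUMPTION: centers are input points."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    current = {0}  # start from an arbitrary single center
    best = penalized_cost(points, points[sorted(current)], f)
    improved = True
    while improved:
        improved = False
        outside = [i for i in range(n) if i not in current]
        # Neighbourhood: add one center, remove one, or swap one for another.
        neighbours = [current | {i} for i in outside]
        if len(current) > 1:
            neighbours += [current - {j} for j in current]
        neighbours += [(current - {j}) | {i} for j in current for i in outside]
        for cand in neighbours:
            cost = penalized_cost(points, points[sorted(cand)], f)
            if cost < best - 1e-12:  # accept only strict improvements
                current, best, improved = cand, cost, True
                break
    return points[sorted(current)], best
```

A larger `f` makes each extra center more expensive, so the search settles on fewer centers; a small `f` pushes it towards one center per point.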

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Center-based Clustering under Perturbation Stability

Authors: Awasthi, Pranjal; Blum, Avrim; Sheffet, Or
Publication date: 11/08/2011

Clustering under most popular objective functions is NP-hard, even to approximate well, and so unlikely to be efficiently solvable in the worst case. Recently, Bilu and Linial \cite{Bilu09} suggested an approach aimed at bypassing this computational barrier by using properties of instances one might hope to hold in practice. In particular, they argue that instances in practice should be stable to small perturbations in the metric space and give an efficient algorithm for clustering instances of the Max-Cut problem that are stable to perturbations of size $O(n^{1/2})$. In addition, they conjecture that instances stable to as little as $O(1)$ perturbations should be solvable in polynomial time. In this paper we prove that this conjecture is true for any center-based clustering objective (such as $k$-median, $k$-means, and $k$-center). Specifically, we show we can efficiently find the optimal clustering assuming only stability to factor-3 perturbations of the underlying metric in spaces without Steiner points, and stability to factor $2+\sqrt{3}$ perturbations for general metrics. In particular, we show for such instances that the popular Single-Linkage algorithm combined with dynamic programming will find the optimal clustering. We also present NP-hardness results under a weaker but related condition.

arXiv.org e-Print Archive

On Sampling Based Algorithms for k-Means

Authors: Bhattacharya, Anup; Goyal, Dishant; Jaiswal, Ragesh; Kumar, Amit
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 40th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2020)
Publication date: 01/01/2020

Dagstuhl Research Online Publication Server

Coresets-Methods and History: A Theoreticians Design Pattern for Approximation and Streaming Algorithms

Authors: Munteanu, Alexander; Schwiegelshohn, Chris
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2018

We present a technical survey of state-of-the-art approaches in data reduction and the coreset framework. These include geometric decompositions, gradient methods, random sampling, sketching and random projections. We further outline their importance for the design of streaming algorithms and give a brief overview of lower-bounding techniques.

Archivio della ricerca- Università di Roma La Sapienza

Exact Algorithms and Lower Bounds for Stable Instances of Euclidean k-Means

Authors: Friggstad, Zachary; Khodamoradi, Kamyar; Salavatipour, Mohammad R.
Publication date: 14/07/2018

We investigate the complexity of solving stable or perturbation-resilient instances of $k$-Means and $k$-Median clustering in fixed-dimension Euclidean metrics (or, more generally, doubling metrics). The notion of stable or perturbation-resilient instances was introduced by Bilu and Linial [2010] and Awasthi et al. [2012]. In our context we say a $k$-Means instance is $\alpha$-stable if there is a unique OPT solution which remains unchanged if distances are (non-uniformly) stretched by a factor of at most $\alpha$. Stable clustering instances have been studied to explain why heuristics such as Lloyd's algorithm perform well in practice. In this work we show that for any fixed $\epsilon>0$, $(1+\epsilon)$-stable instances of $k$-Means in doubling metrics can be solved in polynomial time. More precisely, we show that a natural multiswap local search algorithm in fact finds the OPT solution for $(1+\epsilon)$-stable instances of $k$-Means and $k$-Median in a polynomial number of iterations. We complement this result by showing that under a plausible PCP hypothesis this is essentially tight: when the dimension $d$ is part of the input, there is a fixed $\epsilon_0>0$ such that there is not even a PTAS for $(1+\epsilon_0)$-stable $k$-Means in $R^d$ unless NP=RP. To do this, we consider a robust property of CSPs; call an instance stable if there is a unique optimum solution $x^*$ and, for any other solution $x'$, the number of unsatisfied clauses is proportional to the Hamming distance between $x^*$ and $x'$. Dinur et al. have already shown that stable QSAT is hard to approximate for some constant Q; our hypothesis is simply that stable QSAT with bounded variable occurrence is also hard. Given this hypothesis, we consider "stability-preserving" reductions to prove our hardness for stable $k$-Means. Such reductions seem to be more fragile than standard L-reductions and may be of further use in demonstrating that other stable optimization problems are hard. Comment: 29 pages
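The $\alpha$-stability definition in this abstract can be illustrated on a tiny instance. The Python sketch below is not from the paper and makes several assumptions: centers are chosen among the input points, the optimum is found by brute force, and stretch factors are drawn uniformly from $[1, \alpha]$ for each pair of points. Because it only samples perturbations, it can refute stability but never certify it. The function names are hypothetical.

```python
import itertools
import numpy as np

def optimal_partition(dist, k):
    """Brute-force discrete k-means on a distance matrix.

    ASSUMPTION: centers are input points; cost is the sum of squared
    distances from each point to its nearest center.
    """
    n = dist.shape[0]
    best_cost, best_labels = np.inf, None
    for centers in itertools.combinations(range(n), k):
        sq = dist[:, list(centers)] ** 2
        cost = sq.min(axis=1).sum()
        if cost < best_cost:
            best_cost, best_labels = cost, sq.argmin(axis=1)
    # Represent the clustering as a set of clusters, independent of labels.
    clusters = {}
    for point, label in enumerate(best_labels):
        clusters.setdefault(int(label), set()).add(point)
    return frozenset(frozenset(c) for c in clusters.values())

def survives_sampled_perturbations(dist, k, alpha, trials=200, seed=0):
    """Return False if some sampled stretch changes the optimal partition."""
    rng = np.random.default_rng(seed)
    dist = np.asarray(dist, dtype=float)
    reference = optimal_partition(dist, k)
    n = dist.shape[0]
    for _ in range(trials):
        # Symmetric stretch factors in [1, alpha]; the diagonal stays zero.
        stretch = np.triu(rng.uniform(1.0, alpha, size=(n, n)), 1)
        stretch = stretch + stretch.T
        if optimal_partition(dist * stretch, k) != reference:
            return False
    return True
```

Note that the stretched distances need not satisfy the triangle inequality, and ties in the optimum are broken arbitrarily, so the sketch is only meaningful on instances with a clear unique optimum.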

arXiv.org e-Print Archive