The Hardness of Approximation of Euclidean k-means
The Euclidean $k$-means problem is a classical problem that has been
extensively studied in the theoretical computer science, machine learning and
computational geometry communities. In this problem, we are given a set of
points in Euclidean space $\mathbb{R}^d$, and the goal is to choose $k$ centers in $\mathbb{R}^d$
so that the sum of squared distances of each point to its nearest center
is minimized. The best approximation algorithms for this problem include a
polynomial-time constant-factor approximation for general $k$ and a
$(1+\eps)$-approximation which runs in time exponential in $k$ and $1/\eps$. At
the other extreme, the only known computational complexity result for this
problem is NP-hardness [ADHP'09]. The main difficulty in obtaining hardness
results stems from the Euclidean nature of the problem, and the fact that any
point in $\mathbb{R}^d$ can be a potential center. This gap in understanding left open
the intriguing possibility that the problem might admit a PTAS for all $k$ and $d$.
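The objective defined above is straightforward to evaluate; a minimal sketch (the point set and centers are illustrative, not from the paper):

```python
# Evaluate the Euclidean k-means objective: each point is charged the
# squared distance to its nearest center. Centers may be arbitrary points
# of R^d; here we simply score a given candidate set on a toy instance.

def kmeans_cost(points, centers):
    """Sum over points of the squared distance to the nearest center."""
    total = 0.0
    for p in points:
        total += min(sum((pi - ci) ** 2 for pi, ci in zip(p, c))
                     for c in centers)
    return total

points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
centers = [(0.5, 0.0), (10.5, 0.0)]
print(kmeans_cost(points, centers))  # 1.0 (four points at squared distance 0.25 each)
```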
In this paper we provide the first hardness of approximation result for the
Euclidean $k$-means problem. Concretely, we show that there exists a constant
$\eps > 0$ such that it is NP-hard to approximate the $k$-means objective
to within a factor of $(1+\eps)$. We show this via an efficient reduction
from the vertex cover problem on triangle-free graphs: given a triangle-free
graph, the goal is to choose the fewest vertices which are incident
on all the edges. Additionally, we give a proof that the current best hardness
results for vertex cover can be carried over to triangle-free graphs. To show
this we transform a known hard vertex cover instance by taking a graph
product with a suitably chosen graph, and show, using a spectral analysis, that
the size of the (normalized) maximum independent set is almost exactly
preserved in the product graph; this analysis might be of independent interest.
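The source problem of the reduction, vertex cover on triangle-free graphs, can be stated concretely. A brute-force sketch on a toy instance (the graph and all names are illustrative; this exponential-time search is only for tiny graphs, not the paper's reduction):

```python
# Minimum vertex cover: the fewest vertices incident on all edges.
# Brute force over all vertex subsets in increasing size order.

from itertools import combinations

def min_vertex_cover_size(n, edges):
    """Size of a smallest vertex cover of a graph on vertices 0..n-1."""
    for size in range(n + 1):
        for cover in combinations(range(n), size):
            s = set(cover)
            if all(u in s or v in s for u, v in edges):
                return size
    return n

# C5, a triangle-free 5-cycle; its minimum vertex cover has size 3.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(min_vertex_cover_size(5, edges))  # 3
```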
On Variants of k-means Clustering
\textit{Clustering problems} often arise in fields like data mining and
machine learning, where the task is to group a collection of objects into similar groups with
respect to a similarity (or dissimilarity) measure. Among clustering
problems, \textit{$k$-means} clustering in particular has received much attention
from researchers. Despite the fact that $k$-means is a very well studied
problem, its status in the plane is still open. In particular, it is
unknown whether it admits a PTAS in the plane. The best known approximation
bound achievable in polynomial time is $9+\eps$.
In this paper, we consider the following variant of $k$-means. Given a set
$C$ of points in $\mathbb{R}^d$ and a real $f > 0$, find a finite set $F$ of
points in $\mathbb{R}^d$ that minimizes the quantity $f \cdot |F| + \sum_{p \in C} \min_{q \in F} \|p - q\|^2$. For any fixed dimension $d$, we design a local
search PTAS for this problem. We also give a "bi-criterion" local search
algorithm for $k$-means which uses $(1+\eps)k$ centers and yields a solution
whose cost is at most $(1+\eps)$ times the cost of an optimal $k$-means
solution. The algorithm runs in polynomial time for any fixed dimension.
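The local search idea can be sketched in its simplest single-swap form, with candidate centers restricted to the input points for brevity (the paper's PTAS allows arbitrary centers and more general moves; all names here are illustrative):

```python
# Single-swap local search for k-means: starting from any k centers,
# repeatedly replace one center by a non-center point whenever the swap
# lowers the objective, until no single swap improves it.

def cost(points, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
               for p in points)

def local_search(points, k):
    """Return a set of k centers that is locally optimal under single swaps."""
    centers = list(points[:k])
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for q in points:
                if q in centers:
                    continue
                trial = centers[:i] + [q] + centers[i + 1:]
                if cost(points, trial) < cost(points, centers):
                    centers, improved = trial, True
    return centers

points = [(0.0,), (1.0,), (10.0,), (11.0,)]
centers = local_search(points, 2)
print(sorted(centers), cost(points, centers))  # [(1.0,), (10.0,)] 2.0
```

On this toy instance the search escapes the poor initial choice of the two leftmost points and ends with one center per cluster.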
The contribution of this paper is twofold. On the one hand, we are
able to handle the squares of distances in an elegant manner, which yields a near
optimal approximation bound. This leads us towards a better understanding of
the $k$-means problem. On the other hand, our analysis of local search might
also be useful for other geometric problems. This is important considering that
very little is known about the local search method for geometric approximation.
A bi-criteria approximation algorithm for $k$-Means
We consider the classical $k$-means clustering problem in the setting of
bi-criteria approximation, in which an algorithm is allowed to output $\beta k > k$
clusters, and must produce a clustering with cost at most $\alpha$ times the
cost of the optimal set of $k$ clusters. We argue that this approach is
natural in many settings in which the exact number of clusters is a priori
unknown, or unimportant up to a constant factor. We give new bi-criteria
approximation algorithms, based on linear programming and local search,
respectively, which attain a guarantee $\alpha(\beta)$ depending on the number
$\beta k$ of clusters that may be opened. Our guarantee $\alpha(\beta)$ is
always at most $9+\eps$ and improves rapidly with $\beta$. Moreover, our algorithms have only
polynomial dependence on the dimension of the input data, and so are applicable
in high-dimensional settings.
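The bi-criteria notion can be made concrete by comparing the best achievable cost with $k$ clusters against the best with $\beta k$ clusters. A brute-force illustration on a toy instance, with centers restricted to input points (everything here is illustrative; the paper's algorithms use LP rounding and local search, not enumeration):

```python
# Compare the optimal k-means cost with k clusters against the optimal
# cost when beta*k clusters may be opened, by exhaustive search over
# center subsets drawn from the input points (tiny instances only).

from itertools import combinations

def cost(points, centers):
    return sum(min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
               for p in points)

def best_cost(points, num_centers):
    """Cheapest clustering with the given number of centers among the points."""
    return min(cost(points, c) for c in combinations(points, num_centers))

points = [(0.0,), (1.0,), (2.0,), (9.0,), (10.0,), (11.0,)]
k = 2
opt_k = best_cost(points, k)        # best cost with k clusters
relaxed = best_cost(points, 2 * k)  # best cost when beta*k = 2k clusters are allowed
print(opt_k, relaxed)  # 4.0 2.0
```

Opening more clusters strictly lowers the achievable cost here, which is exactly the slack a bi-criteria guarantee trades on.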