The Euclidean k-means problem is a classical problem that has been
extensively studied in the theoretical computer science, machine learning and
the computational geometry communities. In this problem, we are given a set of
n points in Euclidean space Rd, and the goal is to choose k centers in
Rd so that the sum of squared distances of each point to its nearest center
is minimized. The best approximation algorithms for this problem include a
polynomial time constant factor approximation for general k and a
(1+ϵ)-approximation which runs in time poly(n)2O(k/ϵ). At
the other extreme, the only known computational complexity result for this
problem is NP-hardness [ADHP'09]. The main difficulty in obtaining hardness
results stems from the Euclidean nature of the problem, and the fact that any
point in Rd can be a potential center. This gap in understanding left open
the intriguing possibility that the problem might admit a PTAS for all k,d.
In this paper we provide the first hardness of approximation for the
Euclidean k-means problem. Concretely, we show that there exists a constant
ϵ>0 such that it is NP-hard to approximate the k-means objective
to within a factor of (1+ϵ). We show this via an efficient reduction
from the vertex cover problem on triangle-free graphs: given a triangle-free
graph, the goal is to choose the fewest number of vertices which are incident
on all the edges. Additionally, we give a proof that the current best hardness
results for vertex cover can be carried over to triangle-free graphs. To show
this we transform G, a known hard vertex cover instance, by taking a graph
product with a suitably chosen graph H, and showing that the size of the
(normalized) maximum independent set is almost exactly preserved in the product
graph using a spectral analysis, which might be of independent interest