We give a quantum approximation scheme (i.e., (1+ε)-approximation for every ε>0) for the classical
k-means clustering problem in the QRAM model with a running time that has
only polylogarithmic dependence on the number of data points. More
specifically, given a dataset V with N points in ℝ^d stored in a
QRAM data structure, our quantum algorithm runs in time Õ(2^Õ(k/ε) η² d) and with high probability
outputs a set C of k centers such that cost(V, C) ≤ (1+ε)·cost(V, C_OPT). Here C_OPT denotes an optimal set of k centers,
cost(·) denotes the standard k-means cost function (i.e., the sum of the
squared distances of the points to their closest centers), and η is the aspect
ratio (i.e., the ratio of maximum distance to minimum distance). This is the
first quantum algorithm with a polylogarithmic running time that gives a
provable approximation guarantee of (1+ε) for the k-means
problem. Also, unlike previous works on unsupervised learning, our quantum
algorithm does not require quantum linear algebra subroutines and has a running
time independent of parameters (e.g., condition number) that appear in such
procedures.
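For reference, the two quantities used above can be written out explicitly. The display below is a restatement of the standard definitions given in the prose, under the assumed convention that the aspect ratio is taken over pairwise distances between distinct points of V.

\[
  \mathrm{cost}(V, C) \;=\; \sum_{v \in V} \min_{c \in C} \lVert v - c \rVert^2,
  \qquad
  \eta \;=\; \frac{\max_{u, v \in V} \lVert u - v \rVert}{\min_{\substack{u, v \in V,\; u \neq v}} \lVert u - v \rVert}.
\]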