Many quantum algorithms for machine learning require access to classical data
in superposition. However, for many natural data sets and algorithms, the
overhead required to load the data set in superposition can erase any potential
quantum speedup over classical algorithms. Recent work by Harrow introduces a
new paradigm in hybrid quantum-classical computing to address this issue,
relying on coresets to minimize the data loading overhead of quantum
algorithms. We investigate using this paradigm to perform k-means clustering
on near-term quantum computers, by casting it as a QAOA optimization instance
over a small coreset. We compare the performance of this approach to classical
k-means clustering both numerically and experimentally on IBM Q hardware. We
find data sets where coresets work well relative to random sampling
and where QAOA could potentially outperform standard k-means on a coreset.
However, finding data sets where both coresets and QAOA work well, which is
necessary for a quantum advantage over k-means on the entire data set,
appears to be challenging.
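
To make the optimization problem concrete, the following is a minimal sketch of 2-means clustering over a small weighted coreset, solved by exhaustive search over binary assignments. It is an illustration under stated assumptions, not the formulation used in this work: the function names `weighted_two_means_cost` and `brute_force_two_means` are hypothetical, the restriction to k = 2 is an assumption made for brevity, and the brute-force search stands in for the QAOA optimizer, which would search the same space of bitstrings.

```python
from itertools import product


def weighted_two_means_cost(points, weights, assignment):
    """Weighted 2-means cost of a 0/1 assignment over coreset points.

    Hypothetical helper: each point contributes its weight times the squared
    distance to the weighted centroid of its assigned cluster.
    """
    dim = len(points[0])
    cost = 0.0
    for label in (0, 1):
        idx = [i for i, a in enumerate(assignment) if a == label]
        if not idx:
            continue  # an empty cluster contributes nothing
        total_w = sum(weights[i] for i in idx)
        centroid = [
            sum(weights[i] * points[i][d] for i in idx) / total_w
            for d in range(dim)
        ]
        cost += sum(
            weights[i] * sum((points[i][d] - centroid[d]) ** 2 for d in range(dim))
            for i in idx
        )
    return cost


def brute_force_two_means(points, weights):
    """Enumerate all bitstrings with two non-empty clusters; return (cost, assignment).

    Assumption: this classical search is a stand-in for the quantum optimizer
    and is only feasible because the coreset is small.
    """
    best = None
    for assignment in product((0, 1), repeat=len(points)):
        if len(set(assignment)) < 2:
            continue  # skip assignments that leave one cluster empty
        cost = weighted_two_means_cost(points, weights, assignment)
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best


if __name__ == "__main__":
    coreset_points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
    coreset_weights = [1.0, 1.0, 1.0, 1.0]
    print(brute_force_two_means(coreset_points, coreset_weights))
```

The search space has one bit per coreset point, so a coreset of m points corresponds to 2^m candidate assignments. This is why a small coreset matters for hardware with few qubits.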