A tight lower bound instance for k-means++ in constant dimension

A. Aggarwal; B. Bahmani; D. Arthur; M. Agarwal; M.R. Ackermann; R. Jaiswal

Publication date: 1 January 2014

Abstract

The k-means++ seeding algorithm is one of the most popular algorithms for finding the initial $k$ centers when using the k-means heuristic. The algorithm is a simple sampling procedure and can be described as follows. Pick the first center uniformly at random from the given points. For $i > 1$, pick a point to be the $i$-th center with probability proportional to the square of the Euclidean distance from this point to the closest of the $(i-1)$ previously chosen centers. The k-means++ seeding algorithm is not only simple and fast but also gives an $O(\log k)$ approximation in expectation, as shown by Arthur and Vassilvitskii. There are datasets on which this seeding algorithm gives an approximation factor of $\Omega(\log k)$ in expectation. However, these results do not make clear whether the algorithm achieves a good approximation factor with reasonably high probability (say, $1/\mathrm{poly}(k)$). Brunsch and Röglin gave a dataset on which the k-means++ seeding algorithm achieves an $O(\log k)$ approximation ratio only with probability that is exponentially small in $k$. However, this and all other known lower-bound examples are high dimensional, so an open problem was to understand the behavior of the algorithm on low-dimensional datasets. In this work, we give a simple two-dimensional dataset on which the seeding algorithm achieves an $O(\log k)$ approximation ratio with probability exponentially small in $k$. This solves open problems posed by Mahajan et al. and by Brunsch and Röglin.

Comment: To appear in TAMC 2014. arXiv admin note: text overlap with arXiv:1306.420
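As a rough illustration of the seeding rule described in the abstract (often called $D^2$ sampling), here is a minimal pure-Python sketch; it is not code from the paper. Given the set $C$ of centers chosen so far, a point $x$ is selected as the next center with probability $D(x)^2 / \sum_{y} D(y)^2$, where $D(x)$ is the distance from $x$ to its closest center in $C$. The names `kmeanspp_seed`, `sq_dist`, and `kmeans_cost` are hypothetical, points are assumed to be tuples of floats, and the uniform fallback when all distances are zero is an assumption added only to keep the sketch well defined. `kmeans_cost` computes the k-means objective, whose ratio to the optimum is the approximation factor discussed above.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seed(points: Sequence[Point], k: int, rng: random.Random) -> List[Point]:
    """Hypothetical sketch of k-means++ (D^2) seeding.

    The first center is chosen uniformly at random; each later center is chosen
    with probability proportional to the squared distance to the nearest
    center picked so far.
    """
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of points")
    first = rng.randrange(n)
    centers = [points[first]]
    # d2[j] = squared distance from points[j] to its closest chosen center
    d2 = [sq_dist(p, points[first]) for p in points]
    for _ in range(1, k):
        if sum(d2) == 0:
            # Assumption: every point coincides with a chosen center, so fall
            # back to a uniform choice (D^2 weights are all zero here).
            idx = rng.randrange(n)
        else:
            # Sample an index with weight proportional to its squared distance.
            idx = rng.choices(range(n), weights=d2, k=1)[0]
        new_center = points[idx]
        centers.append(new_center)
        # Update each point's distance to its nearest center.
        d2 = [min(d, sq_dist(p, new_center)) for d, p in zip(d2, points)]
    return centers


def kmeans_cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


if __name__ == "__main__":
    # Hypothetical toy data: three well-separated pairs of points in the plane.
    pts = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (10.1, 0.0), (0.0, 10.0), (0.1, 10.0)]
    centers = kmeanspp_seed(pts, 3, random.Random(0))
    print("centers:", centers)
    print("cost:", kmeans_cost(pts, centers))
```

Because the seeding is randomized, the cost of a single run is one sample of a random variable. Probability statements like those in the abstract concern the distribution of that cost over many independent runs, which one run of this sketch cannot reveal.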


Available versions:
Crossref: info:doi/10.1007/978-3-319-0...
CiteSeerX: oai:CiteSeerX.psu:10.1.1.755.8...