A Comparative Study of Efficient Initialization Methods for the K-Means
  Clustering Algorithm

Authors: Al Hasan; Al-Daoud; Aloise; Aloise; Anderberg; Babu; Babu; Ball; Bei; Bergmann; Bottou; Breunig; Cao; Celebi; Chen; Chen; Daniel; Forgy; Friedman; Garcia; Garcia; Gonzalez; Hartigan; Hassan A. Kingravi; Hotelling; Huang; Huang; Hubert; Hyvärinen; Iman; Jain; Jain; Jancey; Kanungo; Katsavounidis; Kaufman; Lance; Likas; Linde; Lloyd; Lu; Luengo; M. Emre Celebi; Maitra; Mao; Matsumoto; Meilă; Milligan; Milligan; Norušis; Onoda; Ordonez; Pal; Patricio A. Vela; Pena; Redmond; Selim; Späth; Su; Tarsitano; Tou; Wu; Zhang

Publication date: 10 September 2012
Publisher: 'Elsevier BV'
Abstract

K-means is undoubtedly the most widely used partitional clustering algorithm. Unfortunately, due to its gradient descent nature, this algorithm is highly sensitive to the initial placement of the cluster centers. Numerous initialization methods have been proposed to address this problem. In this paper, we first present an overview of these methods with an emphasis on their computational efficiency. We then compare eight commonly used linear time complexity initialization methods on a large and diverse collection of data sets using various performance criteria. Finally, we analyze the experimental results using non-parametric statistical tests and provide recommendations for practitioners. We demonstrate that popular initialization methods often perform poorly and that there are in fact strong alternatives to these methods.

Comment: 17 pages, 1 figure, 7 tables
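The sensitivity to initial center placement described in the abstract can be illustrated with a minimal sketch. The code below is not taken from the paper: the function names (`sq_dist`, `lloyd`), the four-point toy data set, and the two hand-picked starting configurations are all assumptions made for illustration. It runs plain batch k-means (Lloyd's iterations) from two different starts and compares the resulting sum of squared errors (SSE).

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def lloyd(points, centers, max_iter=100):
    """Batch k-means from the given initial centers (illustrative sketch).

    Returns the final centers, the point labels, and the SSE.
    """
    centers = [tuple(float(x) for x in c) for c in centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                # Move the center to the mean of its assigned points.
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                # Keep an empty cluster's center where it was.
                new_centers.append(old)
        if new_centers == centers:
            break  # no center moved: converged
        centers = new_centers
    labels = assign(points, centers)
    sse = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    return centers, labels, sse


# Toy data (assumed): corners of a 4-by-1 rectangle, clustered with k = 2.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Start A: one center near each short side of the rectangle.
_, _, sse_good = lloyd(points, [(0, 0.5), (4, 0.5)])

# Start B: one center on each long side; the iterations cannot escape it.
_, _, sse_bad = lloyd(points, [(2, 0), (2, 1)])

print(sse_good, sse_bad)
```

On this toy data, start A converges to the left/right split with an SSE of 1.0, while start B is already a fixed point of the iterations and stays at the top/bottom split with an SSE of 16.0. The algorithm and the data are identical in both runs; only the initial centers differ.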

DOI: 10.1016/j.eswa.2012...

Last time updated on 10/12/2019