PG-means: learning the number of clusters in data

Greg Hamerly; Yu Feng

Publisher: MIT Press

Abstract

We present a novel algorithm called PG-means that learns the number of clusters in a classical Gaussian mixture model. Our method is robust and efficient; it uses statistical hypothesis tests on one-dimensional projections of the data and model to determine whether the examples are well represented by the model. In doing so, we apply a statistical test to the entire model at once, not just on a per-cluster basis. We show that our method works well in difficult cases such as non-Gaussian data, overlapping clusters, eccentric clusters, high dimension, and many true clusters. Further, our new method provides a much more stable estimate of the number of clusters than existing methods.
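The projection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the test, so the one-sample Kolmogorov-Smirnov statistic, the asymptotic critical value, the 12 random projections, and every function name below are assumptions made for illustration. The asymptotic critical value also ignores that a model fitted to the same data calls for a stricter threshold. The sketch relies on one standard fact: a Gaussian N(mean, cov) projected onto a direction p is the one-dimensional Gaussian N(p.mean, p^T cov p), so a whole mixture can be compared with the projected data through a single test.

```python
import math
import random


def normal_cdf(z, mean, std):
    """CDF of a one-dimensional Gaussian, via the error function."""
    return 0.5 * (1.0 + math.erf((z - mean) / (std * math.sqrt(2.0))))


def project_point(x, direction):
    """Dot product of a data point with the projection direction."""
    return sum(xi * di for xi, di in zip(x, direction))


def project_component(mean, cov, direction):
    """Return the 1-D mean and standard deviation of a projected Gaussian."""
    dim = len(direction)
    mu = project_point(mean, direction)
    var = sum(direction[i] * cov[i][j] * direction[j]
              for i in range(dim) for j in range(dim))
    return mu, math.sqrt(var)


def ks_statistic(points, weights, means, covs, direction):
    """Largest gap between the empirical CDF of the projected data
    and the CDF of the projected mixture model."""
    projected = sorted(project_point(x, direction) for x in points)
    components = [project_component(m, c, direction)
                  for m, c in zip(means, covs)]
    n = len(projected)
    gap = 0.0
    for i, z in enumerate(projected):
        model = sum(w * normal_cdf(z, mu, std)
                    for w, (mu, std) in zip(weights, components))
        # The empirical CDF jumps from i/n to (i+1)/n at z; check both sides.
        gap = max(gap, abs((i + 1) / n - model), abs(i / n - model))
    return gap


def random_unit_vector(dim, rng):
    """Uniformly random direction, from normalized Gaussian samples."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def model_is_rejected(points, weights, means, covs,
                      n_projections=12, alpha=0.001, seed=0):
    """Reject the whole model if any random projection fails the test.
    Assumption: asymptotic KS critical value, not a fitted-model correction."""
    rng = random.Random(seed)
    critical = math.sqrt(-0.5 * math.log(alpha / 2.0) / len(points))
    dim = len(points[0])
    for _ in range(n_projections):
        direction = random_unit_vector(dim, rng)
        if ks_statistic(points, weights, means, covs, direction) > critical:
            return True
    return False


# Hypothetical demo data: two unit-variance clusters centred at x = -5 and x = 5.
demo_rng = random.Random(1)
points = [[demo_rng.gauss(cx, 1.0), demo_rng.gauss(0.0, 1.0)]
          for cx in (-5.0, 5.0) for _ in range(300)]
identity = [[1.0, 0.0], [0.0, 1.0]]
# Each model is (weights, means, covariances).
one_cluster = ([1.0], [[0.0, 0.0]], [[[26.0, 0.0], [0.0, 1.0]]])
two_clusters = ([0.5, 0.5], [[-5.0, 0.0], [5.0, 0.0]], [identity, identity])

x_axis = [1.0, 0.0]
d_one = ks_statistic(points, *one_cluster, x_axis)
d_two = ks_statistic(points, *two_clusters, x_axis)
print("KS gap, one cluster:", d_one)
print("KS gap, two clusters:", d_two)
print("one-cluster model rejected:", model_is_rejected(points, *one_cluster))
```

In this sketch the single-Gaussian model matches the overall mean and variance of the data but not its bimodal shape, so its gap along the x axis is much larger than that of the two-component model. A full procedure for choosing the number of clusters would presumably refit the mixture with one more component after each rejection and repeat the test, which is left out here.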
