An Aposteriorical Clusterability Criterion for k-Means++ and Simplicity of Clustering
We define the notion of a well-clusterable data set by combining the point of
view of the objective of the k-means clustering algorithm (minimising the centric
spread of data elements) with common sense (clusters should be separated by
gaps). We identify conditions under which the optimum of the k-means objective
coincides with a clustering under which the data is separated by predefined
gaps.
We investigate two cases: when the whole clusters are separated by some gap
and when only the cores of the clusters meet some separation condition.
We overcome a major obstacle to using clusterability criteria: known approaches
to clusterability checking refer to the optimal clustering, which is NP-hard
to identify.
Compared to other approaches to clusterability, the novelty lies in the
possibility of checking a posteriori (after running k-means) whether the data
set is well-clusterable. Since the k-means algorithm applied for this purpose
has polynomial complexity, so does the corresponding check.
Additionally, if k-means++ fails to identify a clustering that meets the
clusterability criteria, then with high probability the data is not
well-clusterable.
Comment: 58 pages
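
The a posteriori idea described in the abstract (run k-means++ first, then test the resulting clustering) can be illustrated with a minimal sketch. The abstract does not state the paper's actual gap conditions, so the criterion used here is only a hypothetical stand-in: the function `passes_gap_check`, its `gap_factor` parameter, and the rule "the empty space between two clusters must be at least `gap_factor` times the larger cluster radius" are assumptions made for illustration, not the authors' definition.

```python
import math
import random


def kmeans_pp(points, k, rng, iters=100):
    """k-means++ seeding followed by Lloyd iterations.

    points: list of equal-length tuples of floats.
    Returns (centers, labels).
    """
    # Seeding: first center uniformly at random, each further center with
    # probability proportional to the squared distance to the nearest center.
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        if sum(d2) == 0:  # all points coincide with a center already
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=d2, k=1)[0])

    labels = []
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, labels


def passes_gap_check(points, centers, labels, gap_factor=2.0):
    """Hypothetical a posteriori check (NOT the paper's criterion).

    Encloses each cluster in a ball around its center and requires the empty
    space between any two balls to be at least gap_factor times the larger
    of the two radii.
    """
    k = len(centers)
    radii = [0.0] * k
    for p, lab in zip(points, labels):
        radii[lab] = max(radii[lab], math.dist(p, centers[lab]))
    for i in range(k):
        for j in range(i + 1, k):
            gap = math.dist(centers[i], centers[j]) - radii[i] - radii[j]
            if gap < gap_factor * max(radii[i], radii[j]):
                return False
    return True


# Two tight, far-apart groups versus evenly spaced points on a line.
separated = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
             (100.0, 100.0), (100.0, 101.0), (101.0, 100.0), (101.0, 101.0)]
evenly_spaced = [(float(i), 0.0) for i in range(10)]

for name, data in [("separated", separated), ("evenly_spaced", evenly_spaced)]:
    centers, labels = kmeans_pp(data, 2, random.Random(0))
    print(name, passes_gap_check(data, centers, labels))
```

The check runs once over the points and once over the pairs of clusters, so its cost is polynomial in the input size, in line with the abstract's remark that the check inherits the polynomial complexity of the clustering run. A failed check in this sketch says nothing rigorous about the data; the probabilistic guarantee mentioned in the abstract belongs to the paper's own criteria.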