
    An Aposteriorical Clusterability Criterion for k-Means++ and Simplicity of Clustering

    We define the notion of a well-clusterable data set, combining the point of view of the objective of the k-means clustering algorithm (minimising the centric spread of data elements) with common sense (clusters shall be separated by gaps). We identify conditions under which the optimum of the k-means objective coincides with a clustering under which the data is separated by predefined gaps. We investigate two cases: when whole clusters are separated by some gap, and when only the cores of the clusters meet some separation condition. We overcome a major obstacle in using clusterability criteria: known approaches to clusterability checking are tied to the optimal clustering, which is NP-hard to identify. Compared to other approaches to clusterability, the novelty lies in the possibility of an a posteriori check (after running k-means) of whether the data set is well-clusterable or not. Since the k-means algorithm applied for this purpose has polynomial complexity, so does the check. Additionally, if k-means++ fails to identify a clustering that meets the clusterability criteria, then with high probability the data is not well-clusterable.
    Comment: 58 pages
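    The abstract does not state the clusterability conditions themselves, so the following Python sketch only illustrates the a posteriori workflow: run k-means++ (here a small stdlib implementation with several restarts, keeping the lowest-cost result), then test the resulting partition. The separation test in `gap_check`, the `gap_factor` parameter and all function names are hypothetical placeholders, not the gap conditions derived in the paper.

```python
import math
import random

Point = tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(data: list[Point], centers: list[Point]) -> list[int]:
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in data]


def kmeanspp_seeds(data: list[Point], k: int, rng: random.Random) -> list[Point]:
    """k-means++ seeding: each further seed is drawn with probability
    proportional to its squared distance to the nearest seed chosen so far."""
    seeds = [rng.choice(data)]
    while len(seeds) < k:
        d2 = [min(sq_dist(p, s) for s in seeds) for p in data]
        seeds.append(rng.choices(data, weights=d2, k=1)[0])
    return seeds


def lloyd(data: list[Point], centers: list[Point], max_iter: int = 100):
    """Lloyd iterations: reassign points, then move centers to cluster means."""
    for _ in range(max_iter):
        labels = assign(data, centers)
        new_centers = []
        for j, c in enumerate(centers):
            members = [p for p, lab in zip(data, labels) if lab == j]
            # An empty cluster keeps its previous center.
            new_centers.append(
                tuple(sum(xs) / len(members) for xs in zip(*members)) if members else c)
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(data, centers)
    cost = sum(sq_dist(p, centers[lab]) for p, lab in zip(data, labels))
    return centers, labels, cost


def gap_check(data, centers, labels, gap_factor: float = 2.0) -> bool:
    """HYPOTHETICAL separation test (not the paper's criterion): the smallest
    distance between points of different clusters must exceed gap_factor
    times the largest distance of any point from its own center."""
    clusters = [[p for p, lab in zip(data, labels) if lab == j]
                for j in range(len(centers))]
    clusters = [c for c in clusters if c]  # ignore empty clusters
    radius = max(math.dist(p, centers[lab]) for p, lab in zip(data, labels))
    min_gap = min((math.dist(a, b)
                   for i in range(len(clusters)) for j in range(i + 1, len(clusters))
                   for a in clusters[i] for b in clusters[j]), default=math.inf)
    return min_gap > gap_factor * radius


def aposteriori_check(data, k, gap_factor=2.0, n_init=5, seed=0):
    """Run k-means++ n_init times, keep the lowest-cost clustering, then test
    it after the fact. Every step runs in polynomial time (Lloyd is capped by
    max_iter; the gap test is quadratic in the number of points)."""
    rng = random.Random(seed)
    runs = [lloyd(data, kmeanspp_seeds(data, k, rng)) for _ in range(n_init)]
    centers, labels, _ = min(runs, key=lambda r: r[2])
    return gap_check(data, centers, labels, gap_factor), labels


if __name__ == "__main__":
    blobs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
             (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
    print(aposteriori_check(blobs, k=2))
    line = [(float(i), 0.0) for i in range(10)]
    print(aposteriori_check(line, k=2))
```

    Keeping the best of several k-means++ seedings is a common way to reduce the chance that one unlucky seeding hides a well-separated partition. On the two tight, distant blobs the placeholder test should pass; on ten evenly spaced points on a line, any two-cluster partition has a gap of 1, which is smaller than twice its radius, so the test fails.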