4 research outputs found
A global Approach to the Comparison of Clustering Results
Copyright © 2012 Walter de Gruyter GmbH.The discovery of knowledge in the case of Hierarchical Cluster Analysis (HCA) depends on many factors, such as the clustering algorithms applied and the strategies developed in the initialstage of Cluster Analysis. We present a global approach for evaluating the quality of clustering results and making a comparison among different clustering algorithms using the relevant information available (e.g. the stability, isolation and homogeneity of the clusters). In addition, we present a visual method to facilitate evaluation of the quality of the partitions, allowing identification of the similarities and differences between partitions, as well as the behaviour of the elements in the partitions. We illustrate our approach using a complex and heterogeneous dataset (real horse data) taken from the literature. We apply HCA based on the generalized affinity coefficient (similarity coefficient) to the case of complex data (symbolic data), combined with 26 (classic and probabilistic) clustering algorithms. Finally, we discuss the obtained results and the contribution of this approach to gaining better knowledge of the structure of data
Dynamic Clustering of Histogram Data Based on Adaptive Squared Wasserstein Distances
This paper deals with clustering methods based on adaptive distances for
histogram data using a dynamic clustering algorithm. Histogram data describes
individuals in terms of empirical distributions. These kind of data can be
considered as complex descriptions of phenomena observed on complex objects:
images, groups of individuals, spatial or temporal variant data, results of
queries, environmental data, and so on. The Wasserstein distance is used to
compare two histograms. The Wasserstein distance between histograms is
constituted by two components: the first based on the means, and the second, to
internal dispersions (standard deviation, skewness, kurtosis, and so on) of the
histograms. To cluster sets of histogram data, we propose to use Dynamic
Clustering Algorithm, (based on adaptive squared Wasserstein distances) that is
a k-means-like algorithm for clustering a set of individuals into classes
that are apriori fixed.
The main aim of this research is to provide a tool for clustering histograms,
emphasizing the different contributions of the histogram variables, and their
components, to the definition of the clusters. We demonstrate that this can be
achieved using adaptive distances. Two kind of adaptive distances are
considered: the first takes into account the variability of each component of
each descriptor for the whole set of individuals; the second takes into account
the variability of each component of each descriptor in each cluster. We
furnish interpretative tools of the obtained partition based on an extension of
the classical measures (indexes) to the use of adaptive distances in the
clustering criterion function. Applications on synthetic and real-world data
corroborate the proposed procedure