Dynamic Clustering of Histogram Data Based on Adaptive Squared Wasserstein Distances

Author: Antonio Irpino
Francisco de A.T. De Carvalho
Rosanna Verde
Publication venue: 'Elsevier BV'
Publication date: 07/10/2011

This paper deals with clustering methods based on adaptive distances for histogram data using a dynamic clustering algorithm. Histogram data describe individuals in terms of empirical distributions. This kind of data can be considered a complex description of phenomena observed on complex objects: images, groups of individuals, spatially or temporally varying data, results of queries, environmental data, and so on. The Wasserstein distance is used to compare two histograms. The Wasserstein distance between histograms consists of two components: the first is based on the means, and the second on the internal dispersions (standard deviation, skewness, kurtosis, and so on) of the histograms. To cluster sets of histogram data, we propose to use the Dynamic Clustering Algorithm (based on adaptive squared Wasserstein distances), a k-means-like algorithm for clustering a set of individuals into K classes, where K is fixed a priori. The main aim of this research is to provide a tool for clustering histograms, emphasizing the different contributions of the histogram variables, and their components, to the definition of the clusters. We demonstrate that this can be achieved using adaptive distances. Two kinds of adaptive distances are considered: the first takes into account the variability of each component of each descriptor for the whole set of individuals; the second takes into account the variability of each component of each descriptor in each cluster. We provide tools for interpreting the obtained partition, based on an extension of the classical measures (indexes) to the case where adaptive distances are used in the clustering criterion function. Applications on synthetic and real-world data corroborate the proposed procedure.
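
The two-component structure of the distance described in the abstract can be illustrated with a minimal sketch. It assumes, for simplicity, that each distribution is given as an equal-size one-dimensional sample rather than as a binned histogram, so that the squared Wasserstein distance reduces to the mean squared difference of the sorted values. The function names `squared_wasserstein_1d` and `wasserstein_components` are hypothetical and do not come from the paper.

```python
import numpy as np


def squared_wasserstein_1d(x, y):
    """Squared 2-Wasserstein distance between two equal-size 1-D samples.

    Assumption: both samples have the same number of points, so the sorted
    values play the role of the two quantile functions on a common grid.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("samples must have the same size in this sketch")
    return float(np.mean((x - y) ** 2))


def wasserstein_components(x, y):
    """Split the squared distance into a mean part and a dispersion part.

    The mean part is the squared difference of the sample means; the
    dispersion part compares the centered quantile functions, which carry
    the size and shape information.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("samples must have the same size in this sketch")
    mean_part = (x.mean() - y.mean()) ** 2
    dispersion_part = np.mean(((x - x.mean()) - (y - y.mean())) ** 2)
    return float(mean_part), float(dispersion_part)
```

For the samples `[0, 1, 2]` and `[1, 3, 5]`, the mean part is 4 and the dispersion part is 2/3, and their sum equals the full squared distance 14/3. The cross term vanishes because centered quantile values average to zero.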

arXiv.org e-Print Archive

Crossref

Archivio Istituzionale della Ricerca - Università degli Studi della Campania "Luigi Vanvitelli"

On the computation of Wasserstein barycenters

Author: G. Puccetti
L. Ruschendorf
S. Vanduffel
Publication venue: 'Elsevier BV'
Publication date: 01/03/2020

The Wasserstein barycenter is an important notion in the analysis of high-dimensional data, with a broad range of applications in applied probability, economics, statistics, and in particular clustering and image processing. In this paper, we state a general version of the equivalence of the Wasserstein barycenter problem to the n-coupling problem. As a consequence, the coupling to the sum principle (characterizing solutions to the n-coupling problem) provides a novel criterion for the explicit characterization of barycenters. Based on this criterion, we provide, as a main contribution, the simple-to-implement iterative swapping algorithm (ISA) for computing barycenters. The ISA is a completely non-parametric algorithm which provides a sharp image of the support of the barycenter and has a quadratic time complexity, which is comparable to other well-established algorithms designed to compute barycenters. The algorithm can also be applied to more complex optimization problems such as the k-barycenter problem.
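
As a small illustration of the barycenter notion, the sketch below handles only the one-dimensional case with equal-size samples, where the 2-Wasserstein barycenter is obtained by averaging the sorted values (the quantile functions). This is not the iterative swapping algorithm from the abstract, which targets the general problem; the function name `barycenter_1d` and its parameters are hypothetical.

```python
import numpy as np


def barycenter_1d(samples, weights=None):
    """2-Wasserstein barycenter of equal-size 1-D samples.

    samples: array-like of shape (k, n), one sample per row.
    weights: optional non-negative weights of length k (default: uniform).
    Returns the n sorted support points of the barycenter.

    Assumption: one dimension and equal sample sizes, so sorting each row
    gives quantile functions on a common grid and the barycenter is their
    weighted average.
    """
    sorted_rows = np.sort(np.asarray(samples, dtype=float), axis=1)
    k = sorted_rows.shape[0]
    if weights is None:
        w = np.full(k, 1.0 / k)
    else:
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()  # normalize so the weights sum to one
    return w @ sorted_rows
```

With the two samples `[2, 0]` and `[10, 4]`, the rows sort to `[0, 2]` and `[4, 10]`, and the uniform barycenter has support points `[2, 6]`.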

AIR Universita degli studi di Milano

Dynamic clustering of histogram data based on adaptive squared Wasserstein distances

Author: Francisco De Carvalho
Antonio Irpino
Rosanna Verde
Publication venue: 'Elsevier BV'
Publication date: 01/01/2014

This paper presents a Dynamic Clustering Algorithm for histogram data with an automatic weighting step of the variables by using adaptive distances. The Dynamic Clustering Algorithm is a k-means-like algorithm for clustering a set of objects into a predefined number of classes. Histogram data are realizations of particular set-valued descriptors defined in the context of Symbolic Data Analysis. We propose to use the ℓ2 Wasserstein distance for clustering histogram data and two novel adaptive-distance-based clustering schemes. The ℓ2 Wasserstein distance allows the variability of a set of histograms to be expressed in two components: the first related to the variability of their averages and the second to the variability of the histograms related to different size and shape. The weighting step aims to take into account global and local adaptive distances as well as the two components of the variability of a set of histograms. To evaluate the clustering results, we extend some classic partition quality indexes to the case where the proposed adaptive distances are used in the clustering criterion function. Examples on synthetic and real-world datasets corroborate the proposed clustering procedure.
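
The k-means-like loop with a weighting step can be sketched roughly as follows. This is a simplified illustration, not the authors' algorithm: it assumes each histogram is already represented by `m` equally spaced quantiles, uses one global weight per variable and component, and assumes the weights are constrained to have product one (a common choice for adaptive distances, taken here as an assumption). All names (`split_components`, `component_distances`, `adaptive_dynamic_clustering`, `init`) are hypothetical.

```python
import numpy as np


def split_components(Q):
    """Split quantiles of shape (n, p, m) into means and centered quantiles."""
    means = Q.mean(axis=2)                      # (n, p)
    centered = Q - means[:, :, None]            # (n, p, m)
    return means, centered


def component_distances(means, centered, proto_means, proto_centered):
    """Squared distances per object, prototype, variable and component.

    Returns an array of shape (n, K, p, 2): index 0 of the last axis is the
    mean component, index 1 the dispersion (centered quantile) component.
    """
    d_mean = (means[:, None, :] - proto_means[None, :, :]) ** 2
    d_disp = ((centered[:, None] - proto_centered[None]) ** 2).mean(axis=3)
    return np.stack([d_mean, d_disp], axis=3)


def adaptive_dynamic_clustering(Q, init, n_iter=10, eps=1e-12):
    """Alternate assignment, prototype update and weight update.

    Q: array (n, p, m) of quantiles; init: indices of the objects used as
    initial prototypes (their number fixes K). Returns (labels, weights).
    """
    Q = np.asarray(Q, dtype=float)
    n, p, _ = Q.shape
    means, centered = split_components(Q)
    proto_means = means[list(init)].copy()
    proto_centered = centered[list(init)].copy()
    K = len(init)
    weights = np.ones((p, 2))
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # Assignment step: nearest prototype under the weighted distance.
        D = component_distances(means, centered, proto_means, proto_centered)
        labels = (D * weights).sum(axis=(2, 3)).argmin(axis=1)
        # Representation step: average quantile functions within clusters.
        for k in range(K):
            members = labels == k
            if members.any():
                proto_means[k] = means[members].mean(axis=0)
                proto_centered[k] = centered[members].mean(axis=0)
        # Weighting step (assumed product-one constraint): components with
        # small within-cluster dispersion receive larger weights.
        D = component_distances(means, centered, proto_means, proto_centered)
        within = np.maximum(D[np.arange(n), labels].sum(axis=0), eps)
        weights = np.exp(np.log(within).mean()) / within
    return labels, weights
```

Seeding the prototypes through `init` keeps the sketch deterministic; a fuller implementation would likely use random restarts and per-cluster (local) weights as well, which are omitted here.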

Archivio Istituzionale della Ricerca - Università degli Studi della Campania "Luigi Vanvitelli"