Search CORE

4,673 research outputs found

Dynamic Clustering of Histogram Data Based on Adaptive Squared Wasserstein Distances

Author: Ahmad
Antonio Irpino
Bock
Calinski
Calo
Celeux
Chan
Chen
Clark
Cuesta-Albertos
De Carvalho
De Carvalho
De Carvalho
De Carvalho
De Souza
Deng
Diday
Diday
Francisco de A.T. De Carvalho
Friedman
Frigui
Gibbs
Huang
Hubert
Irpino
Irpino
Jain
Jing
Johnson
Levina
Mallows
Milligan
Rosanna Verde
Rubner
Rüshendorff
Terada
Tsai
Verde
Verde
Verde
Villani
Vrac
Xu
Publication venue: 'Elsevier BV'
Publication date: 07/10/2011
Field of study

This paper deals with clustering methods based on adaptive distances for histogram data using a dynamic clustering algorithm. Histogram data describes individuals in terms of empirical distributions. These kind of data can be considered as complex descriptions of phenomena observed on complex objects: images, groups of individuals, spatial or temporal variant data, results of queries, environmental data, and so on. The Wasserstein distance is used to compare two histograms. The Wasserstein distance between histograms is constituted by two components: the first based on the means, and the second, to internal dispersions (standard deviation, skewness, kurtosis, and so on) of the histograms. To cluster sets of histogram data, we propose to use Dynamic Clustering Algorithm, (based on adaptive squared Wasserstein distances) that is a k-means-like algorithm for clustering a set of individuals into

K

classes that are apriori fixed. The main aim of this research is to provide a tool for clustering histograms, emphasizing the different contributions of the histogram variables, and their components, to the definition of the clusters. We demonstrate that this can be achieved using adaptive distances. Two kind of adaptive distances are considered: the first takes into account the variability of each component of each descriptor for the whole set of individuals; the second takes into account the variability of each component of each descriptor in each cluster. We furnish interpretative tools of the obtained partition based on an extension of the classical measures (indexes) to the use of adaptive distances in the clustering criterion function. Applications on synthetic and real-world data corroborate the proposed procedure

arXiv.org e-Print Archive

3rd Workshop in Symbolic Data Analysis: book of abstracts

Author: Arroyo Gallardo Javier, ed.
Brito Paula, ed.
Maté Carlos, ed.
Noirhomme-Fraiture Monique, ed.
Publication venue
Publication date: 01/01/2012
Field of study

This workshop is the third regular meeting of researchers interested in Symbolic Data Analysis. The main aim of the event is to favor the meeting of people and the exchange of ideas from different fields - Mathematics, Statistics, Computer Science, Engineering, Economics, among others - that contribute to Symbolic Data Analysis

Fuzzy C-ordered medoids clustering of interval-valued data

Author: D'URSO Pierpaolo
Jacek Leski
Publication venue: 'Elsevier BV'
Publication date: 01/01/2016
Field of study

Fuzzy clustering for interval-valued data helps us to find natural vague boundaries in such data. The Fuzzy c-Medoids Clustering (FcMdC) method is one of the most popular clustering methods based on a partitioning around medoids approach. However, one of the greatest disadvantages of this method is its sensitivity to the presence of outliers in data. This paper introduces a new robust fuzzy clustering method named Fuzzy c-Ordered-Medoids clustering for interval-valued data (FcOMdC-ID). The Huber's M-estimators and the Yager's Ordered Weighted Averaging (OWA) operators are used in the method proposed to make it robust to outliers. The described algorithm is compared with the fuzzy c-medoids method in the experiments performed on synthetic data with different types of outliers. A real application of the FcOMdC-ID is also provided

Archivio della ricerca- Università di Roma La Sapienza

Designing labeled graph classifiers by exploiting the R\'enyi entropy of the dissimilarity representation

Author: Livi Lorenzo
Publication venue: 'MDPI AG'
Publication date: 20/04/2017
Field of study

Representing patterns as labeled graphs is becoming increasingly common in the broad field of computational intelligence. Accordingly, a wide repertoire of pattern recognition tools, such as classifiers and knowledge discovery procedures, are nowadays available and tested for various datasets of labeled graphs. However, the design of effective learning procedures operating in the space of labeled graphs is still a challenging problem, especially from the computational complexity viewpoint. In this paper, we present a major improvement of a general-purpose classifier for graphs, which is conceived on an interplay between dissimilarity representation, clustering, information-theoretic techniques, and evolutionary optimization algorithms. The improvement focuses on a specific key subroutine devised to compress the input data. We prove different theorems which are fundamental to the setting of the parameters controlling such a compression operation. We demonstrate the effectiveness of the resulting classifier by benchmarking the developed variants on well-known datasets of labeled graphs, considering as distinct performance indicators the classification accuracy, computing time, and parsimony in terms of structural complexity of the synthesized classification models. The results show state-of-the-art standards in terms of test set accuracy and a considerable speed-up for what concerns the computing time.Comment: Revised versio

arXiv.org e-Print Archive

Directory of Open Access Journals