Benchmarking in cluster analysis: A white paper

Author: Boulesteix Anne-Laure
Dangl Rainer
Dean Nema
Guyon Isabelle
Hennig Christian
Leisch Friedrich
Steinley Douglas
Van Mechelen Iven
Publication date: 01/10/2018

To achieve scientific progress in terms of building a cumulative body of knowledge, careful attention to benchmarking is of the utmost importance. This means that proposals of new methods of data pre-processing, new data-analytic techniques, and new methods of output post-processing should be extensively and carefully compared with existing alternatives, and that existing methods should be subjected to neutral comparison studies. To date, benchmarking and recommendations for benchmarking have mostly been seen in the context of supervised learning. Unfortunately, there has been a dearth of guidelines for benchmarking in an unsupervised setting, with the area of clustering as an important subdomain. To address this problem, the paper discusses the theoretical conceptual underpinnings of benchmarking in the field of cluster analysis by means of simulated as well as empirical data. Subsequently, the practicalities of how to address benchmarking questions in clustering are dealt with, and foundational recommendations are made.

arXiv.org e-Print Archive

Proceedings - University of Groningen

ARTS repository - University of Groningen

Enlighten

Dissertations of the University of Groningen

Nonparametric Hierarchical Clustering of Functional Data

Author: C. Abraham
D.M. Blei
F. Chamroukhi
G. Delaigle
G. Hébrail
J. Rissanen
M. Abramowitz
P. Hansen
R.M. Neal
T. Cover
T. Gasser
X. Nguyen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

In this paper, we deal with the problem of curve clustering. We propose a nonparametric method which partitions the curves into clusters and discretizes the dimensions of the curve points into intervals. The cross-product of these partitions forms a data grid, which is obtained using a Bayesian model selection approach while making no assumptions regarding the curves. Finally, a post-processing technique is proposed that reduces the number of clusters in order to improve the interpretability of the clustering. It consists of optimally merging the clusters step by step, which corresponds to an agglomerative hierarchical classification whose dissimilarity measure is the variation of the criterion. Interestingly, this measure is none other than the sum of the Kullback-Leibler divergences between cluster distributions before and after the merges. The practical interest of the approach for exploratory analysis of functional data is presented and compared with an alternative approach on an artificial and a real-world data set.
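
The step-by-step merging described in this abstract can be illustrated with a minimal Python sketch. It assumes each cluster is summarized by a histogram of counts over shared bins (a simplification of the data grid), and it takes the count-weighted sum of Kullback-Leibler divergences from each cluster's distribution to the merged distribution as the merge cost. The names `kl`, `merge_cost`, and `agglomerate` and the toy histograms are hypothetical, and this cost is a stand-in rather than the paper's Bayesian criterion.

```python
import math
from itertools import combinations


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def normalize(counts):
    """Turn a non-empty histogram of counts into a probability vector."""
    total = sum(counts)
    return [c / total for c in counts]


def merge_cost(a, b):
    """Count-weighted KL divergence from each cluster to the merged cluster."""
    merged = [x + y for x, y in zip(a, b)]
    q = normalize(merged)
    return sum(a) * kl(normalize(a), q) + sum(b) * kl(normalize(b), q)


def agglomerate(clusters, target):
    """Greedily merge the cheapest pair until `target` clusters remain."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > target:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = [x + y for x, y in zip(clusters[i], clusters[j])]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Hypothetical toy histograms: the first two clusters have similar shapes.
toy = [[8, 1, 1], [7, 2, 1], [1, 1, 8]]
result = agglomerate(toy, 2)
```

With these toy counts, the two similarly shaped histograms are merged first, because their distributions lose the least information when pooled.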

arXiv.org e-Print Archive

Crossref

HAL-Paris1

A single currency for Asia? Evaluation and comparison using hierarchical and model-based cluster analysis

Author: Crowley Patrick M.
Quah Chee-Heong
Publication date: 01/01/2009

Today, there is increased speculation on the possibility of an Asian currency, as the region begins to show increased promise as a region of nascent economic activity. Any monetary integration scheme in East Asia would likely have to include both China and India, though, so this paper attempts to assess the evolution of convergence among the East Asian countries, including China and India, according to the optimum currency area theory criteria, which are operationalized through the use of cluster analysis. In this paper we use both traditional "hierarchical" clustering and the more recently developed "model-based" clustering techniques and compare the outcome in each case. As the East Asian crisis of 1997-98 is likely to affect the results, the exercise is done for pre-crisis, crisis, and post-crisis periods. The results reveal some structure among the countries, an increase in the degree of subregional homogeneity, and a robust relationship between Malaysia and Singapore.

Archive of European Integration

From Data Topology to a Modular Classifier

Author: Ennaji Abdel
Lecourtier Yves
Ribert Arnaud
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2003

This article describes an approach to designing a distributed and modular neural classifier. This approach introduces a new hierarchical clustering that enables one to determine reliable regions in the representation space by exploiting supervised information. A multilayer perceptron is then associated with each of these detected clusters and charged with recognizing elements of the associated cluster while rejecting all others. The resulting global classifier comprises a set of cooperating neural networks and is completed by a K-nearest neighbor classifier charged with treating elements rejected by all the neural networks. Experimental results for the handwritten digit recognition problem and a comparison with neural and statistical nonmodular classifiers are given.
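
The accept-or-reject architecture with a nearest-neighbor fallback can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' system: simple distance-threshold "experts" stand in for the per-cluster multilayer perceptrons, the first accepting expert wins, and the names `make_region_expert`, `knn_predict`, and `classify` as well as the toy data are hypothetical.

```python
import math
from collections import Counter


def make_region_expert(center, radius, label):
    """Stand-in for a per-cluster network: accept points near `center`, else reject."""
    def expert(x):
        return label if math.dist(x, center) <= radius else None
    return expert


def knn_predict(x, train, k=3):
    """Plain k-nearest-neighbor majority vote over (point, label) pairs."""
    nearest = sorted(train, key=lambda pl: math.dist(x, pl[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def classify(x, experts, train, k=3):
    """Use the first expert that accepts x; fall back to k-NN if all reject."""
    for expert in experts:
        label = expert(x)
        if label is not None:
            return label, "expert"
    return knn_predict(x, train, k), "knn"


# Hypothetical 2-D toy data with two classes.
train = [((0.0, 0.0), "a"), ((0.2, 0.1), "a"), ((0.1, 0.3), "a"),
         ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b")]
experts = [make_region_expert((0.1, 0.1), 0.5, "a"),
           make_region_expert((5.0, 5.0), 0.5, "b")]

inside = classify((0.1, 0.2), experts, train)    # handled by an expert
outside = classify((3.5, 3.5), experts, train)   # rejected by all experts
```

Returning the route taken ("expert" or "knn") alongside the label makes it easy to measure how often the fallback is actually needed.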

arXiv.org e-Print Archive

HAL - Normandie Université

Crossref

Autonomous clustering using rough set theory

Author: A. K. Jain
A. Skowron
B. J. F. Manly
B. S. Everitt
C. L. Bean
Chandra Kambhampati
Charlotte Bean
D. Dubois
E. W. Forgey
F. H. C. Marriott
F. Höppner
G. H. Ball
J. A. Hartigan
J. B. MacQueen
J. C. Bezdek
J. C. Dunn
J. H. Ward
J. Komorowski
J. S. R. Jang
M. R. Anderberg
M. S. Aldenderfer
M. S. Kamel
P. Sneath
R. C. Jancey
R. R. Sokal
R. R. Yegar
S. Sharma
S. Z. Selim
T. Okuzaki
T. Sorensen
Z. Pawlak
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008

This paper proposes a clustering technique that minimises the need for subjective human intervention and is based on elements of rough set theory. The proposed algorithm is unified in its approach to clustering and makes use of both local and global data properties to obtain clustering solutions. It handles single-type and mixed-attribute data sets with ease, and results from three data sets of single and mixed attribute types are used to illustrate the technique and establish its efficiency.

Repository@Hull - Worktribe

Crossref

Warwick Research Archives Portal Repository

Interpretable Clustering using Unsupervised Binary Trees

Author: Fraiman Ricardo
Ghattas Badih
Svarc Marcela
Publication date: 01/01/2011

We herein introduce a new method of interpretable clustering that uses unsupervised binary trees. It is a three-stage procedure, the first stage of which entails a series of recursive binary splits to reduce the heterogeneity of the data within the new subsamples. During the second stage (pruning), consideration is given to whether adjacent nodes can be aggregated. Finally, during the third stage (joining), similar clusters are joined together, even if they do not descend from the same node originally. Consistency results are obtained, and the procedure is used on simulated and real data sets.
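
The first stage, recursive binary splitting to reduce within-node heterogeneity, can be sketched as follows. The sketch makes several assumptions that are not in the abstract: the data are one-dimensional, heterogeneity is measured by the sum of squared deviations from the node mean, and splitting stops according to the hypothetical parameters `min_size` and `min_gain`. The pruning and joining stages are omitted.

```python
def sse(values):
    """Sum of squared deviations from the mean (heterogeneity of a node)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(values):
    """Return (threshold, left, right) minimizing the children's total SSE."""
    ordered = sorted(values)
    best = None
    for i in range(1, len(ordered)):
        if ordered[i] == ordered[i - 1]:
            continue  # no threshold separates equal values
        left, right = ordered[:i], ordered[i:]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, (ordered[i - 1] + ordered[i]) / 2, left, right)
    return None if best is None else best[1:]


def grow(values, min_size=2, min_gain=1.0):
    """Recursively split while the SSE reduction is at least `min_gain`."""
    split = best_split(values) if len(values) >= 2 * min_size else None
    if split is None:
        return [sorted(values)]
    threshold, left, right = split
    if min(len(left), len(right)) < min_size:
        return [sorted(values)]
    if sse(values) - (sse(left) + sse(right)) < min_gain:
        return [sorted(values)]
    return grow(left, min_size, min_gain) + grow(right, min_size, min_gain)


# Hypothetical 1-D sample with three visible groups.
leaves = grow([1.0, 1.2, 0.9, 5.0, 5.1, 4.8, 9.9, 10.2, 10.0])
```

Each leaf is defined by a chain of threshold comparisons on the observed variable, which is what makes a tree-based clustering easy to read.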

arXiv.org e-Print Archive

Biblioteca Max von Buch, Universidad de San Andrés