32,832 research outputs found

    Benchmarking in cluster analysis: A white paper

    Get PDF
    To achieve scientific progress in terms of building a cumulative body of knowledge, careful attention to benchmarking is of the utmost importance. This means that proposals of new methods of data pre-processing, new data-analytic techniques, and new methods of output post-processing, should be extensively and carefully compared with existing alternatives, and that existing methods should be subjected to neutral comparison studies. To date, benchmarking and recommendations for benchmarking have been frequently seen in the context of supervised learning. Unfortunately, there has been a dearth of guidelines for benchmarking in an unsupervised setting, with the area of clustering as an important subdomain. To address this problem, discussion is given to the theoretical conceptual underpinnings of benchmarking in the field of cluster analysis by means of simulated as well as empirical data. Subsequently, the practicalities of how to address benchmarking questions in clustering are dealt with, and foundational recommendations are made

    Nonparametric Hierarchical Clustering of Functional Data

    Full text link
    In this paper, we deal with the problem of curves clustering. We propose a nonparametric method which partitions the curves into clusters and discretizes the dimensions of the curve points into intervals. The cross-product of these partitions forms a data-grid which is obtained using a Bayesian model selection approach while making no assumptions regarding the curves. Finally, a post-processing technique, aiming at reducing the number of clusters in order to improve the interpretability of the clustering, is proposed. It consists in optimally merging the clusters step by step, which corresponds to an agglomerative hierarchical classification whose dissimilarity measure is the variation of the criterion. Interestingly this measure is none other than the sum of the Kullback-Leibler divergences between clusters distributions before and after the merges. The practical interest of the approach for functional data exploratory analysis is presented and compared with an alternative approach on an artificial and a real world data set

    A single currency for Asia? Evaluation and comparison using hierarchical and model-based cluster analysis

    Get PDF
    Today, there is increased speculation on the possibility of an Asian currency, as the region begins to show increased promise as a region of nascent economic activity. Any monetary integration scheme in East Asia would likely have to include both China and India though, so this paper attempts to assess the evolution of convergence among the East Asian countries, including China and India, according to the optimum currency area theory criteria, which is operationalized through the use of cluster analysis. In this paper we use both traditional "hierarchical" clustering as well as the more recently developed "model-based" clustering techniques and compare the outcome in each case. As the East Asian crisis of 1997-98 is likely to a¤ect the results, the exercise is done for pre-crisis, crisis, and post-crisis periods. The results reveal some structure among the countries, an increase in the degree of subregional homogeneity, and a robust relationship between Malaysia and Singapore

    From Data Topology to a Modular Classifier

    Full text link
    This article describes an approach to designing a distributed and modular neural classifier. This approach introduces a new hierarchical clustering that enables one to determine reliable regions in the representation space by exploiting supervised information. A multilayer perceptron is then associated with each of these detected clusters and charged with recognizing elements of the associated cluster while rejecting all others. The obtained global classifier is comprised of a set of cooperating neural networks and completed by a K-nearest neighbor classifier charged with treating elements rejected by all the neural networks. Experimental results for the handwritten digit recognition problem and comparison with neural and statistical nonmodular classifiers are given

    Autonomous clustering using rough set theory

    Get PDF
    This paper proposes a clustering technique that minimises the need for subjective human intervention and is based on elements of rough set theory. The proposed algorithm is unified in its approach to clustering and makes use of both local and global data properties to obtain clustering solutions. It handles single-type and mixed attribute data sets with ease and results from three data sets of single and mixed attribute types are used to illustrate the technique and establish its efficiency

    Interpretable Clustering using Unsupervised Binary Trees

    Get PDF
    We herein introduce a new method of interpretable clustering that uses unsupervised binary trees. It is a three-stage procedure, the first stage of which entails a series of recursive binary splits to reduce the heterogeneity of the data within the new subsamples. During the second stage (pruning), consideration is given to whether adjacent nodes can be aggregated. Finally, during the third stage (joining), similar clusters are joined together, even if they do not descend from the same node originally. Consistency results are obtained, and the procedure is used on simulated and real data sets.Comment: 25 pages, 6 figure
    • …
    corecore