Variable selection for model-based clustering using the integrated complete-data likelihood

Authors: Matthieu Marbac, Mohammed Sedki
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 26/01/2015

Variable selection in cluster analysis is important yet challenging. It can be achieved by regularization methods, which realize a trade-off between the clustering accuracy and the number of selected variables by using a lasso-type penalty. However, the calibration of the penalty term can suffer from criticisms. Model selection methods are an efficient alternative, yet they require a difficult optimization of an information criterion which involves combinatorial problems. First, most of these optimization algorithms are based on a suboptimal procedure (e.g. stepwise method). Second, the algorithms are often greedy because they need multiple calls of EM algorithms. Here we propose to use a new information criterion based on the integrated complete-data likelihood. It does not require any estimate and its maximization is simple and computationally efficient. The original contribution of our approach is to perform the model selection without requiring any parameter estimation. Then, parameter inference is needed only for the unique selected model. This approach is used for the variable selection of a Gaussian mixture model with conditional independence assumption. The numerical experiments on simulated and benchmark datasets show that the proposed method often outperforms two classical approaches for variable selection.
Comment: submitted to Statistics and Computing

arXiv.org e-Print Archive

HAL Descartes

HAL UVSQ

Benchmarking in cluster analysis: A white paper

Authors: Boulesteix Anne-Laure, Dangl Rainer, Dean Nema, Guyon Isabelle, Hennig Christian, Leisch Friedrich, Steinley Douglas, Van Mechelen Iven
Publication date: 01/10/2018

To achieve scientific progress in terms of building a cumulative body of knowledge, careful attention to benchmarking is of the utmost importance. This means that proposals of new methods of data pre-processing, new data-analytic techniques, and new methods of output post-processing, should be extensively and carefully compared with existing alternatives, and that existing methods should be subjected to neutral comparison studies. To date, benchmarking and recommendations for benchmarking have been frequently seen in the context of supervised learning. Unfortunately, there has been a dearth of guidelines for benchmarking in an unsupervised setting, with the area of clustering as an important subdomain. To address this problem, discussion is given to the theoretical conceptual underpinnings of benchmarking in the field of cluster analysis by means of simulated as well as empirical data. Subsequently, the practicalities of how to address benchmarking questions in clustering are dealt with, and foundational recommendations are made.

arXiv.org e-Print Archive

Proceedings - University of Groningen

ARTS repository - University of Groningen

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Enlighten

Dissertations of the University of Groningen

Anytime Hierarchical Clustering

Authors: Arslan Omur, Koditschek Daniel E.
Publication date: 13/04/2014

We propose a new anytime hierarchical clustering method that iteratively transforms an arbitrary initial hierarchy on the configuration of measurements along a sequence of trees we prove for a fixed data set must terminate in a chain of nested partitions that satisfies a natural homogeneity requirement. Each recursive step re-edits the tree so as to improve a local measure of cluster homogeneity that is compatible with a number of commonly used (e.g., single, average, complete) linkage functions. As an alternative to the standard batch algorithms, we present numerical evidence to suggest that appropriate adaptations of this method can yield decentralized, scalable algorithms suitable for distributed/parallel computation of clustering hierarchies and online tracking of clustering trees applicable to large, dynamically changing databases and anomaly detection.
Comment: 13 pages, 6 figures, 5 tables, in preparation for submission to a conference

arXiv.org e-Print Archive

CiteSeerX

ScholarlyCommons@Penn

Structural Quantification of Entanglement

Authors: Shahandeh F., Sperling J., Vogel W.
Publication venue: 'American Physical Society (APS)'
Publication date: 17/07/2014

We introduce an approach which allows a detailed structural and quantitative analysis of multipartite entanglement. The sets of states with different structures are convex and nested. Hence, they can be distinguished from each other using appropriate measurable witnesses. We derive equations for the construction of optimal witnesses and discuss general properties arising from our approach. As an example, we formulate witnesses for a 4-cluster state and perform a full quantitative analysis of the entanglement structure in the presence of noise and losses. The strength of the method in multimode continuous variable systems is also demonstrated by considering a dephased GHZ-type state.
Comment: 12 pages, 1 table and 3 figures

arXiv.org e-Print Archive

University of Queensland eSpace

Block-Diagonal Forms of Distance Matrices for Partition Based Image Retrieval

Authors: Dmitry Kinoshenko, Elena Yegorova, Vladimir Mashtalir
Publication venue: 'IntechOpen'
Publication date: 01/02/2010

IntechOpen