2,174 research outputs found
Variable selection for model-based clustering using the integrated complete-data likelihood
Variable selection in cluster analysis is important yet challenging. It can
be achieved by regularization methods, which realize a trade-off between the
clustering accuracy and the number of selected variables by using a lasso-type
penalty. However, the calibration of the penalty term can suffer from
criticisms. Model selection methods are an efficient alternative, yet they
require a difficult optimization of an information criterion which involves
combinatorial problems. First, most of these optimization algorithms are based
on a suboptimal procedure (e.g. stepwise method). Second, the algorithms are
often greedy because they need multiple calls of EM algorithms. Here we propose
to use a new information criterion based on the integrated complete-data
likelihood. It does not require any estimate and its maximization is simple and
computationally efficient. The original contribution of our approach is to
perform the model selection without requiring any parameter estimation. Then,
parameter inference is needed only for the unique selected model. This approach
is used for the variable selection of a Gaussian mixture model with conditional
independence assumption. The numerical experiments on simulated and benchmark
datasets show that the proposed method often outperforms two classical
approaches for variable selection.Comment: submitted to Statistics and Computin
Benchmarking in cluster analysis: A white paper
To achieve scientific progress in terms of building a cumulative body of
knowledge, careful attention to benchmarking is of the utmost importance. This
means that proposals of new methods of data pre-processing, new data-analytic
techniques, and new methods of output post-processing, should be extensively
and carefully compared with existing alternatives, and that existing methods
should be subjected to neutral comparison studies. To date, benchmarking and
recommendations for benchmarking have been frequently seen in the context of
supervised learning. Unfortunately, there has been a dearth of guidelines for
benchmarking in an unsupervised setting, with the area of clustering as an
important subdomain. To address this problem, discussion is given to the
theoretical conceptual underpinnings of benchmarking in the field of cluster
analysis by means of simulated as well as empirical data. Subsequently, the
practicalities of how to address benchmarking questions in clustering are dealt
with, and foundational recommendations are made
Anytime Hierarchical Clustering
We propose a new anytime hierarchical clustering method that iteratively
transforms an arbitrary initial hierarchy on the configuration of measurements
along a sequence of trees we prove for a fixed data set must terminate in a
chain of nested partitions that satisfies a natural homogeneity requirement.
Each recursive step re-edits the tree so as to improve a local measure of
cluster homogeneity that is compatible with a number of commonly used (e.g.,
single, average, complete) linkage functions. As an alternative to the standard
batch algorithms, we present numerical evidence to suggest that appropriate
adaptations of this method can yield decentralized, scalable algorithms
suitable for distributed/parallel computation of clustering hierarchies and
online tracking of clustering trees applicable to large, dynamically changing
databases and anomaly detection.Comment: 13 pages, 6 figures, 5 tables, in preparation for submission to a
conferenc
Structural Quantification of Entanglement
We introduce an approach which allows a detailed structural and quantitative
analysis of multipartite entanglement. The sets of states with different
structures are convex and nested. Hence, they can be distinguished from each
other using appropriate measurable witnesses. We derive equations for the
construction of optimal witnesses and discuss general properties arising from
our approach. As an example, we formulate witnesses for a 4-cluster state and
perform a full quantitative analysis of the entanglement structure in the
presence of noise and losses. The strength of the method in multimode
continuous variable systems is also demonstrated by considering a dephased
GHZ-type state.Comment: 12 pages, 1 table and 3 figure
- …