2,174 research outputs found

    Variable selection for model-based clustering using the integrated complete-data likelihood

    Full text link
    Variable selection in cluster analysis is important yet challenging. It can be achieved by regularization methods, which realize a trade-off between the clustering accuracy and the number of selected variables by using a lasso-type penalty. However, the calibration of the penalty term can suffer from criticisms. Model selection methods are an efficient alternative, yet they require a difficult optimization of an information criterion which involves combinatorial problems. First, most of these optimization algorithms are based on a suboptimal procedure (e.g. stepwise method). Second, the algorithms are often greedy because they need multiple calls of EM algorithms. Here we propose to use a new information criterion based on the integrated complete-data likelihood. It does not require any estimate and its maximization is simple and computationally efficient. The original contribution of our approach is to perform the model selection without requiring any parameter estimation. Then, parameter inference is needed only for the unique selected model. This approach is used for the variable selection of a Gaussian mixture model with conditional independence assumption. The numerical experiments on simulated and benchmark datasets show that the proposed method often outperforms two classical approaches for variable selection.Comment: submitted to Statistics and Computin

    Benchmarking in cluster analysis: A white paper

    Get PDF
    To achieve scientific progress in terms of building a cumulative body of knowledge, careful attention to benchmarking is of the utmost importance. This means that proposals of new methods of data pre-processing, new data-analytic techniques, and new methods of output post-processing, should be extensively and carefully compared with existing alternatives, and that existing methods should be subjected to neutral comparison studies. To date, benchmarking and recommendations for benchmarking have been frequently seen in the context of supervised learning. Unfortunately, there has been a dearth of guidelines for benchmarking in an unsupervised setting, with the area of clustering as an important subdomain. To address this problem, discussion is given to the theoretical conceptual underpinnings of benchmarking in the field of cluster analysis by means of simulated as well as empirical data. Subsequently, the practicalities of how to address benchmarking questions in clustering are dealt with, and foundational recommendations are made

    Anytime Hierarchical Clustering

    Get PDF
    We propose a new anytime hierarchical clustering method that iteratively transforms an arbitrary initial hierarchy on the configuration of measurements along a sequence of trees we prove for a fixed data set must terminate in a chain of nested partitions that satisfies a natural homogeneity requirement. Each recursive step re-edits the tree so as to improve a local measure of cluster homogeneity that is compatible with a number of commonly used (e.g., single, average, complete) linkage functions. As an alternative to the standard batch algorithms, we present numerical evidence to suggest that appropriate adaptations of this method can yield decentralized, scalable algorithms suitable for distributed/parallel computation of clustering hierarchies and online tracking of clustering trees applicable to large, dynamically changing databases and anomaly detection.Comment: 13 pages, 6 figures, 5 tables, in preparation for submission to a conferenc

    Structural Quantification of Entanglement

    Full text link
    We introduce an approach which allows a detailed structural and quantitative analysis of multipartite entanglement. The sets of states with different structures are convex and nested. Hence, they can be distinguished from each other using appropriate measurable witnesses. We derive equations for the construction of optimal witnesses and discuss general properties arising from our approach. As an example, we formulate witnesses for a 4-cluster state and perform a full quantitative analysis of the entanglement structure in the presence of noise and losses. The strength of the method in multimode continuous variable systems is also demonstrated by considering a dephased GHZ-type state.Comment: 12 pages, 1 table and 3 figure
    • …
    corecore