9 research outputs found

    The TCLUST Approach to Robust Cluster Analysis

    Scientific Production. A new method for performing robust clustering is proposed. The method is designed to fit clusters with different scatters and weights. A proportion α of contaminating data points is also allowed. Restrictions on the ratio between the maximum and the minimum eigenvalues of the groups' scatter matrices are introduced. These restrictions make the problem well defined, guaranteeing the existence of the sample estimators and their consistency to the population parameters. Estadística e I
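
The eigenvalue-ratio restriction mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the bound `c`, the function names, and the limitation to symmetric 2x2 scatter matrices (chosen so that a closed-form eigenvalue formula suffices) are all assumptions made for illustration.

```python
import math


def eigenvalues_2x2(s):
    """Eigenvalues (smallest, largest) of a symmetric 2x2 matrix [[a, b], [b, d]]."""
    (a, b), (_, d) = s
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean - radius, mean + radius


def eigenvalue_ratio(scatters):
    """Ratio of the largest to the smallest eigenvalue over all group scatter matrices."""
    eigs = [lam for s in scatters for lam in eigenvalues_2x2(s)]
    return max(eigs) / min(eigs)


def satisfies_restriction(scatters, c):
    """True when the eigenvalue ratio does not exceed the (hypothetical) bound c."""
    return eigenvalue_ratio(scatters) <= c


# Two illustrative group scatter matrices (made-up values).
scatters = [
    [[1.0, 0.0], [0.0, 1.0]],  # spherical group
    [[4.0, 0.0], [0.0, 0.5]],  # elongated group
]
ratio = eigenvalue_ratio(scatters)
ok_loose = satisfies_restriction(scatters, c=10.0)
ok_tight = satisfies_restriction(scatters, c=5.0)
```

A small `c` forces all groups towards similar, nearly spherical scatters, while a large `c` permits very different shapes; a check of this kind rules out the degenerate fits in which one eigenvalue collapses towards zero.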

    Comments on “The power of monitoring: how to make the most of a contaminated multivariate sample”

    These are comments on the invited paper “The power of monitoring: How to make the most of a contaminated multivariate sample” by Andrea Cerioli, Marco Riani, Anthony Atkinson and Aldo Corbellini. Funding: Spanish Ministerio de Economía y Competitividad, grant MTM2017-86061-C2-1-P, and Consejería de Educación de la Junta de Castilla y León and FEDER, grants VA005P17 and VA002G18.

    Robustness and Outliers

    Scientific Production. Unexpected deviations from assumed models, as well as the presence of a certain amount of outlying data, are common in most practical statistical applications. This can lead to undesirable solutions when non-robust statistical techniques are applied, and cluster analysis is no exception. The search for homogeneous groups with large heterogeneity between them can be spoiled by the lack of robustness of standard clustering methods. For instance, the presence of (even a few) outlying observations may result in heterogeneous clusters being artificially joined together, or in the detection of spurious clusters made up merely of outlying observations. In this chapter we analyze the effects of different kinds of outlying data in cluster analysis and explore several alternative methodologies designed to avoid or minimize their undesirable effects. Funding: Ministerio de Economía, Industria y Competitividad (MTM2014-56235-C2-1-P); Junta de Castilla y León (research project support programme, Ref. VA212U13)

    Avoiding Spurious Local Maximizers in Mixture Modeling

    Scientific Production. Maximum likelihood estimation in the finite mixture of distributions setting is an ill-posed problem that is treatable, in practice, through the EM algorithm. However, the existence of spurious solutions (singularities and non-interesting local maximizers) makes it difficult for non-expert practitioners to find sensible mixture fits. In this work, a constrained mixture fitting approach is presented with the aim of overcoming the troubles introduced by spurious solutions. Sound mathematical support is provided and, what is more relevant in practice, a feasible algorithm is also given. This algorithm allows solutions to be monitored in terms of the constant involved in the restrictions, which yields a natural way to discard spurious solutions and a valuable tool for data analysts. Estadística e I

    Exploring the number of groups in robust model-based clustering

    Scientific Production. Two key questions in clustering problems are how to determine the number of groups properly and how to measure the strength of group assignments. These questions are especially involved when the presence of a certain fraction of outlying data is also expected. Any answer to these two key questions should depend on the assumed probabilistic model, the allowed group scatters and what we understand by noise. With this in mind, some exploratory "trimming-based" tools are presented in this work together with their justifications. The monitoring of optimal values reached when solving a robust clustering criterion and the use of some "discriminant" factors are the basis for these exploratory tools. Estadística e I

    Grouping Around Different Dimensional Affine Subspaces

    Grouping around affine subspaces and other types of manifolds is receiving a lot of attention in the literature because of its interest in several fields of application, many of which require allowing for different dimensions. This work extends the TCLUST methodology to deal with the problem of grouping data around linear subspaces of different dimensions in the presence of noise. Two ways of modelling error terms in the orthogonal complement of the linear subspaces are considered.
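
The basic geometric quantity behind grouping around subspaces, the part of an observation that lies in the orthogonal complement of a subspace, can be sketched as follows. This is only an illustration under stated assumptions, not the method of the paper: each subspace is represented by a hypothetical `origin` point plus an orthonormal `basis`, and the example subspaces and point are made up.

```python
import math


def orthogonal_residual(x, origin, basis):
    """Component of (x - origin) orthogonal to span(basis).

    Assumes the vectors in `basis` are orthonormal.
    """
    diff = [xi - oi for xi, oi in zip(x, origin)]
    resid = list(diff)
    for u in basis:
        coef = sum(d * ui for d, ui in zip(diff, u))
        resid = [r - coef * ui for r, ui in zip(resid, u)]
    return resid


def distance_to_subspace(x, origin, basis):
    """Euclidean norm of the orthogonal residual."""
    return math.sqrt(sum(r * r for r in orthogonal_residual(x, origin, basis)))


# A one-dimensional subspace (the x-axis) and a two-dimensional one (the plane z = 1).
line = ([0.0, 0.0, 0.0], [[1.0, 0.0, 0.0]])
plane = ([0.0, 0.0, 1.0], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])

point = [2.0, 3.0, 4.0]
d_line = distance_to_subspace(point, *line)
d_plane = distance_to_subspace(point, *plane)
```

Because the two subspaces have different dimensions, their residuals live in complements of different dimensions (two for the line, one for the plane), so raw distances are not directly comparable across groups; this is one reason the way error terms are modelled matters.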