98 research outputs found

    The joint role of trimming and constraints in robust estimation for mixtures of Gaussian factor analyzers.

    Mixtures of Gaussian factors are powerful tools for modeling an unobserved heterogeneous population, offering dimension reduction and model-based clustering at the same time. The high prevalence of spurious solutions and the disturbing effects of outlying observations in maximum likelihood estimation may cause biased or misleading inferences. Restrictions on the component covariances are considered in order to avoid spurious solutions, and trimming is also adopted to provide robustness against violations of the normality assumptions on the underlying latent factors. A detailed AECM algorithm for this new approach is presented. Simulation results and an application to the AIS dataset show the aim and effectiveness of the proposed methodology. Funded by Ministerio de Economía y Competitividad and FEDER, grant MTM2014-56235-C2-1-P; by Consejería de Educación de la Junta de Castilla y León, grant VA212U13; by grant FAR 2015 from the University of Milano-Bicocca; and by grant FIR 2014 from the University of Catania.

    Robust, fuzzy, and parsimonious clustering based on mixtures of Factor Analyzers

    A clustering algorithm that combines the advantages of fuzzy clustering and robust statistical estimators is presented. It is based on mixtures of Factor Analyzers, endowed with the joint use of trimming and the constrained estimation of scatter matrices, in a modified maximum likelihood approach. The algorithm generates a set of membership values that are used to fuzzy-partition the data set and to contribute to the robust estimates of the mixture parameters. The adoption of clusters modeled by Gaussian Factor Analysis allows for dimension reduction and for discovering local linear structures in the data. The new methodology has been shown to be resistant to different types of contamination by applying it to artificial data. The tuning parameters, namely the trimming level, the fuzzifier parameter, the number of clusters and the value of the scatter matrices constraint, are briefly discussed, together with some heuristic tools for their choice. Finally, a real data set has been analyzed to show how intermediate membership values are estimated for observations lying at cluster overlap, while cluster cores are composed of observations that are assigned to a cluster in a crisp way. Funded by Ministerio de Economía y Competitividad, grant MTM2017-86061-C2-1-P, and by Consejería de Educación de la Junta de Castilla y León and FEDER, grants VA005P17 and VA002G1

    Graphical and computational tools to guide parameter choice for the cluster weighted robust model

    The Cluster Weighted Robust Model (CWRM) is a recently introduced methodology to robustly estimate mixtures of regressions with random covariates. The CWRM allows users to flexibly perform regression clustering, safeguarding it against data contamination and spurious solutions. Nonetheless, the resulting solution depends on the chosen number of components in the mixture, the percentage of impartial trimming, and the degree of heteroscedasticity of the errors around the regression lines and of the clusters in the explanatory variables. Appropriate model selection is therefore crucial. Such a complex modeling task may generate several “legitimate” solutions, each one derived from a distinct hyper-parameter specification. The present paper introduces a two-step monitoring procedure to help users effectively explore such a vast model space. The first phase uncovers the most appropriate percentages of trimming, whilst the second phase explores the whole set of solutions, conditioning on the outcome derived from the previous step. The final output singles out a set of “top” solutions, whose optimality, stability and validity are assessed. Novel graphical and computational tools, specifically tailored for the CWRM framework, help the user make an educated choice among the optimal solutions. Three examples on real datasets showcase the proposal in action. Supplementary files for this article are available online.

    A robust approach to model-based classification based on trimming and constraints

    In a standard classification framework a set of trustworthy learning data is employed to build a decision rule, with the final aim of classifying unlabelled units belonging to the test set. Unreliable labelled observations, namely outliers and data with incorrect labels, can therefore strongly undermine the classifier performance, especially if the training size is small. The present work introduces a robust modification to the Model-Based Classification framework, employing impartial trimming and constraints on the ratio between the maximum and the minimum eigenvalue of the group scatter matrices. The proposed method effectively handles noise in both response and explanatory variables, providing reliable classification even when dealing with contaminated datasets. A robust information criterion is proposed for model selection. Experiments on real and simulated data, artificially adulterated, are provided to underline the benefits of the proposed method.
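
The two ingredients named in this abstract, impartial trimming and a bound on the eigenvalue ratio of the group scatter matrices, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the bound `c`, the trimming level `alpha`, and the choice to keep `ceil(n * (1 - alpha))` observations are all assumptions made for illustration.

```python
import numpy as np

def eigenvalue_ratio(scatter_matrices):
    """Largest eigenvalue divided by smallest eigenvalue, pooled over all groups."""
    # eigvalsh is appropriate because scatter matrices are symmetric.
    eigvals = np.concatenate([np.linalg.eigvalsh(S) for S in scatter_matrices])
    return eigvals.max() / eigvals.min()

def satisfies_constraint(scatter_matrices, c):
    """Check the (hypothetical) constraint: eigenvalue ratio must not exceed c >= 1."""
    return eigenvalue_ratio(scatter_matrices) <= c

def trim_least_plausible(log_densities, alpha):
    """Impartial trimming sketch: keep the observations with the highest
    log-density and discard a fraction alpha of the least plausible ones.
    Returns the sorted indices of the retained observations."""
    log_densities = np.asarray(log_densities)
    n = len(log_densities)
    n_keep = int(np.ceil(n * (1 - alpha)))  # assumed rounding convention
    order = np.argsort(log_densities)[::-1]  # most plausible first
    return np.sort(order[:n_keep])

# Illustrative usage with made-up numbers.
groups = [np.diag([1.0, 4.0]), np.diag([2.0, 8.0])]
ratio = eigenvalue_ratio(groups)
kept = trim_least_plausible([-1.0, -10.0, -2.0, -3.0], alpha=0.25)
```

In a full estimation procedure the scatter matrices would be modified so that the constraint holds at every iteration; the sketch only checks the constraint and selects the observations to retain.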

    Monitoring Tools in Robust CWM for the Analysis of Crime Data

    Robust inference for the Cluster Weighted Model requires the specification of a few hyper-parameters. Their role is crucial for increasing the quality of the estimators, while arbitrary decisions about their values could severely hamper inferential results. To guide the user in the delicate choice of such parameters, a monitoring approach has been introduced in the recent literature, yielding an adaptive method. The approach is exemplified here through the analysis of a dataset on the effect of punishment regimes on crime rates.

    Eigenvalues and constraints in mixture modeling: geometric and computational issues

    This paper presents a review of the use of eigenvalue restrictions for constrained parameter estimation in mixtures of elliptical distributions according to the likelihood approach. These restrictions serve a twofold purpose: to avoid convergence to degenerate solutions and to reduce the onset of non-interesting (spurious) maximizers, related to complex likelihood surfaces. The paper shows how the constraints may play a key role in the theory of Euclidean data clustering. The aim here is to provide a reasoned review of the constraints and their applications, following the contributions of many authors and spanning the literature of the last thirty years. Funded by Spanish Ministerio de Economía y Competitividad (grant MTM2017-86061-C2-1-P) and Junta de Castilla y León - Fondo Europeo de Desarrollo Regional (grants VA005P17 and VA002G18).

    Robust Approaches for Fuzzy Clusterwise Regression Based on Trimming and Constraints

    Three different approaches for robust fuzzy clusterwise regression are reviewed. They are all based on the simultaneous application of trimming and constraints. The first one follows from the joint modeling of the response and explanatory variables through a normal component fitted in each cluster. The second one assumes normally distributed error terms conditional on the explanatory variables, while the third approach is an extension of the Cluster Weighted Model. A fixed proportion of “most outlying” observations is trimmed. The use of appropriate constraints turns these problems into mathematically well-defined ones and, additionally, serves to avoid the detection of non-interesting or “spurious” linear clusters. The third proposal is especially appealing because it is able to protect against outliers in the explanatory variables, which may act as “bad leverage” points. Feasible and practical algorithms are outlined. Their performance, in terms of robustness, is illustrated in some simple simulated examples. Funded by Spanish Ministerio de Economía y Competitividad, grant MTM2017-86061-C2-1-P, and by Consejería de Educación de la Junta de Castilla y León and FEDER, grants VA005P17 and VA002G18.