A data driven equivariant approach to constrained Gaussian mixture modeling
Maximum likelihood estimation of Gaussian mixture models with different
class-specific covariance matrices is known to be problematic. This is due to
the unboundedness of the likelihood, together with the presence of spurious
maximizers. Existing methods to bypass this obstacle are based on the fact that
unboundedness is avoided if the eigenvalues of the covariance matrices are
bounded away from zero. This can be done by imposing constraints on the
covariance matrices, i.e. by incorporating a priori information on the
covariance structure of the mixture components. The present work introduces a
constrained equivariant approach, where the class conditional covariance
matrices are shrunk towards a pre-specified matrix Psi. Data-driven choices of
the matrix Psi, when a priori information is not available, and the optimal
amount of shrinkage are investigated. The effectiveness of the proposal is
evaluated on the basis of a simulation study and an empirical example.
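The shrinkage toward a pre-specified matrix Psi can be sketched numerically. The convex-combination form below, and the names shrink_covariance and lam, are illustrative assumptions rather than the paper's exact estimator; the point it conveys is that, for a positive-definite Psi and lam > 0, the smallest eigenvalue of the shrunk matrix stays bounded away from zero, which removes the likelihood unboundedness.

```python
import numpy as np

def shrink_covariance(S, Psi, lam):
    """Shrink a class-conditional sample covariance S toward a target Psi.

    Illustrative convex-combination form (not the paper's exact estimator):
    lam in [0, 1] is the shrinkage weight; lam = 0 returns S unchanged,
    lam = 1 returns Psi.
    """
    return (1.0 - lam) * S + lam * Psi
```

Since eigmin((1 - lam) S + lam Psi) >= lam * eigmin(Psi), any lam > 0 with a positive-definite Psi keeps the eigenvalues of every component covariance away from zero.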
A robust approach to model-based classification based on trimming and constraints
In a standard classification framework a set of trustworthy learning data are
employed to build a decision rule, with the final aim of classifying unlabelled
units belonging to the test set. Therefore, unreliable labelled observations,
namely outliers and data with incorrect labels, can strongly undermine the
classifier performance, especially if the training size is small. The present
work introduces a robust modification to the Model-Based Classification
framework, employing impartial trimming and constraints on the ratio between
the maximum and the minimum eigenvalue of the group scatter matrices. The
proposed method effectively handles the presence of noise in both the response
and the explanatory variables, providing reliable classification even when dealing with
contaminated datasets. A robust information criterion is proposed for model
selection. Experiments on real and simulated data, artificially adulterated,
are provided to underline the benefits of the proposed method.
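The constraint on the ratio between the maximum and minimum eigenvalues of the group scatter matrices can be illustrated with a simple truncation. The clip-from-below rule and the name constrain_eigenratio below are our simplification; the papers use a more refined optimal truncation, but the idea is the same: no eigenvalue is allowed to fall more than a factor c below the largest one.

```python
import numpy as np

def constrain_eigenratio(Sigma, c):
    """Enforce max/min eigenvalue ratio <= c on a covariance matrix.

    Illustrative truncation (a simplification of the optimal truncation
    used in the constrained-estimation literature): eigenvalues below
    lambda_max / c are raised to that floor.
    """
    vals, vecs = np.linalg.eigh(Sigma)
    floor = vals.max() / c
    vals = np.maximum(vals, floor)
    # Reassemble Sigma from the truncated spectrum: V diag(vals) V^T.
    return (vecs * vals) @ vecs.T
```

Applying this after each M-step keeps the likelihood bounded and prevents a component from degenerating onto a lower-dimensional subspace.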
A general trimming approach to robust Cluster Analysis
We introduce a new method for performing clustering with the aim of fitting
clusters with different scatters and weights. It is designed to allow for a
proportion of contaminating data, so as to guarantee the robustness
of the method. As a characteristic feature, restrictions on the ratio between
the maximum and the minimum eigenvalues of the groups scatter matrices are
introduced. This makes the problem well defined and guarantees the
consistency of the sample solutions to the population ones. The method covers a
wide range of clustering approaches depending on the strength of the chosen
restrictions. Our proposal includes an algorithm for approximately solving the
sample problem.
Comment: Published at http://dx.doi.org/10.1214/07-AOS515 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
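The trimming step at the heart of this approach can be sketched as discarding the alpha fraction of observations that contribute least to the likelihood. The function below is an illustrative fragment of one trimmed-likelihood iteration (names are ours), not the full clustering algorithm:

```python
import numpy as np

def trim_observations(loglik_contrib, alpha):
    """Impartial trimming: flag the alpha fraction of observations with the
    lowest log-likelihood contributions as trimmed (candidate outliers).

    loglik_contrib: per-observation log-likelihood contributions under the
    current fit. Returns a boolean mask: True = kept, False = trimmed.
    """
    n = len(loglik_contrib)
    n_trim = int(np.floor(alpha * n))
    keep = np.ones(n, dtype=bool)
    if n_trim > 0:
        # Indices of the n_trim smallest contributions are flagged.
        keep[np.argsort(loglik_contrib)[:n_trim]] = False
    return keep
```

In a full iteration, the model parameters would then be re-estimated on the kept observations only, and the mask recomputed until convergence.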
Robust estimation of mixtures of regressions with random covariates, via trimming and constraints
A robust estimator for a wide family of mixtures of linear regressions is presented.
Robustness is based on the joint adoption of the Cluster Weighted Model and
of an estimator based on trimming and restrictions. The selected model provides the
conditional distribution of the response for each group, as in mixtures of regression,
and further supplies local distributions for the explanatory variables. A novel version
of the restrictions has been devised, under this model, for separately controlling the
two sources of variability identified in it. This proposal avoids singularities in the
log-likelihood, caused by approximate local collinearity in the explanatory variables
or local exact fits in regressions, and reduces the occurrence of spurious local maximizers.
In a natural way, due to the interaction between the model and the estimator,
the procedure is able to resist the harmful influence of bad leverage points along the
estimation of the mixture of regressions, which is still an open issue in the literature.
The given methodology defines a well-posed statistical problem, whose estimator exists
and is consistent for the corresponding population optimum, under
widely general conditions. A feasible EM algorithm has also been provided to obtain
the corresponding estimation. Many simulated examples and two real datasets have
been chosen to show the ability of the procedure, on the one hand, to detect anomalous
data, and, on the other hand, to identify the real cluster regressions without the
influence of contamination.
Keywords: Cluster Weighted Modeling · Mixture of Regressions · Robustness
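The Cluster Weighted Model described above combines, per component, a conditional distribution of the response given the covariate with a local distribution for the covariate itself. A minimal one-dimensional Gaussian sketch (all parameter names are ours) is:

```python
import numpy as np

def cwm_density(x, y, weights, mx, sx2, beta0, beta1, se2):
    """Density of a univariate Gaussian Cluster Weighted Model:

    f(x, y) = sum_g pi_g * N(y | beta0_g + beta1_g * x, se2_g) * N(x | mx_g, sx2_g)

    weights: mixing proportions pi_g; (mx, sx2): local mean/variance of the
    covariate; (beta0, beta1, se2): local regression intercept, slope, and
    error variance. Illustrative one-dimensional sketch.
    """
    def norm_pdf(z, m, v):
        return np.exp(-0.5 * (z - m) ** 2 / v) / np.sqrt(2.0 * np.pi * v)

    total = 0.0
    for g in range(len(weights)):
        total += (weights[g]
                  * norm_pdf(y, beta0[g] + beta1[g] * x, se2[g])
                  * norm_pdf(x, mx[g], sx2[g]))
    return total
```

The extra covariate factor N(x | mx_g, sx2_g) is what lets the estimator control the two sources of variability separately and resist bad leverage points.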
The joint role of trimming and constraints in robust estimation for mixtures of Gaussian factor analyzers.
Mixtures of Gaussian factors are powerful tools for modeling an unobserved heterogeneous
population, offering – at the same time – dimension reduction and model-based clustering. The high prevalence of spurious solutions and the disturbing effects of outlying observations in maximum likelihood estimation may cause biased or misleading inferences. Restrictions for the component covariances are considered in order to avoid spurious solutions, and trimming is also adopted, to provide robustness against violations of the normality assumptions on the underlying latent factors. A detailed AECM algorithm for this new approach is presented. Simulation results and an application to the AIS dataset show the effectiveness of the proposed methodology.
Work supported by Ministerio de Economía y Competitividad and FEDER, grant MTM2014-56235-C2-1-P; by Consejería de Educación de la Junta de Castilla y León, grant VA212U13; by grant FAR 2015 from the University of Milano-Bicocca; and by grant FIR 2014 from the University of Catania.
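In a Gaussian factor-analyzer component, the covariance decomposes into factor loadings plus uniquenesses, Sigma = Lambda Lambda^T + diag(psi), which is what delivers the dimension reduction. A minimal sketch of that decomposition, with illustrative names:

```python
import numpy as np

def fa_covariance(Lambda, psi):
    """Component covariance implied by a Gaussian factor-analyzer:

    Sigma = Lambda Lambda^T + diag(psi),

    with Lambda the (p x q) matrix of factor loadings (q << p) and psi the
    length-p vector of uniquenesses (idiosyncratic variances).
    """
    Lambda = np.asarray(Lambda, dtype=float)
    return Lambda @ Lambda.T + np.diag(np.asarray(psi, dtype=float))
```

With q factors, each component needs only p*q + p covariance parameters instead of p*(p+1)/2, which is the dimension-reduction payoff the abstract refers to.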
Finding the Number of Groups in Model-Based Clustering via Constrained Likelihoods
Deciding the number of clusters k is one of the most difficult problems in Cluster
Analysis. For this purpose, complexity-penalized likelihood approaches have been
introduced in model-based clustering, such as the well known BIC and ICL criteria.
However, the classification/mixture likelihoods considered in these approaches
are unbounded without any constraint on the cluster scatter matrices. Constraints
also prevent traditional EM and CEM algorithms from being trapped in (spurious)
local maxima. Controlling the maximal ratio between the eigenvalues of the scatter
matrices to be smaller than a fixed constant c ≥ 1 is a sensible idea for setting such
constraints. A new penalized likelihood criterion, which takes into account the higher
model complexity that a higher value of c entails, is proposed. Based on this criterion,
a novel and fully automatized procedure, leading to a small ranked list of optimal
(k, c) couples, is provided. Its performance is assessed both in empirical examples and
through a simulation study, as a function of cluster overlap.
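The ranking of (k, c) couples can be sketched with a BIC-like score in which the penalty grows with the eigenvalue-ratio bound c, since a looser constraint allows a richer model. The log(c) penalty term and both function names below are our illustrative stand-ins, not the paper's exact criterion:

```python
import numpy as np

def penalized_criterion(loglik, n_params, c, n):
    """BIC-like score with an extra complexity term growing in c >= 1.

    Illustrative form: the usual -2 loglik + n_params * log(n) penalty,
    inflated by log(c) so that looser eigenvalue constraints pay a price.
    Lower is better.
    """
    return -2.0 * loglik + (n_params + np.log(c)) * np.log(n)

def rank_models(results, n, top=3):
    """Rank fitted models by the penalized criterion.

    results: list of dicts with keys 'k', 'c', 'loglik', 'n_params'.
    Returns the top (k, c) couples, best first.
    """
    scored = sorted(results,
                    key=lambda r: penalized_criterion(r['loglik'],
                                                      r['n_params'],
                                                      r['c'], n))
    return [(r['k'], r['c']) for r in scored[:top]]
```

In practice one would fit the constrained model over a grid of (k, c) values and hand the fits to rank_models, yielding the short ranked list the abstract describes.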