A data driven equivariant approach to constrained Gaussian mixture modeling
Maximum likelihood estimation of Gaussian mixture models with different
class-specific covariance matrices is known to be problematic. This is due to
the unboundedness of the likelihood, together with the presence of spurious
maximizers. Existing methods to bypass this obstacle are based on the fact that
unboundedness is avoided if the eigenvalues of the covariance matrices are
bounded away from zero. This can be done by imposing constraints on the
covariance matrices, i.e. by incorporating a priori information on the
covariance structure of the mixture components. The present work introduces a
constrained equivariant approach, where the class conditional covariance
matrices are shrunk towards a pre-specified matrix Psi. Data-driven choices of
the matrix Psi, when a priori information is not available, and the optimal
amount of shrinkage are investigated. The effectiveness of the proposal is
evaluated on the basis of a simulation study and an empirical example.
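The shrinkage toward a pre-specified matrix Psi can be sketched numerically. The convex-combination form below, and the names shrink_covariance and lam, are illustrative assumptions rather than the paper's exact estimator; the point it conveys is that, for a positive-definite Psi and lam > 0, the smallest eigenvalue of the shrunk matrix stays bounded away from zero, which removes the likelihood unboundedness.

```python
import numpy as np

def shrink_covariance(S, Psi, lam):
    """Shrink a class-conditional sample covariance S toward a target Psi.

    Illustrative convex-combination form (not the paper's exact estimator):
    lam in [0, 1] is the shrinkage weight; lam = 0 returns S unchanged,
    lam = 1 returns Psi.
    """
    return (1.0 - lam) * S + lam * Psi
```

Since eigmin((1 - lam) S + lam Psi) >= lam * eigmin(Psi), any lam > 0 with a positive-definite Psi keeps the eigenvalues of every component covariance away from zero.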
A robust approach to model-based classification based on trimming and constraints
In a standard classification framework a set of trustworthy learning data are
employed to build a decision rule, with the final aim of classifying unlabelled
units belonging to the test set. Therefore, unreliable labelled observations,
namely outliers and data with incorrect labels, can strongly undermine the
classifier performance, especially if the training size is small. The present
work introduces a robust modification to the Model-Based Classification
framework, employing impartial trimming and constraints on the ratio between
the maximum and the minimum eigenvalue of the group scatter matrices. The
proposed method effectively handles the presence of noise in both the response
and the explanatory variables, providing reliable classification even when dealing with
contaminated datasets. A robust information criterion is proposed for model
selection. Experiments on real and simulated data, artificially adulterated,
are provided to underline the benefits of the proposed method.
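The constraint on the ratio between the maximum and minimum eigenvalues of the group scatter matrices can be illustrated with a simple truncation. The clip-from-below rule and the name constrain_eigenratio below are our simplification; the papers use a more refined optimal truncation, but the idea is the same: no eigenvalue is allowed to fall more than a factor c below the largest one.

```python
import numpy as np

def constrain_eigenratio(Sigma, c):
    """Enforce max/min eigenvalue ratio <= c on a covariance matrix.

    Illustrative truncation (a simplification of the optimal truncation
    used in the constrained-estimation literature): eigenvalues below
    lambda_max / c are raised to that floor.
    """
    vals, vecs = np.linalg.eigh(Sigma)
    floor = vals.max() / c
    vals = np.maximum(vals, floor)
    # Reassemble Sigma from the truncated spectrum: V diag(vals) V^T.
    return (vecs * vals) @ vecs.T
```

Applying this after each M-step keeps the likelihood bounded and prevents a component from degenerating onto a lower-dimensional subspace.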
A general trimming approach to robust Cluster Analysis
We introduce a new method for performing clustering with the aim of fitting
clusters with different scatters and weights. It is designed to allow for a
proportion of contaminating data, so as to guarantee the robustness
of the method. As a characteristic feature, restrictions on the ratio between
the maximum and the minimum eigenvalues of the groups scatter matrices are
introduced. This makes the problem well defined and guarantees the
consistency of the sample solutions to the population ones. The method covers a
wide range of clustering approaches depending on the strength of the chosen
restrictions. Our proposal includes an algorithm for approximately solving the
sample problem.
Comment: Published at http://dx.doi.org/10.1214/07-AOS515 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
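The trimming step at the heart of this approach can be sketched as discarding the alpha fraction of observations that contribute least to the likelihood. The function below is an illustrative fragment of one trimmed-likelihood iteration (names are ours), not the full clustering algorithm:

```python
import numpy as np

def trim_observations(loglik_contrib, alpha):
    """Impartial trimming: flag the alpha fraction of observations with the
    lowest log-likelihood contributions as trimmed (candidate outliers).

    loglik_contrib: per-observation log-likelihood contributions under the
    current fit. Returns a boolean mask: True = kept, False = trimmed.
    """
    n = len(loglik_contrib)
    n_trim = int(np.floor(alpha * n))
    keep = np.ones(n, dtype=bool)
    if n_trim > 0:
        # Indices of the n_trim smallest contributions are flagged.
        keep[np.argsort(loglik_contrib)[:n_trim]] = False
    return keep
```

In a full iteration, the model parameters would then be re-estimated on the kept observations only, and the mask recomputed until convergence.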
Robust estimation of mixtures of regressions with random covariates, via trimming and constraints
A robust estimator for a wide family of mixtures of linear regressions is presented.
Robustness is based on the joint adoption of the Cluster Weighted Model and
of an estimator based on trimming and restrictions. The selected model provides the
conditional distribution of the response for each group, as in mixtures of regression,
and further supplies local distributions for the explanatory variables. A novel version
of the restrictions has been devised, under this model, for separately controlling the
two sources of variability identified in it. This proposal avoids singularities in the
log-likelihood, caused by approximate local collinearity in the explanatory variables
or local exact fits in regressions, and reduces the occurrence of spurious local maximizers.
In a natural way, due to the interaction between the model and the estimator,
the procedure is able to resist the harmful influence of bad leverage points along the
estimation of the mixture of regressions, which is still an open issue in the literature.
The given methodology defines a well-posed statistical problem, whose estimator exists
and is consistent for the corresponding population optimum, under
widely general conditions. A feasible EM algorithm has also been provided to obtain
the corresponding estimation. Many simulated examples and two real datasets have
been chosen to show the ability of the procedure, on the one hand, to detect anomalous
data, and, on the other hand, to identify the real cluster regressions without the
influence of contamination.
Keywords: Cluster Weighted Modeling · Mixture of Regressions · Robustness
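The Cluster Weighted Model described above combines, per component, a conditional distribution of the response given the covariate with a local distribution for the covariate itself. A minimal one-dimensional Gaussian sketch (all parameter names are ours) is:

```python
import numpy as np

def cwm_density(x, y, weights, mx, sx2, beta0, beta1, se2):
    """Density of a univariate Gaussian Cluster Weighted Model:

    f(x, y) = sum_g pi_g * N(y | beta0_g + beta1_g * x, se2_g) * N(x | mx_g, sx2_g)

    weights: mixing proportions pi_g; (mx, sx2): local mean/variance of the
    covariate; (beta0, beta1, se2): local regression intercept, slope, and
    error variance. Illustrative one-dimensional sketch.
    """
    def norm_pdf(z, m, v):
        return np.exp(-0.5 * (z - m) ** 2 / v) / np.sqrt(2.0 * np.pi * v)

    total = 0.0
    for g in range(len(weights)):
        total += (weights[g]
                  * norm_pdf(y, beta0[g] + beta1[g] * x, se2[g])
                  * norm_pdf(x, mx[g], sx2[g]))
    return total
```

The extra covariate factor N(x | mx_g, sx2_g) is what lets the estimator control the two sources of variability separately and resist bad leverage points.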
The joint role of trimming and constraints in robust estimation for mixtures of Gaussian factor analyzers.
Mixtures of Gaussian factors are powerful tools for modeling an unobserved heterogeneous
population, offering – at the same time – dimension reduction and model-based clustering. The high prevalence of spurious solutions and the disturbing effects of outlying observations in maximum likelihood estimation may cause biased or misleading inferences. Restrictions for the component covariances are considered in order to avoid spurious solutions, and trimming is also adopted, to provide robustness against violations of the normality assumptions on the underlying latent factors. A detailed AECM algorithm for this new approach is presented. Simulation results and an application to the AIS dataset show the effectiveness of the proposed methodology.
Work supported by Ministerio de Economía y Competitividad and FEDER, grant MTM2014-56235-C2-1-P; by Consejería de Educación de la Junta de Castilla y León, grant VA212U13; by grant FAR 2015 from the University of Milano-Bicocca; and by grant FIR 2014 from the University of Catania.
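In a Gaussian factor-analyzer component, the covariance decomposes into factor loadings plus uniquenesses, Sigma = Lambda Lambda^T + diag(psi), which is what delivers the dimension reduction. A minimal sketch of that decomposition, with illustrative names:

```python
import numpy as np

def fa_covariance(Lambda, psi):
    """Component covariance implied by a Gaussian factor-analyzer:

    Sigma = Lambda Lambda^T + diag(psi),

    with Lambda the (p x q) matrix of factor loadings (q << p) and psi the
    length-p vector of uniquenesses (idiosyncratic variances).
    """
    Lambda = np.asarray(Lambda, dtype=float)
    return Lambda @ Lambda.T + np.diag(np.asarray(psi, dtype=float))
```

With q factors, each component needs only p*q + p covariance parameters instead of p*(p+1)/2, which is the dimension-reduction payoff the abstract refers to.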
Finding the Number of Groups in Model-Based Clustering via Constrained Likelihoods
Deciding the number of clusters k is one of the most difficult problems in Cluster
Analysis. For this purpose, complexity-penalized likelihood approaches have been
introduced in model-based clustering, such as the well known BIC and ICL criteria.
However, the classification/mixture likelihoods considered in these approaches
are unbounded without any constraint on the cluster scatter matrices. Constraints
also prevent traditional EM and CEM algorithms from being trapped in (spurious)
local maxima. Controlling the maximal ratio between the eigenvalues of the scatter
matrices to be smaller than a fixed constant c ≥ 1 is a sensible idea for setting such
constraints. A new penalized likelihood criterion, which takes into account the higher
model complexity that a higher value of c entails, is proposed. Based on this criterion,
a novel and fully automatized procedure, leading to a small ranked list of optimal
(k, c) couples, is provided. Its performance is assessed both in empirical examples and
through a simulation study, as a function of cluster overlap.
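The ranking of (k, c) couples can be sketched with a BIC-like score in which the penalty grows with the eigenvalue-ratio bound c, since a looser constraint allows a richer model. The log(c) penalty term and both function names below are our illustrative stand-ins, not the paper's exact criterion:

```python
import numpy as np

def penalized_criterion(loglik, n_params, c, n):
    """BIC-like score with an extra complexity term growing in c >= 1.

    Illustrative form: the usual -2 loglik + n_params * log(n) penalty,
    inflated by log(c) so that looser eigenvalue constraints pay a price.
    Lower is better.
    """
    return -2.0 * loglik + (n_params + np.log(c)) * np.log(n)

def rank_models(results, n, top=3):
    """Rank fitted models by the penalized criterion.

    results: list of dicts with keys 'k', 'c', 'loglik', 'n_params'.
    Returns the top (k, c) couples, best first.
    """
    scored = sorted(results,
                    key=lambda r: penalized_criterion(r['loglik'],
                                                      r['n_params'],
                                                      r['c'], n))
    return [(r['k'], r['c']) for r in scored[:top]]
```

In practice one would fit the constrained model over a grid of (k, c) values and hand the fits to rank_models, yielding the short ranked list the abstract describes.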