Robust improper maximum likelihood: tuning, computation, and a comparison with other methods for robust Gaussian clustering
The two main topics of this paper are the introduction of the "optimally
tuned robust improper maximum likelihood estimator" (OTRIMLE) for robust clustering
based on the multivariate Gaussian model for clusters, and a comprehensive
simulation study comparing the OTRIMLE to Maximum Likelihood in Gaussian
mixtures with and without noise component, mixtures of t-distributions, and the
TCLUST approach for trimmed clustering. The OTRIMLE uses an improper constant
density for modelling outliers and noise. This can be chosen optimally so that
the non-noise part of the data looks as close to a Gaussian mixture as
possible. Some deviation from Gaussianity can be traded in for lowering the
estimated noise proportion. Covariance matrix constraints and computation of
the OTRIMLE are also treated. In the simulation study, all methods are
confronted with setups in which their model assumptions are not exactly
fulfilled, and in order to evaluate the experiments in a standardized way by
misclassification rates, a new model-based definition of "true clusters" is
introduced that deviates from the usual identification of mixture components
with clusters. In the study, every method turns out to be superior for one or
more setups, but the OTRIMLE achieves the most satisfactory overall
performance. The methods are also applied to two real datasets, one without and
one with known "true" clusters.
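The key device described above, an improper constant density that captures noise and outliers, can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: the mixture parameters, the constant level `delta`, and the noise proportion are all assumed values, and a real OTRIMLE fit would tune `delta` and estimate everything by an EM-type algorithm.

```python
# Minimal sketch (assumed parameters, not the OTRIMLE algorithm itself):
# add an improper constant density `delta` to a Gaussian mixture
# pseudo-density, so points no component explains well are labelled noise.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(50, 2)),   # cluster 1
    rng.normal(8.0, 1.0, size=(50, 2)),   # cluster 2
    rng.uniform(-20, 30, size=(5, 2)),    # scattered noise points
])

means = [np.zeros(2), np.full(2, 8.0)]
covs = [np.eye(2), np.eye(2)]
pi = np.array([0.45, 0.45, 0.10])         # last weight: noise proportion (assumed)
delta = 1e-4                              # improper constant density level (assumed)

# Component densities: two Gaussians plus the constant noise "density".
dens = np.column_stack(
    [multivariate_normal(m, c).pdf(X) for m, c in zip(means, covs)]
    + [np.full(len(X), delta)]
)
post = pi * dens                          # unnormalized posterior weights
labels = post.argmax(axis=1)              # 0/1 = clusters, 2 = noise
```

Points whose best Gaussian component density falls below the weighted constant level are assigned to the noise class; in the actual method, `delta` is chosen so that the retained non-noise part looks as Gaussian as possible.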
S-estimation of hidden Markov models
A method for robust estimation of dynamic mixtures of multivariate distributions is proposed. The EM algorithm is modified by replacing the classical M-step with high-breakdown S-estimation of location and scatter, performed with the bisquare multivariate S-estimator. Estimates are obtained by solving a system of estimating equations characterized by component-specific sets of weights based on robust Mahalanobis-type distances. Convergence of the resulting algorithm is proved, and its finite-sample behavior is investigated by means of a brief simulation study and an application to a multivariate time series of daily returns for seven stock markets.
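The weighting idea behind such estimating equations can be sketched as follows. This is not the paper's code: the tuning constant `c` and the single-component setting are illustrative assumptions, whereas the real method solves component-specific weighted equations inside a modified EM.

```python
# Sketch (assumed notation): Tukey bisquare weights computed from
# Mahalanobis-type distances down-weight observations far from a
# component's center, so a weighted location/scatter update resists outliers.
import numpy as np

def bisquare_weights(X, mu, sigma_inv, c=4.685):
    """Bisquare weights from Mahalanobis distances; `c` is an assumed cutoff."""
    diff = X - mu
    # Squared Mahalanobis distance for each row, then its square root.
    d = np.sqrt(np.einsum('ij,jk,ik->i', diff, sigma_inv, diff))
    u = d / c
    return np.where(u < 1.0, (1.0 - u**2) ** 2, 0.0)  # zero weight beyond c

X = np.array([[0.1, -0.2], [0.3, 0.1], [-0.2, 0.0], [10.0, 10.0]])
w = bisquare_weights(X, mu=np.zeros(2), sigma_inv=np.eye(2))
# The gross outlier at (10, 10) receives weight 0 and cannot move the center.
mu_robust = (w[:, None] * X).sum(0) / w.sum()   # weighted location update
```

Because weights vanish smoothly beyond the cutoff, a single gross outlier contributes nothing to the update, which is what gives the estimator its high breakdown point.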
Robustness and Outliers
Unexpected deviations from assumed models, as well as the presence of certain amounts of outlying data, are common in most practical statistical applications. This fact can lead to undesirable solutions when non-robust statistical techniques are applied, and cluster analysis is no exception. The search for homogeneous groups with large heterogeneity between them can be spoiled by the lack of robustness of standard clustering methods. For instance, the presence of even a few outlying observations may result in heterogeneous clusters being artificially joined together, or in the detection of spurious clusters merely made up of outlying observations. In this chapter we analyze the effects of different kinds of outlying data in cluster analysis and explore several alternative methodologies designed to avoid or minimize their undesirable effects. Funded by Ministerio de Economía, Industria y Competitividad (MTM2014-56235-C2-1-P) and Junta de Castilla y León (programa de apoyo a proyectos de investigación, Ref. VA212U13).
Robust estimation for mixtures of Gaussian factor analyzers, based on trimming and constraints
Mixtures of Gaussian factors are powerful tools for modeling an unobserved
heterogeneous population, offering, at the same time, dimension reduction
and model-based clustering. Unfortunately, the high prevalence of spurious
solutions and the disturbing effects of outlying observations in maximum likelihood
estimation raise serious issues. In this paper we consider restrictions on
the component covariances, to avoid spurious solutions, and trimming, to provide
robustness against violations of the normality assumptions on the underlying latent factors.
A detailed AECM algorithm for this new approach is presented. Simulation
results and an application to the AIS dataset illustrate the aim and effectiveness of the
proposed methodology.
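The covariance restrictions mentioned above are typically eigenvalue-ratio constraints. The sketch below is an illustrative simplification, not the authors' AECM step: it truncates eigenvalues so the largest-to-smallest ratio never exceeds an assumed bound `c`, whereas the published constrained estimators choose the truncation level optimally from the data.

```python
# Sketch (simplified, assumed bound `c`): clip covariance eigenvalues so that
# max/min <= c, ruling out the nearly singular scatter matrices that produce
# spurious, degenerate likelihood maximizers.
import numpy as np

def constrain_eigenratio(cov, c=10.0):
    """Return a covariance whose eigenvalue ratio is at most `c`.
    Simple lower-truncation; real methods optimize the truncation level."""
    vals, vecs = np.linalg.eigh(cov)
    if vals.max() <= c * vals.min():
        return cov                          # already satisfies the constraint
    clipped = np.clip(vals, vals.max() / c, None)
    return (vecs * clipped) @ vecs.T        # rebuild V diag(clipped) V^T

cov = np.diag([100.0, 0.1])                 # badly ill-conditioned scatter
out = constrain_eigenratio(cov, c=10.0)
```

Bounding the eigenvalue ratio keeps every component covariance away from singularity, so the constrained likelihood is bounded and degenerate "clusters" collapsing onto a few points are excluded.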
Robust estimation of mixtures of regressions with random covariates, via trimming and constraints
A robust estimator for a wide family of mixtures of linear regressions is presented.
Robustness is based on the joint adoption of the Cluster Weighted Model and
of an estimator based on trimming and restrictions. The selected model provides the
conditional distribution of the response for each group, as in mixtures of regressions,
and further supplies local distributions for the explanatory variables. A novel version
of the restrictions has been devised, under this model, for separately controlling the
two sources of variability identified in it. This proposal avoids singularities in the
log-likelihood, caused by approximate local collinearity in the explanatory variables
or local exact fits in regressions, and reduces the occurrence of spurious local maximizers.
In a natural way, due to the interaction between the model and the estimator,
the procedure is able to resist the harmful influence of bad leverage points during the
estimation of the mixture of regressions, which is still an open issue in the literature.
The given methodology defines a well-posed statistical problem, whose estimator exists
and is consistent for the corresponding population optimum under
widely general conditions. A feasible EM algorithm has also been provided to obtain
the corresponding estimates. Many simulated examples and two real datasets have
been chosen to show the ability of the procedure, on the one hand, to detect anomalous
data and, on the other hand, to identify the real cluster regressions without the
influence of contamination.
Keywords: Cluster Weighted Modeling · Mixture of Regressions · Robustness
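The trimming component of such estimators can be illustrated in isolation. The sketch below is an assumption-laden toy, not the paper's cluster-weighted procedure: it uses a single linear regression and an assumed trimming level `alpha`, iteratively discarding the fraction of observations with the largest residuals before refitting, which is the impartial-trimming idea that neutralizes gross outliers and bad leverage points.

```python
# Sketch (assumed single-component setting, assumed alpha): iteratively
# trimmed least squares. At each pass, fit on the retained points, then
# keep only the (1 - alpha) fraction with the smallest absolute residuals.
import numpy as np

rng = np.random.default_rng(1)
n, alpha = 100, 0.1
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(scale=0.5, size=n)   # true slope 2.0
y[:5] += 20.0                                 # gross vertical outliers

keep = np.arange(n)                           # start from the full sample
for _ in range(10):
    b = np.polyfit(x[keep], y[keep], 1)       # OLS on retained points
    resid = np.abs(y - np.polyval(b, x))      # residuals for ALL points
    keep = np.argsort(resid)[: int((1 - alpha) * n)]  # trim worst alpha
slope = b[0]
```

Recomputing the trimmed set from the residuals of every observation at each pass lets wrongly discarded points re-enter the fit, while the contaminated points stay out once a reasonable fit is found.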
The joint role of trimming and constraints in robust estimation for mixtures of Gaussian factor analyzers.
Mixtures of Gaussian factors are powerful tools for modeling an unobserved heterogeneous
population, offering, at the same time, dimension reduction and model-based clustering. The high prevalence of spurious solutions and the disturbing effects of outlying observations in maximum likelihood estimation may cause biased or misleading inferences. Restrictions on the component covariances are considered in order to avoid spurious solutions, and trimming is also adopted, to provide robustness against violations of the normality assumptions on the underlying latent factors. A detailed AECM algorithm for this new approach is presented. Simulation results and an application to the AIS dataset illustrate the aim and effectiveness of the proposed methodology. Funded by Ministerio de Economía y Competitividad and FEDER (grant MTM2014-56235-C2-1-P), Consejería de Educación de la Junta de Castilla y León (grant VA212U13), grant FAR 2015 from the University of Milano-Bicocca, and grant FIR 2014 from the University of Catania.