9 research outputs found
The TCLUST Approach to Robust Cluster Analysis
A new method for performing robust clustering is proposed. The
method is designed to fit clusters with different scatters
and weights. A proportion α of contaminating data points is also
allowed. Restrictions on the ratio between the maximum and the
minimum eigenvalues of the groups' scatter matrices are introduced. These
restrictions make the problem well defined, guaranteeing the
existence of the sample estimators and their consistency to the population
parameters.
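As a rough illustration of the eigenvalue-ratio restriction described in this abstract, the sketch below pools the eigenvalues of all group scatter matrices and compares the largest with the smallest. It is a minimal sketch, not the authors' implementation: the function names, the bound `c`, and the restriction to symmetric 2x2 matrices (so that eigenvalues have a closed form) are assumptions made only for this example.

```python
import math


def eigenvalues_sym_2x2(s):
    """Eigenvalues of a symmetric 2x2 matrix [[a, b], [b, d]], smallest first."""
    (a, b), (_, d) = s
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean - radius, mean + radius


def eigenvalue_ratio(scatter_matrices):
    """Largest eigenvalue divided by smallest eigenvalue, pooled over all groups."""
    eigs = [lam for s in scatter_matrices for lam in eigenvalues_sym_2x2(s)]
    return max(eigs) / min(eigs)


def satisfies_restriction(scatter_matrices, c):
    """True when the pooled eigenvalue ratio does not exceed c (hypothetical bound)."""
    return eigenvalue_ratio(scatter_matrices) <= c


# Two hypothetical group scatter matrices: one spherical, one elongated.
groups = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[4.0, 0.0], [0.0, 0.5]],
]
ratio = eigenvalue_ratio(groups)  # pooled eigenvalues are 1, 1, 0.5, 4
ok_loose = satisfies_restriction(groups, c=10.0)
ok_tight = satisfies_restriction(groups, c=2.0)
```

With a bound of 1 all groups would be forced to be spherical with equal scatter, while larger bounds allow increasingly different group shapes and sizes.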
Comments on “The power of monitoring: how to make the most of a contaminated multivariate sample”
These are comments on the invited paper “The power of monitoring: How
to make the most of a contaminated multivariate sample” by Andrea Cerioli, Marco
Riani, Anthony Atkinson and Aldo Corbellini.
Funding: Spanish Ministerio de Economía y Competitividad, grant MTM2017-86061-C2-1-P, and Consejería de Educación de la Junta de Castilla y León and FEDER, grants VA005P17 and VA002G18.
Robustness and Outliers
Unexpected deviations from assumed models, as well as the presence of certain amounts of outlying data, are common in most practical statistical applications. This can lead to undesirable solutions when non-robust statistical techniques are applied, and cluster analysis is no exception. The search for homogeneous groups with large heterogeneity between them can be spoiled by the lack of robustness of standard clustering methods. For instance, the presence of (even a few) outlying observations may result in heterogeneous clusters being artificially joined together or in the detection of spurious clusters made up merely of outlying observations. In this chapter we analyze the effects of different kinds of outlying data in cluster analysis and explore several alternative methodologies designed to avoid or minimize their undesirable effects.
Funding: Ministerio de Economía, Industria y Competitividad (MTM2014-56235-C2-1-P); Junta de Castilla y León (research project support programme, Ref. VA212U13).
Avoiding Spurious Local Maximizers in Mixture Modeling
Maximum likelihood estimation in the finite mixture of distributions setting is
an ill-posed problem that is treatable, in practice, through the EM algorithm. However,
the existence of spurious solutions (singularities and non-interesting local maximizers)
makes it difficult for non-expert practitioners to find sensible mixture fits. In this work, a
constrained mixture fitting approach is presented with the aim of overcoming the troubles
introduced by spurious solutions. Sound mathematical support is provided and,
more relevant in practice, a feasible algorithm is also given. This algorithm
allows for monitoring solutions in terms of the constant involved in the restrictions,
which yields a natural way to discard spurious solutions and a valuable tool for data
analysts.
Exploring the number of groups in robust model-based clustering
Two key questions in clustering problems are how to determine the number of
groups properly and how to measure the strength of group assignments. These questions are
especially involved when the presence of a certain fraction of outlying data is also expected.
Any answer to these two key questions should depend on the assumed probabilistic
model, the allowed group scatters and what we understand by noise. With this in
mind, some exploratory "trimming-based" tools are presented in this work together
with their justifications. The monitoring of optimal values reached when solving a
robust clustering criterion and the use of some "discriminant" factors are the basis for these exploratory tools.
Grouping Around Different Dimensional Affine Subspaces
Grouping around affine subspaces and other types of manifolds is
receiving a lot of attention in the literature due to its interest in several fields of
application. Allowing for different dimensions is needed in many applications. This
work extends the TCLUST methodology to deal with the problem of grouping data
around different dimensional linear subspaces in the presence of noise. Two ways
of considering error terms in the orthogonal of the linear subspaces are considered