6,332 research outputs found
EM Algorithms for Weighted-Data Clustering with Application to Audio-Visual Scene Analysis
Data clustering has received a lot of attention and numerous methods,
algorithms and software packages are available. Among these techniques,
parametric finite-mixture models play a central role due to their interesting
mathematical properties and to the existence of maximum-likelihood estimators
based on expectation-maximization (EM). In this paper we propose a new mixture
model that associates a weight with each observed point. We introduce the
weighted-data Gaussian mixture and we derive two EM algorithms. The first one
considers a fixed weight for each observation. The second one treats each
weight as a random variable following a gamma distribution. We propose a model
selection method based on a minimum message length criterion, provide a weight
initialization strategy, and validate the proposed algorithms by comparing them
with several state-of-the-art parametric and non-parametric clustering
techniques. We also demonstrate the effectiveness and robustness of the
proposed clustering technique on heterogeneous data, in the context of
audio-visual scene analysis.
Comment: 14 pages, 4 figures, 4 tables
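The fixed-weight variant described in the abstract can be illustrated with a minimal EM loop. The snippet below is a 1-D sketch assuming the simplest interpretation, namely that each weight w[i] scales its point's contribution to the M-step as if the point were observed w[i] times; the function name and this exact update are illustrative, not taken from the paper.

```python
import numpy as np

def weighted_gmm_em(x, w, k=2, iters=100):
    """EM for a 1-D Gaussian mixture where observation x[i] carries a
    fixed weight w[i] (illustrative sketch of the fixed-weight idea)."""
    x = np.asarray(x, float)
    w = np.asarray(w, float)
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))   # spread-out init
    var = np.full(k, x.var())
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: standard Gaussian responsibilities
        dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) \
               / np.sqrt(2 * np.pi * var)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: responsibilities are multiplied by the fixed weights,
        # so a point with weight 2 counts like two identical points
        wr = w[:, None] * resp
        nk = wr.sum(axis=0)
        pi = nk / nk.sum()
        mu = (wr * x[:, None]).sum(axis=0) / nk
        var = (wr * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var
```

With uniform weights this reduces to ordinary EM for a Gaussian mixture; the paper's second algorithm goes further and treats each weight as a gamma-distributed random variable.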
Robust EM algorithm for model-based curve clustering
Model-based clustering approaches concern the paradigm of exploratory data
analysis relying on the finite mixture model to automatically find a latent
structure governing observed data. They are one of the most popular and
successful approaches in cluster analysis. The mixture density estimation is
generally performed by maximizing the observed-data log-likelihood by using the
expectation-maximization (EM) algorithm. However, it is well-known that the EM
algorithm initialization is crucial. In addition, the standard EM algorithm
requires the number of clusters to be known a priori. Some solutions have been
provided in [31, 12] for model-based clustering with Gaussian mixture models
for multivariate data. In this paper we focus on model-based curve clustering
approaches, when the data are curves rather than vectorial data, based on
regression mixtures. We propose a new robust EM algorithm for clustering
curves. We extend the model-based clustering approach presented in [31] for
Gaussian mixture models, to the case of curve clustering by regression
mixtures, including polynomial regression mixtures as well as spline or
B-spline regression mixtures. Our approach handles both the problem of
initialization and that of choosing the optimal number of clusters as the EM
learning proceeds, rather than in a separate two-stage scheme. This is achieved by
optimizing a penalized log-likelihood criterion. A simulation study confirms
the potential benefit of the proposed algorithm in terms of robustness to
initialization and of finding the actual number of clusters.
Comment: In Proceedings of the 2013 International Joint Conference on Neural
Networks (IJCNN), 2013, Dallas, TX, US
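The regression-mixture model that this work builds on can be sketched with plain (non-robust, fixed-k) EM; the paper's contribution is to penalize the log-likelihood so that the number of clusters is selected while this loop runs. Everything below, including the function name and the farthest-point initialization, is an illustrative assumption rather than the paper's algorithm.

```python
import numpy as np

def poly_regression_mixture_em(t, Y, k=2, degree=2, iters=100):
    """Plain EM for a mixture of polynomial regressions: each curve Y[i]
    (sampled at common time points t) follows one of k polynomial mean
    curves plus i.i.d. Gaussian noise. Sketch of the base model only."""
    n, m = Y.shape
    X = np.vander(t, degree + 1)                  # (m, degree+1) design matrix
    # deterministic farthest-point init: pick k mutually distant curves
    idx = [0]
    for _ in range(k - 1):
        d = np.min([((Y - Y[i]) ** 2).sum(axis=1) for i in idx], axis=0)
        idx.append(int(d.argmax()))
    D = np.stack([((Y - Y[j]) ** 2).sum(axis=1) for j in idx], axis=1)
    r = np.zeros((n, k))
    r[np.arange(n), D.argmin(axis=1)] = 1.0       # hard initial assignment
    for _ in range(iters):
        # M-step: weighted least squares per component
        pi = r.mean(axis=0)
        betas, sig2 = [], []
        for j in range(k):
            wsum = r[:, j].sum()
            beta = np.linalg.solve((X.T @ X) * wsum, X.T @ (r[:, j] @ Y))
            resid = Y - X @ beta
            betas.append(beta)
            sig2.append((r[:, j] @ (resid ** 2).sum(axis=1)) / (wsum * m))
        # E-step: responsibilities from per-curve Gaussian log-likelihoods
        logp = np.empty((n, k))
        for j in range(k):
            resid = Y - X @ betas[j]
            logp[:, j] = (np.log(pi[j])
                          - 0.5 * m * np.log(2 * np.pi * sig2[j])
                          - 0.5 * (resid ** 2).sum(axis=1) / sig2[j])
        logp -= logp.max(axis=1, keepdims=True)   # stabilize before exp
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
    return pi, np.array(betas), r
```

The robust algorithm replaces this fixed-k loop with one that starts from many components and prunes them via the penalized criterion.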
Mixtures of Shifted Asymmetric Laplace Distributions
A mixture of shifted asymmetric Laplace distributions is introduced and used
for clustering and classification. A variant of the EM algorithm is developed
for parameter estimation by exploiting the relationship with the generalized
inverse Gaussian distribution. This approach is mathematically elegant and
relatively computationally straightforward. Our novel mixture modelling
approach is demonstrated on both simulated and real data to illustrate
clustering and classification applications. In these analyses, our mixture of
shifted asymmetric Laplace distributions performs favourably when compared to
the popular Gaussian approach. This work, which marks an important step toward
non-Gaussian model-based clustering and classification, concludes with a
discussion as well as suggestions for future work.
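The EM variant mentioned above rests on the latent-variable representation of the shifted asymmetric Laplace (SAL) distribution as a normal mean-variance mixture with exponential mixing, under which the latent scale given the data follows a generalized inverse Gaussian law. A sampler built from that representation is sketched below; the function name and parameterization are illustrative.

```python
import numpy as np

def sample_sal(n, mu, alpha, sigma_chol, seed=0):
    """Draw n samples from a multivariate shifted asymmetric Laplace
    distribution via its normal mean-variance mixture representation:
        Y = mu + W * alpha + sqrt(W) * L @ Z,
    with W ~ Exp(1), Z ~ N(0, I), and L the Cholesky factor of the
    scale matrix. It is this latent-W structure that makes an EM
    algorithm tractable, since W | Y is generalized inverse Gaussian."""
    rng = np.random.default_rng(seed)
    d = len(mu)
    W = rng.exponential(1.0, size=n)              # exponential mixing variable
    Z = rng.standard_normal((n, d))
    return mu + W[:, None] * alpha + np.sqrt(W)[:, None] * (Z @ sigma_chol.T)
```

Since E[W] = 1, the mean of Y is mu + alpha, so the skewness parameter alpha shifts the mean away from the location mu.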
A data driven equivariant approach to constrained Gaussian mixture modeling
Maximum likelihood estimation of Gaussian mixture models with different
class-specific covariance matrices is known to be problematic. This is due to
the unboundedness of the likelihood, together with the presence of spurious
maximizers. Existing methods to bypass this obstacle are based on the fact that
unboundedness is avoided if the eigenvalues of the covariance matrices are
bounded away from zero. This can be done by imposing constraints on the
covariance matrices, i.e. by incorporating a priori information on the
covariance structure of the mixture components. The present work introduces a
constrained equivariant approach, where the class conditional covariance
matrices are shrunk towards a pre-specified matrix Psi. Data-driven choices of
the matrix Psi, when a priori information is not available, and the optimal
amount of shrinkage are investigated. The effectiveness of the proposal is
evaluated on the basis of a simulation study and an empirical example.
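A minimal sketch of the shrinkage idea, assuming the common convex-combination form; the paper's equivariant estimator and its data-driven choices of Psi and of the shrinkage amount are more involved than this.

```python
import numpy as np

def shrink_covariance(S, Psi, a):
    """Shrink a class-conditional covariance estimate S toward a target
    matrix Psi (illustrative convex-combination form, not the paper's
    exact estimator). With a = 0 the estimate is unchanged; with a = 1
    it equals Psi. Any a > 0 bounds the eigenvalues away from zero
    whenever Psi is positive definite, which is precisely what rules
    out the degenerate solutions with unbounded likelihood."""
    return (1.0 - a) * S + a * Psi
```

Applied inside the M-step of EM, such shrinkage replaces each raw class covariance by its regularized version before the next E-step.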
Consistency, breakdown robustness, and algorithms for robust improper maximum likelihood clustering
The robust improper maximum likelihood estimator (RIMLE) is a new method for
robust multivariate clustering that finds approximately Gaussian clusters. It
maximizes a pseudo-likelihood defined by adding to a Gaussian mixture a
component with improper constant density that accommodates outliers. A special
case of the RIMLE is MLE for multivariate finite Gaussian mixture models. In
this paper we treat existence, consistency, and breakdown theory for the RIMLE
comprehensively. RIMLE's existence is proved under non-smooth covariance matrix
constraints. It is shown that these can be implemented via a computationally
feasible Expectation-Conditional Maximization algorithm.
Comment: The title of this paper was originally "A consistent and breakdown
robust model-based clustering method".
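The pseudo-likelihood structure can be illustrated by its E-step: a 1-D sketch in which a constant improper density c, weighted by a noise proportion pi0, is appended to the Gaussian components so that outliers are absorbed by the constant component. Names and constants here are illustrative, not the paper's notation.

```python
import numpy as np

def rimle_responsibilities(x, pi, mu, var, pi0, c):
    """E-step of a RIMLE-style pseudo-likelihood (1-D sketch): Gaussian
    mixture responsibilities with an extra pseudo-component of constant
    improper density c. Column 0 of the result is the noise component;
    pi0 is its proportion, c a hypothetical user-chosen constant."""
    dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) \
           / np.sqrt(2 * np.pi * var)
    noise = np.full((len(x), 1), pi0 * c)         # improper constant density
    all_d = np.hstack([noise, dens])
    return all_d / all_d.sum(axis=1, keepdims=True)
```

Points far from every Gaussian component get near-zero Gaussian density, so almost all of their responsibility flows to the constant component and they barely influence the Gaussian parameter updates.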