From here to infinity - sparse finite versus Dirichlet process mixtures in model-based clustering
In model-based clustering, mixture models are used to group data points into
clusters. A useful concept introduced for Gaussian mixtures by Malsiner-Walli
et al. (2016) is that of sparse finite mixtures, where the prior distribution on
the weight distribution of a mixture with $K$ components is chosen in such a way
that a priori the number of clusters in the data is random and is allowed to be
smaller than $K$ with high probability. The number of clusters is then inferred
a posteriori from the data.
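The effect of the prior on the weights can be seen by simulating from the prior alone. The sketch below assumes symmetric Dirichlet(e0, ..., e0) weights on K components, draws N allocations, and counts the occupied components (the clusters). The helper `prior_num_clusters` and the values K=10, N=100, e0=0.05 and e0=4 are illustrative choices, not taken from the paper.

```python
import numpy as np

def prior_num_clusters(K, e0, N, n_draws=2000, seed=0):
    """Simulate the prior distribution of the number of non-empty clusters.

    Hypothetical helper: weights eta ~ Dirichlet(e0, ..., e0) on K components,
    N observations allocated according to eta, clusters = occupied components.
    """
    rng = np.random.default_rng(seed)
    counts = np.empty(n_draws, dtype=int)
    for t in range(n_draws):
        eta = rng.dirichlet(np.full(K, e0))
        alloc = rng.multinomial(N, eta)        # observations per component
        counts[t] = np.count_nonzero(alloc)    # occupied components = clusters
    return counts

# Small e0: most of the K = 10 components stay empty a priori.
sparse = prior_num_clusters(K=10, e0=0.05, N=100)
# Large e0: nearly all components are occupied a priori.
dense = prior_num_clusters(K=10, e0=4.0, N=100)

print("P(#clusters = k), sparse prior:", np.bincount(sparse, minlength=11) / len(sparse))
print("mean #clusters: sparse", sparse.mean(), "dense", dense.mean())
```

With a small e0 the number of clusters is random and typically well below K, which is the property the posterior then exploits.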
The present paper makes the following contributions in the context of sparse
finite mixture modelling. First, it is illustrated that the concept of sparse
finite mixtures is very generic and easily extended to cluster various types of
non-Gaussian data, in particular discrete data and continuous multivariate data
arising from non-Gaussian clusters. Second, sparse finite mixtures are compared
to Dirichlet process mixtures with respect to their ability to identify the
number of clusters. For both model classes, a hyperprior is placed on the
parameters determining the weight distribution. By suitably matching these
priors, it is shown that the choice of this hyperprior is far more influential
on the cluster solution than whether a sparse finite mixture or a Dirichlet
process mixture is used.
Comment: Accepted versio
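A rough way to see why the weight prior matters more than the model class is to compare the prior number of clusters under both. This sketch assumes the standard matching e0 = alpha / K, under which a symmetric Dirichlet(alpha/K) finite mixture approaches a DP(alpha) mixture as K grows; the paper's matching of hyperpriors may differ. The helper names, N=100, K=200 and the two alpha values are hypothetical. A hyperprior on alpha (or e0) would average over curves like these.

```python
import numpy as np

def dp_num_clusters(alpha, N, rng):
    """Clusters among N draws under a DP(alpha) prior (Chinese restaurant process).

    Observation i (0-based) opens a new cluster with probability alpha / (alpha + i).
    """
    k = 0
    for i in range(N):
        if rng.random() < alpha / (alpha + i):
            k += 1
    return k

def sfm_num_clusters(K, e0, N, rng):
    """Non-empty clusters among N draws from a finite mixture with symmetric
    Dirichlet(e0, ..., e0) weights on K components (Polya urn representation).

    With k clusters occupied, observation i starts a new one with probability
    (K - k) * e0 / (i + K * e0).
    """
    k = 0
    for i in range(N):
        if rng.random() < (K - k) * e0 / (i + K * e0):
            k += 1
    return k

def mean_clusters(sim, n_draws=2000, seed=0):
    rng = np.random.default_rng(seed)
    return np.mean([sim(rng) for _ in range(n_draws)])

N, K = 100, 200
results = {}
for alpha in (0.5, 5.0):
    dp = mean_clusters(lambda rng: dp_num_clusters(alpha, N, rng))
    sfm = mean_clusters(lambda rng: sfm_num_clusters(K, alpha / K, N, rng))
    results[alpha] = (dp, sfm)
    print(f"alpha={alpha}: DP mean={dp:.2f}, sparse finite mean={sfm:.2f}")

(dp_lo, sfm_lo), (dp_hi, sfm_hi) = results[0.5], results[5.0]
```

Changing alpha moves the prior number of clusters far more than switching between the two model classes at a matched alpha.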
Model Selection for Gaussian Mixture Models
This paper is concerned with an important issue in finite mixture modelling,
the selection of the number of mixing components. We propose a new penalized
likelihood method for model selection of finite multivariate Gaussian mixture
models. The proposed method is shown to be statistically consistent in
determining the number of components. A modified EM algorithm is developed
to simultaneously select the number of components and estimate the mixing
weights, i.e. the mixing probabilities, and the unknown parameters of the
Gaussian distributions. Simulations and a real data analysis are presented to
illustrate the performance of the proposed method.
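The abstract does not give the paper's penalty or its modified EM, so the sketch below shows a common penalized-likelihood baseline instead: fit plain EM for each candidate number of components and pick the one with the smallest BIC. This is a swapped-in technique, not the proposed method; the univariate setting, the helper names and the simulated data are hypothetical.

```python
import numpy as np

def log_joint(x, w, mu, var):
    """log(w_k) + log N(x_i | mu_k, var_k) for every point i and component k, shape (n, K)."""
    return (np.log(w) - 0.5 * np.log(2 * np.pi * var)
            - 0.5 * (x[:, None] - mu) ** 2 / var)

def logsumexp_rows(a):
    """Numerically stable log(sum(exp(a), axis=1))."""
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True))).ravel()

def em_gmm_1d(x, K, n_iter=300, seed=0):
    """Plain EM for a univariate K-component Gaussian mixture; returns (w, mu, var, loglik)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    w = np.full(K, 1.0 / K)
    mu = rng.choice(x, size=K, replace=False)   # initialise means at random data points
    var = np.full(K, x.var())
    for _ in range(n_iter):
        lj = log_joint(x, w, mu, var)
        r = np.exp(lj - logsumexp_rows(lj)[:, None])   # E-step: responsibilities
        nk = r.sum(axis=0) + 1e-12
        w = nk / n                                     # M-step: weights, means, variances
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6  # floor against collapse
    loglik = logsumexp_rows(log_joint(x, w, mu, var)).sum()
    return w, mu, var, loglik

def select_k_bic(x, k_max=5, n_restarts=5):
    """Pick K by BIC = -2 loglik + (#free parameters) * log(n), best of several EM restarts."""
    n = len(x)
    bics = {}
    for K in range(1, k_max + 1):
        best = max(em_gmm_1d(x, K, seed=s)[3] for s in range(n_restarts))
        n_params = 3 * K - 1          # K means, K variances, K - 1 free weights
        bics[K] = -2 * best + n_params * np.log(n)
    return min(bics, key=bics.get), bics

rng = np.random.default_rng(42)
x = np.concatenate([rng.normal(-3, 1, 200), rng.normal(3, 1, 200)])
k_hat, bics = select_k_bic(x, k_max=4)
print(k_hat, {k: round(float(v), 1) for k, v in bics.items()})
```

Unlike this two-stage baseline, the method described above selects the number of components and estimates the parameters within a single modified EM run.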
A Tight Convex Upper Bound on the Likelihood of a Finite Mixture
The likelihood function of a finite mixture model is a non-convex function
with multiple local maxima and commonly used iterative algorithms such as EM
will converge to different solutions depending on initial conditions. In this
paper we ask: is it possible to assess how far we are from the global maximum
of the likelihood? Since the likelihood of a finite mixture model can grow
unboundedly by centering a Gaussian on a single datapoint and shrinking the
covariance, we constrain the problem by assuming that the parameters of the
individual models are members of a large discrete set (e.g. estimating a
mixture of two Gaussians where the means and variances of both Gaussians are
members of a set of a million possible means and variances). For this setting
we show that a simple upper bound on the likelihood can be computed using
convex optimization and we analyze conditions under which the bound is
guaranteed to be tight. This bound can then be used to assess the quality of
solutions found by EM (where the final result is projected on the discrete set)
or any other mixture estimation algorithm. For any dataset, our method allows us
to find a finite mixture model together with a dataset-specific bound on how
far the likelihood of this mixture is from the global optimum of the likelihood.
Comment: ICPR 201
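One simple convex relaxation in this setting, sketched under assumptions: any two-component mixture whose components come from the discrete set is a point of the probability simplex over all candidates with at most two nonzero weights. Maximizing the log-likelihood over the whole simplex is a concave problem, so its optimum bounds every such mixture from above. This is our reading of the bound described above; the paper's construction and tightness conditions may differ. The Jensen-type certificate, which turns a finite number of multiplicative EM iterations into a valid bound, is our addition. The candidate grid, the data and the helper names are hypothetical.

```python
import numpy as np

def gauss_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

def relaxed_upper_bound(A, n_iter=500):
    """Certified upper bound on max_w sum_i log((A w)_i) over the probability simplex.

    A[i, j] is the density of candidate component j at data point i. Multiplicative
    EM updates increase this concave objective, and by Jensen's inequality
        l(w*) <= l(w) + n * log(max_j g_j),  g_j = mean_i A[i, j] / (A w)_i
    for the current iterate w.
    """
    n, J = A.shape
    w = np.full(J, 1.0 / J)
    for _ in range(n_iter):
        g = (A / (A @ w)[:, None]).mean(axis=0)
        w = w * g
        w /= w.sum()   # sum_j w_j g_j = 1 exactly; renormalise for round-off
    mix = A @ w
    g = (A / mix[:, None]).mean(axis=0)
    loglik = np.log(mix).sum()
    return loglik, loglik + n * np.log(g.max())

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2.0, 0.7, 150), rng.normal(1.5, 1.0, 150)])

# Hypothetical discrete candidate set: 41 means x 5 standard deviations = 205 candidates.
means = np.linspace(-4, 4, 41)
sds = np.array([0.5, 0.7, 1.0, 1.5, 2.0])
cand_mu, cand_sd = (grid.ravel() for grid in np.meshgrid(means, sds, indexing="ij"))
A = gauss_pdf(x[:, None], cand_mu[None, :], cand_sd[None, :])

relaxed_ll, upper = relaxed_upper_bound(A)

# An equal-weight two-component mixture built from candidates in the set.
a = np.argmin((cand_mu + 2.0) ** 2 + (cand_sd - 0.7) ** 2)
b = np.argmin((cand_mu - 1.5) ** 2 + (cand_sd - 1.0) ** 2)
pair_ll = np.log(0.5 * A[:, a] + 0.5 * A[:, b]).sum()
print(f"two-component mixture: {pair_ll:.2f}, upper bound: {upper:.2f}, gap: {upper - pair_ll:.2f}")
```

The best two-component mixture over the set has a log-likelihood between `pair_ll` and `upper`, so the printed gap bounds how much this particular solution could still be improved.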
Introduction to finite mixtures
Mixture models have been around for over 150 years, as an intuitively simple
and practical tool for enriching the collection of probability distributions
available for modelling data. In this chapter we describe the basic ideas of
the subject, present several alternative representations and perspectives on
these models, and discuss some of the elements of inference about the unknowns
in the models. Our focus is on the simplest set-up, that of finite mixture
models, but we also discuss how various simplifying assumptions can be relaxed
to generate the rich landscape of modelling and inference ideas traversed in the
rest of this book.
Comment: 14 pages, 7 figures, a chapter prepared for the forthcoming Handbook
of Mixture Analysis. V2 corrects a small but important typographical error,
and makes other minor edits; V3 makes further minor corrections and updates
following review; V4 corrects algorithmic details in Sections 4.1 and 4.2, and
removes a typo.
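Two of the basic representations of a finite mixture can be checked against each other numerically. The first is the density p(x) = sum_k pi_k f(x | theta_k). The second is the latent allocation form: draw a component label z with probabilities pi, then draw x from component z. The sketch below assumes univariate Gaussian components; the three-component parameters and the helper names are hypothetical.

```python
import numpy as np

# Hypothetical 3-component univariate Gaussian mixture.
weights = np.array([0.5, 0.3, 0.2])
means = np.array([-2.0, 0.0, 3.0])
sds = np.array([0.5, 1.0, 0.8])

def mixture_pdf(x):
    """Density form: p(x) = sum_k pi_k N(x | mu_k, sigma_k^2), evaluated pointwise."""
    x = np.asarray(x, dtype=float)[..., None]
    comp = np.exp(-0.5 * ((x - means) / sds) ** 2) / (sds * np.sqrt(2 * np.pi))
    return (comp * weights).sum(axis=-1)

def sample_mixture(n, rng):
    """Latent allocation form: z ~ Categorical(pi), then x | z ~ N(mu_z, sigma_z^2)."""
    z = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(means[z], sds[z]), z

rng = np.random.default_rng(0)
x, z = sample_mixture(100_000, rng)

grid = np.linspace(-8, 8, 4001)
area = mixture_pdf(grid).sum() * (grid[1] - grid[0])   # Riemann sum of the density
print(area, x.mean(), weights @ means)                 # area near 1; both means near -0.4
```

Both forms describe the same distribution: the density integrates to one, and the sample mean from the latent allocation sampler matches the mixture mean sum_k pi_k mu_k.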