Bayesian solutions to the label switching problem
The label switching problem, the unidentifiability of the permutation of clusters or, more generally, of latent variables, makes interpretation of results computed with MCMC sampling difficult. We introduce a fully Bayesian treatment of the permutations which performs better than alternatives. The method can be used to compute summaries of the posterior samples even for nonparametric Bayesian methods, for which no good solutions exist so far. Although the treatment is approximate in this case, the results are very promising. The summaries are intuitively appealing: a summarized cluster is defined as a set of points for which the likelihood of being in the same cluster is maximized.
Dealing with Label Switching in Mixture Models Under Genuine Multimodality
The fitting of finite mixture models is an ill-defined estimation problem, as completely different parameterizations can induce similar mixture distributions. This leads to multiple modes in the likelihood, which is a problem for frequentist maximum likelihood estimation and complicates statistical inference from Markov chain Monte Carlo draws in Bayesian estimation. For the analysis of the posterior density of these draws, a suitable separation into different modes is desirable. In addition, a unique labelling of the component-specific estimates is necessary to solve the label switching problem. This paper presents and compares two approaches to achieve these goals: relabelling under multimodality and constrained clustering. The algorithmic details are discussed and their application is demonstrated on artificial and real-world data.
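The non-identifiability described in this abstract can be seen in a few lines: permuting the component labels leaves the mixture likelihood unchanged, so the likelihood has one mode per permutation. The following is a minimal Python sketch, assuming a univariate Gaussian mixture; the function name `mixture_loglik` and the toy data are illustrative and do not come from the paper.

```python
import math

def mixture_loglik(data, weights, means, sds):
    """Log-likelihood of a univariate Gaussian mixture (illustrative helper)."""
    total = 0.0
    for x in data:
        density = sum(
            w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
            for w, m, s in zip(weights, means, sds)
        )
        total += math.log(density)
    return total

# Toy data and a two-component parameterization (assumed values).
data = [-1.2, -0.8, 0.1, 3.9, 4.2, 5.1]
loglik_original = mixture_loglik(data, [0.4, 0.6], [0.0, 4.0], [1.0, 0.8])

# Swap the labels of the two components: same mixture, different parameter vector.
loglik_swapped = mixture_loglik(data, [0.6, 0.4], [4.0, 0.0], [0.8, 1.0])

print(loglik_original, loglik_swapped)
```

The two printed values agree up to floating-point rounding, which is why component-specific summaries of MCMC draws need a unique labelling before they can be interpreted.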
Importance sampling schemes for evidence approximation in mixture models
The marginal likelihood is a central tool for drawing Bayesian inference
about the number of components in mixture models. It is often approximated
since the exact form is unavailable. A bias in the approximation may be due to
an incomplete exploration by a simulated Markov chain (e.g., a Gibbs sequence)
of the collection of posterior modes, a phenomenon also known as lack of label
switching, as all possible label permutations must be simulated by a chain in
order to converge and hence overcome the bias. In an importance sampling
approach, imposing label switching to the importance function results in an
exponential increase of the computational cost with the number of components.
In this paper, two importance sampling schemes are proposed through choices for
the importance function: an MLE proposal and a Rao-Blackwellised importance
function. The second scheme is called dual importance sampling. We demonstrate
that this dual importance sampling is a valid estimator of the evidence and
moreover show that the statistical efficiency of estimates increases. To reduce
the induced high demand in computation, the original importance function is
approximated, and a suitable approximation can produce an estimate with the same
precision and with reduced computational workload.
Comment: 24 pages, 5 figure
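The general idea of approximating the evidence by importance sampling can be illustrated on a toy problem. The sketch below is not the paper's MLE proposal or dual importance sampling scheme. It assumes a single observation y with likelihood N(y; theta, 1) and prior theta ~ N(0, 1), so that the exact evidence N(y; 0, 2) is available for comparison. The importance function (a normal centred at the posterior mean y/2, with an inflated standard deviation of 1) and all names are assumptions made for illustration.

```python
import math
import random

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def is_evidence(y, n_draws=20000, seed=0):
    """Importance sampling estimate of the evidence for the toy model (assumed setup)."""
    rng = random.Random(seed)
    q_mean, q_sd = y / 2.0, 1.0  # importance function: wider than the posterior
    total = 0.0
    for _ in range(n_draws):
        theta = rng.gauss(q_mean, q_sd)
        # weight = likelihood * prior / importance density
        total += (
            normal_pdf(y, theta, 1.0)
            * normal_pdf(theta, 0.0, 1.0)
            / normal_pdf(theta, q_mean, q_sd)
        )
    return total / n_draws

y = 1.3
estimate = is_evidence(y)
exact = normal_pdf(y, 0.0, math.sqrt(2.0))  # closed-form evidence for this toy model
print(estimate, exact)
```

In a mixture model the posterior has one mode per label permutation, so an importance function that covers all modes must account for every permutation, which is the source of the computational cost the abstract mentions.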
Relabelling Algorithms for Large Dataset Mixture Models
Mixture models are flexible tools in density estimation and classification
problems. Bayesian estimation of such models typically relies on sampling from
the posterior distribution using Markov chain Monte Carlo. Label switching
arises because the posterior is invariant to permutations of the component
parameters. Methods for dealing with label switching have been studied fairly
extensively in the literature, with the most popular approaches being those
based on loss functions. However, many of these algorithms turn out to be too
slow in practice, and can be infeasible as the size and dimension of the data
grow. In this article, we review earlier solutions which can scale up well for
large data sets, and compare their performances on simulated and real datasets.
In addition, we propose a new, computationally efficient algorithm based on
a loss function interpretation, and show that it can scale up well in larger
problems. We conclude with a discussion of, and recommendations for, all the
methods studied.
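A loss-function view of relabelling can be sketched as follows: each MCMC draw is permuted so that its components are as close as possible to a reference labelling. This is a minimal illustration, not the algorithm proposed in the article. It assumes univariate component means, a squared-error loss, a reference vector fixed in advance, and a brute-force search over all K! permutations, which is only practical for small K. The function name `relabel_draws` and the toy draws are hypothetical.

```python
from itertools import permutations

def relabel_draws(draws, reference):
    """Permute each draw's components to minimise squared distance to the reference."""
    k = len(reference)
    relabelled = []
    for draw in draws:
        # Search all K! labelings; perm[i] is the draw component assigned to label i.
        best = min(
            permutations(range(k)),
            key=lambda perm: sum(
                (draw[j] - reference[i]) ** 2 for i, j in enumerate(perm)
            ),
        )
        relabelled.append([draw[j] for j in best])
    return relabelled

# Three draws of component means in which the labels have switched (toy values).
draws = [[0.1, 5.2, 9.8], [5.0, 0.0, 10.1], [9.9, 4.9, -0.2]]
reference = [0.0, 5.0, 10.0]
aligned = relabel_draws(draws, reference)
print(aligned)
```

The factorial search is the kind of cost that becomes infeasible as the number of components grows. Replacing it with an assignment-problem solver is one common way to reduce it.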
A sticky HDP-HMM with application to speaker diarization
We consider speaker diarization, the problem of segmenting an
audio recording of a meeting into temporal segments corresponding to individual
speakers. The problem is rendered particularly difficult by the fact that we
are not allowed to assume knowledge of the number of people participating in
the meeting. To address this problem, we take a Bayesian nonparametric approach
to speaker diarization that builds on the hierarchical Dirichlet process hidden
Markov model (HDP-HMM) of Teh et al. [J. Amer. Statist. Assoc. 101 (2006)
1566--1581]. Although the basic HDP-HMM tends to over-segment the audio
data---creating redundant states and rapidly switching among them---we describe
an augmented HDP-HMM that provides effective control over the switching rate.
We also show that this augmentation makes it possible to treat emission
distributions nonparametrically. To scale the resulting architecture to
realistic diarization problems, we develop a sampling algorithm that employs a
truncated approximation of the Dirichlet process to jointly resample the full
state sequence, greatly improving mixing rates. Working with a benchmark NIST
data set, we show that our Bayesian nonparametric architecture yields
state-of-the-art speaker diarization results.
Comment: Published at http://dx.doi.org/10.1214/10-AOAS395 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Overfitting Bayesian Mixture Models with an Unknown Number of Components
This paper proposes solutions to three issues pertaining to the estimation of
finite mixture models with an unknown number of components: the
non-identifiability induced by overfitting the number of components, the mixing
limitations of standard Markov Chain Monte Carlo (MCMC) sampling techniques,
and the related label switching problem. An overfitting approach is used to
estimate the number of components in a finite mixture model via a Zmix
algorithm. Zmix provides a bridge between multidimensional samplers and
test-based estimation methods, whereby priors are chosen to encourage extra groups
to have weights approaching zero. MCMC sampling is made possible by the
implementation of prior parallel tempering, an extension of parallel tempering.
Zmix can accurately estimate the number of components, posterior parameter
estimates and allocation probabilities given a sufficiently large sample size.
The results will reflect uncertainty in the final model and will report the
range of possible candidate models and their respective estimated probabilities
from a single run. Label switching is resolved with a computationally
light-weight method, Zswitch, developed for overfitted mixtures by exploiting
the intuitiveness of allocation-based relabelling algorithms and the precision
of label-invariant loss functions. Four simulation studies are included to
illustrate Zmix and Zswitch, as well as three case studies from the literature.
All methods are available as part of the R package Zmix, which can currently be
applied to univariate Gaussian mixture model