Particle Gibbs Split-Merge Sampling for Bayesian Inference in Mixture Models
This paper presents a new Markov chain Monte Carlo method to sample from the
posterior distribution of conjugate mixture models. This algorithm relies on a
flexible split-merge procedure built using the particle Gibbs sampler. Contrary
to available split-merge procedures, the resulting so-called Particle Gibbs
Split-Merge sampler does not require the computation of a complex acceptance
ratio, is simple to implement using existing sequential Monte Carlo libraries
and can be parallelized. We investigate its performance experimentally on
synthetic problems as well as on geolocation and cancer genomics data. In all
these examples, the particle Gibbs split-merge sampler outperforms
state-of-the-art split-merge methods by up to an order of magnitude for a fixed
computational complexity.
Distance Dependent Chinese Restaurant Processes
We develop the distance dependent Chinese restaurant process (CRP), a
flexible class of distributions over partitions that allows for
non-exchangeability. This class can be used to model many kinds of dependencies
between data in infinite clustering models, including dependencies across time
or space. We examine the properties of the distance dependent CRP, discuss its
connections to Bayesian nonparametric mixture models, and derive a Gibbs
sampler for both observed and mixture settings. We study its performance with
three text corpora. We show that relaxing the assumption of exchangeability
with distance dependent CRPs can provide a better fit to sequential data. We
also show that its alternative formulation of the traditional CRP leads to a
faster-mixing Gibbs sampling algorithm than the one based on the original
formulation.
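As a rough illustration of how a distance dependent CRP induces a partition, the sketch below draws one sample from the prior. It assumes the usual customer-link construction, which the abstract does not spell out: each customer links to another customer with weight given by a decay function of their distance, or to itself with weight alpha, and clusters are the connected components of the resulting links. The names `sample_ddcrp_partition`, `sequential_distances`, and `exponential_decay` are hypothetical and chosen only for this example.

```python
import math
import random


def sample_ddcrp_partition(distances, alpha, decay, rng):
    """Draw customer links and cluster labels from a ddCRP-style prior.

    distances[i][j] is the distance from customer i to customer j,
    alpha (> 0) is the self-link weight, and decay maps a distance
    to a non-negative weight. This follows the assumed link rule
    described in the lead-in, not code from the paper.
    """
    n = len(distances)
    links = []
    for i in range(n):
        # Weight alpha for a self-link, decay(d_ij) for a link to j != i.
        weights = [alpha if j == i else decay(distances[i][j]) for j in range(n)]
        links.append(rng.choices(range(n), weights=weights, k=1)[0])

    # Clusters are connected components of the link graph (union-find).
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in enumerate(links):
        parent[find(i)] = find(j)

    labels = [find(i) for i in range(n)]
    return links, labels


def sequential_distances(n):
    """Sequential setting: customer i may only link to earlier customers."""
    return [[i - j if j < i else math.inf for j in range(n)] for i in range(n)]


def exponential_decay(d, scale=2.0):
    """Illustrative decay; infinite distance gets zero weight."""
    return 0.0 if math.isinf(d) else math.exp(-d / scale)


if __name__ == "__main__":
    rng = random.Random(0)
    links, labels = sample_ddcrp_partition(
        sequential_distances(10), alpha=1.0, decay=exponential_decay, rng=rng
    )
    print("links: ", links)
    print("labels:", labels)
```

In the sequential setting used here, nearby customers are more likely to be linked, so the sampled partition depends on the order of the data, which is the non-exchangeability the abstract refers to.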
Sparse covariance estimation in heterogeneous samples
Standard Gaussian graphical models (GGMs) implicitly assume that the
conditional independence among variables is common to all observations in the
sample. However, in practice, observations are usually collected from
heterogeneous populations where such an assumption is not satisfied, leading in
turn to nonlinear relationships among variables. To tackle these problems we
explore mixtures of GGMs; in particular, we consider both infinite mixture
models of GGMs and infinite hidden Markov models with GGM emission
distributions. Such models allow us to divide a heterogeneous population into
homogeneous groups, with each cluster having its own conditional independence
structure. The main advantage of considering infinite mixtures is that they
allow us to easily estimate the number of subpopulations in the
sample. As an illustration, we study the trends in exchange rate fluctuations
in the pre-Euro era. This example demonstrates that the models are very
flexible while providing extremely interesting insights into
real-life applications.