Fast MCMC sampling for Markov jump processes and extensions
Markov jump processes (MJPs), also known as continuous-time Markov chains, are
a simple and important class of continuous-time dynamical systems. In this
paper, we tackle
the problem of simulating from the posterior distribution over paths in these
models, given partial and noisy observations. Our approach is an auxiliary
variable Gibbs sampler, and is based on the idea of uniformization. This sets
up a Markov chain over paths by alternately sampling a finite set of virtual
jump times given the current path and then sampling a new path given the set of
extant and virtual jump times using a standard hidden Markov model forward
filtering-backward sampling algorithm. Our method is exact and does not involve
approximations like time-discretization. We demonstrate how our sampler extends
naturally to MJP-based models like Markov-modulated Poisson processes and
continuous-time Bayesian networks and show significant computational benefits
over state-of-the-art MCMC samplers for these models.
Comment: Accepted at the Journal of Machine Learning Research (JMLR)
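The alternating scheme described above can be illustrated in a few lines. Below is a minimal sketch of the second step, the hidden Markov model forward filtering-backward sampling pass over a fixed grid of candidate jump times; the two-state rate matrix Q, the uniformization rate Omega, and the assumption that observation log-likelihoods have already been binned onto the grid are all illustrative choices for this sketch, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-state chain: rate matrix Q (illustrative) and a uniformization
# rate Omega >= max_i |Q_ii|.
Q = np.array([[-1.0,  1.0],
              [ 2.0, -2.0]])
Omega = 3.0
B = np.eye(2) + Q / Omega  # transition matrix of the uniformized chain

def ffbs(grid, log_obs_lik, pi0, B):
    """Forward filtering-backward sampling over a fixed grid of candidate
    jump times. log_obs_lik[k, s] is the log-likelihood, under state s, of
    the observations binned between grid points k and k+1 (an assumption
    made for this sketch)."""
    K, S = len(grid), B.shape[0]
    alpha = np.zeros((K, S))
    alpha[0] = np.log(pi0) + log_obs_lik[0]
    for k in range(1, K):
        # log-sum-exp forward recursion
        m = alpha[k - 1].max()
        alpha[k] = np.log(np.exp(alpha[k - 1] - m) @ B) + m + log_obs_lik[k]
    # backward sampling: P(s_k | s_{k+1}) proportional to alpha_k(s) * B[s, s_{k+1}]
    states = np.zeros(K, dtype=int)
    w = np.exp(alpha[-1] - alpha[-1].max())
    states[-1] = rng.choice(S, p=w / w.sum())
    for k in range(K - 2, -1, -1):
        w = np.exp(alpha[k] - alpha[k].max()) * B[:, states[k + 1]]
        states[k] = rng.choice(S, p=w / w.sum())
    return states
```

In the full sampler this pass alternates with resampling the virtual jump times given the current path, which is what makes the overall Markov chain target the exact posterior over paths.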
Bayesian nonparametric models for ranked data
We develop a Bayesian nonparametric extension of the popular Plackett-Luce
choice model that can handle an infinite number of choice items. Our framework
is based on the theory of random atomic measures, with the prior specified by a
gamma process. We derive a posterior characterization and a simple and
effective Gibbs sampler for posterior simulation. We develop a time-varying
extension of our model, and apply it to the New York Times lists of weekly
bestselling books.
Comment: NIPS - Neural Information Processing Systems (2012)
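As background for the choice model being extended, the finite-dimensional Plackett-Luce model can be sketched as follows; the "exponential race" view used here is closely related to the gamma-process formulation, but the weights and item counts are illustrative assumptions, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_plackett_luce(weights, rng):
    """Draw a full ranking: each item i gets an Exp(weights[i]) arrival
    time and items are ranked by arrival order, which is equivalent to
    repeatedly picking among the remaining items with probability
    proportional to weight."""
    arrivals = rng.exponential(1.0 / np.asarray(weights, dtype=float))
    return np.argsort(arrivals)

def plackett_luce_loglik(ranking, weights):
    """Log-probability of an observed ranking under the finite
    Plackett-Luce model."""
    w = np.asarray(weights, dtype=float)[list(ranking)]
    return sum(np.log(w[k]) - np.log(w[k:].sum()) for k in range(len(w)))
```

The nonparametric extension replaces the finite weight vector with the atoms of a gamma process, so the number of choice items need not be fixed in advance.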
A nonparametric HMM for genetic imputation and coalescent inference
Genetic sequence data are well described by hidden Markov models (HMMs) in
which latent states correspond to clusters of similar mutation patterns. Theory
from statistical genetics suggests that these HMMs are nonhomogeneous (their
transition probabilities vary along the chromosome) and have large support for
self transitions. We develop a new nonparametric model of genetic sequence
data, based on the hierarchical Dirichlet process, which supports these self
transitions and nonhomogeneity. Our model provides a parameterization of the
genetic process that is more parsimonious than other more general nonparametric
models which have previously been applied to population genetics. We provide
truncation-free MCMC inference for our model using a new auxiliary sampling
scheme for Bayesian nonparametric HMMs. In a series of experiments on male X
chromosome data from the Thousand Genomes Project and also on data simulated
from a population bottleneck we show the benefits of our model over the popular
finite model fastPHASE, which can itself be seen as a parametric truncation of
our model. We find that the number of HMM states found by our model is
correlated with the time to the most recent common ancestor in population
bottlenecks. This work demonstrates the flexibility of Bayesian nonparametrics
applied to large and complex genetic data.
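The combination of hierarchical Dirichlet process transitions with a bias toward self transitions can be sketched with a truncated stick-breaking construction, in the style of "sticky" HDP-HMMs; the truncation level, hyperparameters, and the exact form of the self-transition mass kappa are illustrative assumptions and do not reproduce the authors' parameterization.

```python
import numpy as np

rng = np.random.default_rng(2)

def hdp_transition_rows(K, gamma, alpha, kappa, rng):
    """Truncated stick-breaking sketch of an HDP-HMM transition matrix
    with extra self-transition mass kappa.  Global weights beta follow a
    (truncated) GEM(gamma) distribution; row j is drawn from
    Dirichlet(alpha * beta + kappa * e_j)."""
    v = rng.beta(1.0, gamma, size=K)
    beta = v * np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
    beta /= beta.sum()  # renormalize the truncated sticks
    rows = np.empty((K, K))
    for j in range(K):
        conc = alpha * beta.copy()
        conc[j] += kappa  # bias toward self transitions
        rows[j] = rng.dirichlet(conc)
    return rows
```

Making the rows depend on position along the chromosome would give the nonhomogeneity the abstract describes; this sketch shows only the shared-atom structure and the self-transition bias.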
A hybrid sampler for Poisson-Kingman mixture models
This paper introduces a new Markov chain Monte Carlo scheme for posterior
sampling in Bayesian nonparametric mixture models with priors that belong to
the general Poisson-Kingman class. We present a novel compact way of
representing the infinite-dimensional component of the model that, while
keeping this component explicitly represented, requires less memory and
storage than previous MCMC schemes. We describe comparative simulation results
demonstrating the efficacy of the proposed MCMC algorithm against existing
marginal and conditional MCMC samplers.
An Exact Auxiliary Variable Gibbs Sampler for a Class of Diffusions
Stochastic differential equations (SDEs) or diffusions are continuous-valued
continuous-time stochastic processes widely used in the applied and
mathematical sciences. Simulating paths from these processes is usually an
intractable problem, and typically involves time-discretization approximations.
We propose an exact Markov chain Monte Carlo sampling algorithm that involves
no such time-discretization error. Our sampler is applicable to the problem of
prior simulation from an SDE, posterior simulation conditioned on noisy
observations, as well as parameter inference given noisy observations. Our work
recasts an existing rejection sampling algorithm for a class of diffusions as a
latent variable model, and then derives an auxiliary variable Gibbs sampling
algorithm that targets the associated joint distribution. At a high level, the
resulting algorithm involves two steps: simulating a random grid of times from
an inhomogeneous Poisson process, and updating the SDE trajectory conditioned
on this grid. Our work allows the wide range of Monte Carlo sampling
algorithms developed in the Gaussian process literature to be brought to bear
on applications involving diffusions. We study our method on synthetic and real
datasets, where we demonstrate superior performance over competing methods.
Comment: 37 pages, 13 figures
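The first of the two steps above, simulating a random grid of times from an inhomogeneous Poisson process, is commonly done by thinning a homogeneous process at a dominating rate. A minimal sketch, with an illustrative intensity and bound (the paper's intensities arise from the SDE itself and are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(3)

def thin_poisson(rate, rate_bound, t_max, rng):
    """Sample event times on [0, t_max] from an inhomogeneous Poisson
    process with intensity rate(t) <= rate_bound, by thinning: propose
    candidate times from a homogeneous Poisson(rate_bound) process and
    keep each with probability rate(t) / rate_bound."""
    t, events = 0.0, []
    while True:
        t += rng.exponential(1.0 / rate_bound)
        if t > t_max:
            return np.array(events)
        if rng.random() < rate(t) / rate_bound:
            events.append(t)
```

Given such a grid, the second step updates the SDE trajectory conditioned on the grid points, which is where the connection to Gaussian process samplers enters.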
Rediscovery of Good-Turing estimators via Bayesian nonparametrics
The problem of estimating discovery probabilities originated in the context
of statistical ecology, and in recent years it has become popular due to its
frequent appearance in challenging applications arising in genetics,
bioinformatics, linguistics, design of experiments, machine learning, etc. A
full range of statistical approaches, parametric and nonparametric as well as
frequentist and Bayesian, has been proposed for estimating discovery
probabilities. In this paper we investigate the relationships between the
celebrated Good-Turing approach, which is a frequentist nonparametric approach
developed in the 1940s, and a Bayesian nonparametric approach recently
introduced in the literature. Specifically, under the assumption of a
two-parameter Poisson-Dirichlet prior, we show that Bayesian nonparametric
estimators of discovery probabilities are asymptotically equivalent, for a
large sample size, to suitably smoothed Good-Turing estimators. As a by-product
of this result, we introduce and investigate a methodology for deriving exact
and asymptotic credible intervals to be associated with the Bayesian
nonparametric estimators of discovery probabilities. The proposed methodology
is illustrated through a comprehensive simulation study and the analysis of
Expressed Sequence Tags data generated by sequencing a benchmark complementary
DNA library.
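Both families of estimators being compared have simple closed forms in their basic versions. Below is a sketch of the classical (unsmoothed) Good-Turing discovery estimator and the two-parameter Poisson-Dirichlet posterior predictive probability of a new species; the smoothing and credible intervals studied in the paper are not reproduced, and the sample and hyperparameters are illustrative.

```python
from collections import Counter

def good_turing_discovery(sample):
    """Classical Good-Turing estimate of the discovery probability: the
    chance the next draw is a previously unseen species is approximately
    n1 / n, where n1 counts species observed exactly once."""
    n1 = sum(1 for c in Counter(sample).values() if c == 1)
    return n1 / len(sample)

def pyp_discovery(sample, sigma, theta):
    """Posterior predictive probability of a new species under a
    two-parameter Poisson-Dirichlet (Pitman-Yor) prior:
    (theta + sigma * k) / (theta + n), with k distinct species in n draws."""
    n, k = len(sample), len(set(sample))
    return (theta + sigma * k) / (theta + n)
```

The paper's result is that, for large samples, the Bayesian estimator behaves like a suitably smoothed version of the frequentist one, which these two formulas make plausible at a glance.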