    Fast MCMC sampling for Markov jump processes and extensions

    Markov jump processes (MJPs, or continuous-time Markov chains) are a simple and important class of continuous-time dynamical systems. In this paper, we tackle the problem of simulating from the posterior distribution over paths in these models, given partial and noisy observations. Our approach is an auxiliary variable Gibbs sampler based on the idea of uniformization. It sets up a Markov chain over paths by alternately sampling a finite set of virtual jump times given the current path, and then sampling a new path given the set of extant and virtual jump times using a standard hidden Markov model forward filtering-backward sampling algorithm. Our method is exact and does not involve approximations such as time-discretization. We demonstrate how our sampler extends naturally to MJP-based models such as Markov-modulated Poisson processes and continuous-time Bayesian networks, and show significant computational benefits over state-of-the-art MCMC samplers for these models. Comment: Accepted at the Journal of Machine Learning Research (JMLR)
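
    A minimal Python/NumPy sketch of one sweep of the uniformization-based sampler described above, under stated assumptions: the rate matrix `A` is known and has at least one positive leaving rate, the path is stored as jump times (starting at 0) with the state held after each, and the observation model is supplied through a hypothetical callback `loglik_fn(grid, T)` returning the log-likelihood of the data in each grid interval. The dominating rate `Omega = 2 * max|A[s, s]|` is an illustrative choice; any `Omega` larger than every leaving rate keeps `B = I + A / Omega` a valid transition matrix. All function names are ours, not the paper's.

```python
import numpy as np


def sample_virtual_times(jump_times, states, T, A, Omega, rng):
    """Given the current path (states[i] holds on [jump_times[i], jump_times[i+1]),
    with jump_times[0] == 0), draw virtual jump times from a Poisson process
    whose rate is Omega + A[s, s] while the path is in state s."""
    boundaries = list(jump_times) + [T]
    virtual = []
    for i, s in enumerate(states):
        start, end = boundaries[i], boundaries[i + 1]
        rate = Omega + A[s, s]  # A[s, s] is minus the leaving rate, so rate >= 0
        n = rng.poisson(rate * (end - start))
        virtual.extend(rng.uniform(start, end, size=n))
    return np.array(virtual, dtype=float)


def ffbs(init, B, loglik, rng):
    """Forward filtering-backward sampling for a discrete-time HMM.
    init: (K,) initial distribution; B: (K, K) transition matrix;
    loglik: (N, K) log-likelihood of the data in each grid interval."""
    N, K = loglik.shape
    alpha = np.zeros((N, K))
    prior = init
    for n in range(N):
        a = prior * np.exp(loglik[n] - loglik[n].max())  # rescaled for stability
        alpha[n] = a / a.sum()
        prior = alpha[n] @ B
    z = np.zeros(N, dtype=int)
    z[-1] = rng.choice(K, p=alpha[-1])
    for n in range(N - 2, -1, -1):
        w = alpha[n] * B[:, z[n + 1]]
        z[n] = rng.choice(K, p=w / w.sum())
    return z


def uniformization_sweep(jump_times, states, T, A, init, loglik_fn, rng):
    """One Gibbs sweep: add virtual jumps, resample states on the combined
    grid with FFBS, then drop self-transitions to recover the new MJP path."""
    Omega = 2.0 * np.max(-np.diag(A))  # illustrative; any Omega > max leaving rate
    B = np.eye(A.shape[0]) + A / Omega
    virtual = sample_virtual_times(jump_times, states, T, A, Omega, rng)
    grid = np.sort(np.concatenate([np.asarray(jump_times, dtype=float), virtual]))
    z = ffbs(init, B, loglik_fn(grid, T), rng)
    keep = np.concatenate([[True], z[1:] != z[:-1]])  # first grid point is time 0
    return grid[keep], z[keep]
```

    Passing a `loglik_fn` that returns zeros, such as `lambda grid, T: np.zeros((len(grid), A.shape[0]))`, turns the sweep into prior simulation, which is a convenient sanity check before plugging in a real likelihood.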

    Bayesian nonparametric models for ranked data

    We develop a Bayesian nonparametric extension of the popular Plackett-Luce choice model that can handle an infinite number of choice items. Our framework is based on the theory of random atomic measures, with the prior specified by a gamma process. We derive a posterior characterization and a simple and effective Gibbs sampler for posterior simulation. We develop a time-varying extension of our model and apply it to the New York Times lists of weekly bestselling books. Comment: NIPS - Neural Information Processing Systems (2012)
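
    As a concrete reference point, the finite Plackett-Luce model assigns a top-m list the probability that each listed item is chosen in turn, in proportion to its weight, from the items not yet chosen. The sketch below assumes a finite dictionary of positive weights with hypothetical values; in the nonparametric model described above the weights instead come from a gamma process over infinitely many items, so the remaining mass would also include items never observed.

```python
import math


def plackett_luce_log_prob(top_list, weights):
    """Log-probability of a top-m list (best item first) under a finite
    Plackett-Luce model; `weights` maps every item to a positive weight."""
    remaining = sum(weights.values())  # mass of items not yet chosen
    logp = 0.0
    for item in top_list:
        logp += math.log(weights[item]) - math.log(remaining)
        remaining -= weights[item]
    return logp
```

    With weights `{"a": 2.0, "b": 1.0, "c": 1.0}`, the ranking `["a", "b", "c"]` has probability 2/4 * 1/2 * 1/1 = 0.25 (up to floating-point rounding after `math.exp`), and the partial list `["a"]` has probability 0.5.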

    A nonparametric HMM for genetic imputation and coalescent inference

    Genetic sequence data are well described by hidden Markov models (HMMs) in which latent states correspond to clusters of similar mutation patterns. Theory from statistical genetics suggests that these HMMs are nonhomogeneous (their transition probabilities vary along the chromosome) and have large support for self transitions. We develop a new nonparametric model of genetic sequence data, based on the hierarchical Dirichlet process, which supports these self transitions and nonhomogeneity. Our model provides a parameterization of the genetic process that is more parsimonious than the more general nonparametric models previously applied to population genetics. We provide truncation-free MCMC inference for our model using a new auxiliary sampling scheme for Bayesian nonparametric HMMs. In a series of experiments on male X chromosome data from the Thousand Genomes Project, and on data simulated from a population bottleneck, we show the benefits of our model over the popular finite model fastPHASE, which can itself be seen as a parametric truncation of our model. We find that the number of HMM states found by our model is correlated with the time to the most recent common ancestor in population bottlenecks. This work demonstrates the flexibility of Bayesian nonparametrics applied to large and complex genetic data.
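
    To make "nonhomogeneous, with large support for self transitions" concrete, here is a hypothetical finite transition matrix modeled loosely on fastPHASE-style haplotype models: between two loci separated by distance `d`, the chain stays put with probability `exp(-rate * d)` and otherwise redraws its state from global weights `beta`. The parameter names and values are illustrative assumptions; the model in the abstract places a hierarchical Dirichlet process prior over an unbounded number of states rather than fixing `K`.

```python
import numpy as np


def sticky_transition_matrix(beta, rate, distance):
    """Hypothetical position-dependent transition matrix between adjacent loci.
    With probability exp(-rate * distance) the state is kept; otherwise a new
    state is drawn from `beta` (which may select the same state again)."""
    beta = np.asarray(beta, dtype=float)
    stay = np.exp(-rate * distance)
    K = beta.size
    # Each row: stay mass on the diagonal plus (1 - stay) spread according to beta.
    return stay * np.eye(K) + (1.0 - stay) * np.tile(beta, (K, 1))
```

    Because the matrix depends on `distance`, closely spaced loci are almost always self transitions while distant loci approach independent draws from `beta`, which is the nonhomogeneity the abstract refers to.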

    A hybrid sampler for Poisson-Kingman mixture models

    This paper introduces a new Markov chain Monte Carlo (MCMC) scheme for posterior sampling in Bayesian nonparametric mixture models whose priors belong to the general Poisson-Kingman class. We present a novel, compact representation of the infinite-dimensional component of the model which, while representing this component explicitly, has lower memory and storage requirements than previous MCMC schemes. We report comparative simulation results demonstrating the efficacy of the proposed MCMC algorithm against existing marginal and conditional MCMC samplers.

    An Exact Auxiliary Variable Gibbs Sampler for a Class of Diffusions

    Stochastic differential equations (SDEs), or diffusions, are continuous-valued, continuous-time stochastic processes widely used in the applied and mathematical sciences. Simulating paths from these processes is usually an intractable problem, and typically involves time-discretization approximations. We propose an exact Markov chain Monte Carlo sampling algorithm that involves no such time-discretization error. Our sampler applies to prior simulation from an SDE, posterior simulation conditioned on noisy observations, and parameter inference given noisy observations. Our work recasts an existing rejection sampling algorithm for a class of diffusions as a latent variable model, and then derives an auxiliary variable Gibbs sampling algorithm that targets the associated joint distribution. At a high level, the resulting algorithm involves two steps: simulating a random grid of times from an inhomogeneous Poisson process, and updating the SDE trajectory conditioned on this grid. Our work allows the extensive body of Monte Carlo sampling algorithms from the Gaussian process literature to be brought to bear on applications involving diffusions. We study our method on synthetic and real datasets, where we demonstrate superior performance over competing methods. Comment: 37 pages, 13 figures
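
    The first step of the sampler, simulating a random grid of times from an inhomogeneous Poisson process, can be illustrated with the standard thinning construction. The sketch assumes a vectorized intensity function bounded above by a known constant on `[0, T]`; the intensity used in the test is a placeholder, not the trajectory-dependent rate that the paper derives.

```python
import numpy as np


def sample_inhomogeneous_poisson(intensity, upper_bound, T, rng):
    """Thinning: simulate a Poisson process with rate intensity(t) on [0, T],
    assuming 0 <= intensity(t) <= upper_bound everywhere on the interval."""
    # Candidate points from a homogeneous process with the dominating rate.
    n = rng.poisson(upper_bound * T)
    candidates = np.sort(rng.uniform(0.0, T, size=n))
    # Keep each candidate t independently with probability intensity(t) / upper_bound.
    accept = rng.uniform(0.0, upper_bound, size=n) < intensity(candidates)
    return candidates[accept]
```

    The accepted points form the grid on which the trajectory is then updated; a tighter `upper_bound` wastes fewer candidates but must still dominate the intensity everywhere for the construction to be exact.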

    Rediscovery of Good-Turing estimators via Bayesian nonparametrics

    The problem of estimating discovery probabilities originated in the context of statistical ecology, and in recent years it has become popular due to its frequent appearance in challenging applications arising in genetics, bioinformatics, linguistics, design of experiments, machine learning, etc. A full range of statistical approaches, parametric and nonparametric as well as frequentist and Bayesian, has been proposed for estimating discovery probabilities. In this paper we investigate the relationships between the celebrated Good-Turing approach, a frequentist nonparametric approach developed in the 1940s, and a Bayesian nonparametric approach recently introduced in the literature. Specifically, under the assumption of a two-parameter Poisson-Dirichlet prior, we show that Bayesian nonparametric estimators of discovery probabilities are asymptotically equivalent, for a large sample size, to suitably smoothed Good-Turing estimators. As a by-product of this result, we introduce and investigate a methodology for deriving exact and asymptotic credible intervals to be associated with the Bayesian nonparametric estimators of discovery probabilities. The proposed methodology is illustrated through a comprehensive simulation study and the analysis of Expressed Sequence Tag data generated by sequencing a benchmark complementary DNA library.
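
    For orientation, the two estimators being compared can be written down directly for the probability that the next observation is a species seen exactly `j` times (`j = 0` meaning a new species). Good-Turing uses `(j + 1) * m[j + 1] / n`, where `m[j]` counts the species observed `j` times in a sample of size `n`. Under a two-parameter Poisson-Dirichlet prior with discount `sigma` and concentration `theta`, the posterior predictive gives `(theta + sigma * k) / (theta + n)` for a new species (with `k` distinct species observed) and `(j - sigma) * m[j] / (theta + n)` otherwise. This sketch implements only these unsmoothed formulas; the smoothing and credible intervals discussed in the abstract are not reproduced.

```python
from collections import Counter


def frequency_counts(sample):
    """m[j] = number of distinct species observed exactly j times."""
    return Counter(Counter(sample).values())


def good_turing(sample, j):
    """Good-Turing estimate of the probability that the next draw is a
    species observed exactly j times so far (j = 0 means a new species)."""
    n = len(sample)
    m = frequency_counts(sample)
    return (j + 1) * m.get(j + 1, 0) / n


def pitman_yor_discovery(sample, j, sigma, theta):
    """Posterior predictive probability of the same event under a
    two-parameter Poisson-Dirichlet(sigma, theta) prior."""
    n = len(sample)
    m = frequency_counts(sample)
    k = sum(m.values())  # number of distinct species observed
    if j == 0:
        return (theta + sigma * k) / (theta + n)
    return (j - sigma) * m.get(j, 0) / (theta + n)
```

    For the sample `list("aabcd")` (n = 5, with three singletons and one doubleton), `good_turing(sample, 0)` returns 0.6, while `pitman_yor_discovery(sample, 0, 0.5, 1.0)` returns 0.5.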
