
    Particle Gibbs Split-Merge Sampling for Bayesian Inference in Mixture Models

    This paper presents a new Markov chain Monte Carlo method to sample from the posterior distribution of conjugate mixture models. The algorithm relies on a flexible split-merge procedure built using the particle Gibbs sampler. In contrast to existing split-merge procedures, the resulting Particle Gibbs Split-Merge sampler does not require the computation of a complex acceptance ratio, is simple to implement using existing sequential Monte Carlo libraries, and can be parallelized. We investigate its performance experimentally on synthetic problems as well as on geolocation and cancer genomics data. In all these examples, the Particle Gibbs Split-Merge sampler outperforms state-of-the-art split-merge methods by up to an order of magnitude for a fixed computational complexity.
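
    Not the authors' algorithm itself, but for orientation: a minimal sketch, assuming a conjugate Gaussian mixture with illustrative hyperparameters, of the sequential-allocation step that split proposals (including SMC-based ones) build on. Each non-anchor point joins one of two blocks with probability proportional to the block size times the conjugate posterior predictive; the particle Gibbs machinery (multiple particles, a retained conditional path) is omitted here.

        import numpy as np

        def sequential_split_allocation(x, anchor_a, anchor_b, tau2=4.0, s2=1.0, rng=None):
            """Sequentially allocate points to two blocks seeded by two anchor points.

            Conjugate model assumed here: block mean ~ N(0, tau2), data ~ N(mean, s2).
            Each remaining point joins block A or B with probability proportional to
            (current block size) x (posterior predictive density of the block).
            Returns the proposed assignment and its log proposal probability.
            """
            rng = np.random.default_rng(0) if rng is None else rng
            idx = [i for i in range(len(x)) if i not in (anchor_a, anchor_b)]
            rng.shuffle(idx)
            blocks = {0: [anchor_a], 1: [anchor_b]}
            assign = {anchor_a: 0, anchor_b: 1}
            log_q = 0.0
            for i in idx:
                log_w = np.empty(2)
                for b in (0, 1):
                    members = x[blocks[b]]
                    n_b = len(members)
                    v = 1.0 / (1.0 / tau2 + n_b / s2)   # posterior variance of the block mean
                    m = v * members.sum() / s2          # posterior mean of the block mean
                    log_w[b] = np.log(n_b) - 0.5 * ((x[i] - m) ** 2 / (v + s2) + np.log(v + s2))
                w = np.exp(log_w - log_w.max())
                w /= w.sum()
                b = rng.choice(2, p=w)
                blocks[b].append(i)
                assign[i] = b
                log_q += np.log(w[b])
            return assign, log_q

        if __name__ == "__main__":
            rng = np.random.default_rng(2)
            x = np.concatenate([rng.normal(-4, 1, 30), rng.normal(4, 1, 30)])
            assign, log_q = sequential_split_allocation(x, anchor_a=0, anchor_b=59, rng=rng)
            sizes = [sum(1 for v in assign.values() if v == b) for b in (0, 1)]
            print("block sizes:", sizes, "log proposal probability:", round(log_q, 2))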

    Producing power-law distributions and damping word frequencies with two-stage language models

    Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that can generically produce power laws, breaking generative models into two stages. The first stage, the generator, can be any standard probabilistic model, while the second stage, the adaptor, transforms the word frequencies of this model to provide a closer match to natural language. We show that two commonly used Bayesian models, the Dirichlet-multinomial model and the Dirichlet process, can be viewed as special cases of our framework. We discuss two stochastic processes, the Chinese restaurant process and its two-parameter generalization based on the Pitman-Yor process, that can be used as adaptors in our framework to produce power-law distributions over word frequencies. We show that these adaptors justify common estimation procedures based on logarithmic or inverse-power transformations of empirical frequencies. In addition, taking the Pitman-Yor Chinese restaurant process as an adaptor justifies the appearance of type frequencies in formal analyses of natural language and improves the performance of a model for unsupervised learning of morphology.
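
    As a rough illustration of the adaptor idea (an assumed sketch, not code from the paper): the two-parameter (Pitman-Yor) Chinese restaurant process seats tokens at tables, each new table receiving a label from the generator, and the resulting token frequencies are heavy-tailed.

        import random
        from collections import Counter

        def pitman_yor_crp(n_tokens, d=0.8, alpha=1.0, generator=None, seed=0):
            """Generate tokens from a two-parameter (Pitman-Yor) Chinese restaurant process.

            d: discount in [0, 1); alpha: concentration (alpha > -d).  `generator`, if
            given, draws a word type for each new table; otherwise each table simply
            gets a fresh integer id, standing in for a draw from the base generator.
            """
            rng = random.Random(seed)
            table_counts = []   # customers seated at each table
            table_labels = []   # word type served at each table
            tokens = []
            for n in range(n_tokens):
                r = rng.uniform(0.0, n + alpha)
                acc = 0.0
                for k, c in enumerate(table_counts):
                    acc += c - d                # join table k with prob (c_k - d) / (n + alpha)
                    if r < acc:
                        table_counts[k] += 1
                        tokens.append(table_labels[k])
                        break
                else:
                    # open a new table with prob (alpha + d * #tables) / (n + alpha)
                    label = generator() if generator is not None else len(table_labels)
                    table_counts.append(1)
                    table_labels.append(label)
                    tokens.append(label)
            return tokens

        if __name__ == "__main__":
            freqs = Counter(pitman_yor_crp(20000)).most_common()
            print("most frequent types:", freqs[:5])
            print("types:", len(freqs), "singletons:", sum(1 for _, c in freqs if c == 1))

    Setting d = 0 recovers the ordinary Chinese restaurant process, whose frequency distribution has a much lighter tail; increasing d towards 1 produces heavier, more Zipf-like distributions.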

    Gibbs sampling methods for Pitman-Yor mixture models

    We introduce a new sampling strategy for the two-parameter Poisson-Dirichlet process mixture model, also known as the Pitman-Yor process mixture model (PYM). Our sampler is therefore also applicable to the well-known Dirichlet process mixture model (DPM). Inference in DPM and PYM is usually performed via Markov chain Monte Carlo (MCMC) methods, specifically the Gibbs sampler. These sampling methods are usually divided into two classes: marginal and conditional algorithms. Each class has its merits and limitations. The aim of this paper is to propose a new sampler that combines the main advantages of each class. The key idea of the proposed sampler is to replace the standard posterior update of the mixing measure based on the stick-breaking representation with the posterior update of Pitman (1996), which represents the posterior law under a Pitman-Yor process as the sum of a jump part and a continuous part. We sample the continuous part in two ways, leading to two variants of the proposed sampler. We also propose a threshold to improve mixing in the first variant of our algorithm. The two variants of our sampler are compared with a marginal method, namely the celebrated Algorithm 8 of Neal (2000), and two conditional algorithms based on the stick-breaking representation, namely the efficient slice sampler of Kalli et al. (2011) and the truncated blocked Gibbs sampler of Ishwaran and James (2001). We also investigate the effects of removing the proposed threshold in the first variant of our algorithm and of introducing the threshold in the efficient slice sampler of Kalli et al. (2011). Results on real and simulated data sets illustrate that our algorithms outperform the other conditional algorithms in terms of mixing properties.
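
    For context on the marginal class mentioned above, here is a minimal sketch (assumed, not the paper's proposed conditional sampler) of one collapsed Gibbs sweep for a conjugate Gaussian Pitman-Yor mixture: an observation joins an existing cluster with weight (n_k - sigma) times that cluster's posterior predictive, or opens a new cluster with weight (theta + sigma K) times the prior predictive. The model and hyperparameter choices are illustrative.

        import numpy as np

        def py_marginal_gibbs_sweep(x, z, sigma=0.5, theta=1.0, tau2=4.0, s2=1.0, rng=None):
            """One collapsed Gibbs sweep for a Pitman-Yor mixture of Gaussians.

            Conjugate model: cluster means mu_k ~ N(0, tau2), x_i | z_i = k ~ N(mu_k, s2).
            sigma is the PY discount, theta the concentration; z holds cluster labels.
            """
            rng = np.random.default_rng(0) if rng is None else rng
            for i in range(len(x)):
                z[i] = -1                                   # remove point i from its cluster
                labels, counts = np.unique(z[z >= 0], return_counts=True)
                K = len(labels)
                log_w = np.empty(K + 1)
                for j, (k, n_k) in enumerate(zip(labels, counts)):
                    members = x[z == k]
                    v = 1.0 / (1.0 / tau2 + n_k / s2)       # posterior variance of mu_k
                    m = v * members.sum() / s2              # posterior mean of mu_k
                    # existing cluster: weight (n_k - sigma) times its posterior predictive
                    log_w[j] = np.log(n_k - sigma) - 0.5 * ((x[i] - m) ** 2 / (v + s2) + np.log(v + s2))
                # new cluster: weight (theta + sigma * K) times the prior predictive N(0, tau2 + s2)
                log_w[K] = np.log(theta + sigma * K) - 0.5 * (x[i] ** 2 / (tau2 + s2) + np.log(tau2 + s2))
                w = np.exp(log_w - log_w.max())
                choice = rng.choice(K + 1, p=w / w.sum())
                z[i] = labels[choice] if choice < K else z.max() + 1
            return z

        if __name__ == "__main__":
            rng = np.random.default_rng(1)
            x = np.concatenate([rng.normal(-3, 1, 50), rng.normal(3, 1, 50)])
            z = np.zeros(len(x), dtype=int)
            for _ in range(20):
                z = py_marginal_gibbs_sweep(x, z, rng=rng)
            print("clusters found:", len(np.unique(z)))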

    A survey on Bayesian nonparametric learning

    Bayesian (machine) learning has been playing a significant role in machine learning for a long time due to its particular ability to embrace uncertainty, encode prior knowledge, and endow interpretability. On the back of Bayesian learning's great success, Bayesian nonparametric learning (BNL) has emerged as a force for further advances in this field due to its greater modelling flexibility and representation power. Instead of playing with the fixed-dimensional probabilistic distributions of Bayesian learning, BNL creates a new “game” with infinite-dimensional stochastic processes. BNL has long been recognised as a research subject in statistics, and, to date, several state-of-the-art pilot studies have demonstrated that BNL has a great deal of potential to solve real-world machine-learning tasks. However, despite these promising results, BNL has not created a huge wave in the machine-learning community. Esotericism may account for this: the books and surveys on BNL written by statisticians are dense with theory and proofs that, while certainly meaningful, may scare away new researchers, especially those with computer science backgrounds. Hence, the aim of this article is to provide a plain-spoken, yet comprehensive, theoretical survey of BNL in terms that researchers in the machine-learning community can understand. It is hoped this survey will serve as a starting point for understanding and exploiting the benefits of BNL in current scholarly endeavours. To achieve this goal, we have collated the extant studies in this field and aligned them with the steps of a standard BNL procedure, from selecting the appropriate stochastic processes, through their manipulation, to executing the model inference algorithms. At each step, past efforts are thoroughly summarised and discussed. In addition, we review the common methods for implementing BNL in various machine-learning tasks, along with its diverse real-world applications, as examples to motivate future studies.

    Modeling Insurance Claims using Bayesian Nonparametric Regression

    The prediction of future insurance claims based on observed risk factors, or covariates, helps the actuary set insurance premiums. Typically, actuaries use parametric regression models to predict claims from the covariate information. Such models assume the same functional form tying the response to the covariates for every data point. They are often not flexible enough and can fail to capture accurately, at the individual level, the relationship between the covariates and the claims frequency and severity, which are often multimodal, highly skewed, and heavy-tailed. In this article, we explore the use of Bayesian nonparametric (BNP) regression models to predict claims frequency and severity from covariates. In particular, we model claims frequency as a mixture of Poisson regressions and the logarithm of claims severity as a mixture of normal regressions. We use the Dirichlet process (DP) and the Pitman-Yor process (PY) as priors for the mixing distribution over the regression parameters. Unlike parametric regression, such models allow each data point to have its own parameters, making them highly flexible and resulting in improved prediction accuracy. We describe model fitting using MCMC and illustrate the applicability of these models using French motor insurance claims data.
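
    A minimal generative sketch of the claims-frequency model described above, under assumptions not stated in the abstract (a truncated stick-breaking approximation to the DP, Gaussian priors on the regression coefficients, illustrative hyperparameters):

        import numpy as np

        def sample_claims_dp_poisson(X, alpha=1.0, K=20, coef_scale=0.5, seed=0):
            """Simulate claim counts from a (truncated) DP mixture of Poisson regressions.

            X: (n, p) covariate matrix.  Each mixture component has its own regression
            coefficients beta_k, so the covariate-response relationship may differ
            across latent sub-populations of policyholders.
            """
            rng = np.random.default_rng(seed)
            n, p = X.shape
            # truncated stick-breaking weights for the DP prior on the mixing distribution
            v = rng.beta(1.0, alpha, size=K)
            w = v * np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
            w /= w.sum()                                     # renormalise the truncation
            beta = rng.normal(0.0, coef_scale, size=(K, p))  # component-specific coefficients
            z = rng.choice(K, size=n, p=w)                   # component assignments
            rates = np.exp(np.einsum("ij,ij->i", X, beta[z]))    # Poisson rates via log link
            return rng.poisson(rates), z

        if __name__ == "__main__":
            rng = np.random.default_rng(1)
            X = np.column_stack([np.ones(1000), rng.normal(size=(1000, 2))])
            y, z = sample_claims_dp_poisson(X)
            print("mean claim count:", y.mean(), "components used:", len(np.unique(z)))

    Inference would run in the opposite direction, for example with a marginal or slice Gibbs sampler over the assignments and the component coefficients; a PY prior replaces the Beta(1, alpha) sticks with Beta(1 - sigma, alpha + k sigma) ones.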

    Advances in Bayesian asymptotics and Bayesian nonparametrics

    Bayesian statistics is a powerful approach to learning about real-world phenomena; its strength lies in its ability to quantify uncertainty explicitly by treating unknown quantities of interest as random variables. In this thesis, we consider questions regarding three quite different aspects of Bayesian learning. Firstly, we consider approximate Bayesian computation (ABC), a computational method suitable for computing approximate posterior distributions for highly complex models where the likelihood function is intractable but can be simulated from. Previous authors have proved consistency and provided rates of convergence in the case where all summary statistics converge at the same rate. We generalise to the case where summary statistics may converge at different rates and provide an explicit representation of the shape of the ABC posterior distribution in this general setting. We also show, under our general setting, that local linear post-processing can lead to significantly faster contraction rates of the pseudo-posterior.

    We then focus on the application of Bayesian statistics to natural language processing. The class of context-free grammars, which are standard in the modelling of natural language, has been shown to be too restrictive to fully describe all features of natural language. We propose a Bayesian nonparametric model for the class of 2-multiple context-free grammars, which generalise context-free grammars. Our model is inspired by previously proposed Bayesian models for context-tree grammars and is based on the hierarchical Dirichlet process. We develop a sequential Monte Carlo algorithm to make inference under this model and carry out simulation studies to assess our method.

    Finally, we consider some consistency issues related to Bayesian nonparametric mixture models, which have been shown to be inconsistent for the number of clusters. In the case of Dirichlet process (DP) mixture models, this problem can be mitigated when a prior is put on the model's concentration hyperparameter α, as is common practice. We prove that Pitman-Yor process (PYP) mixture models, which generalise DP mixture models, remain inconsistent for the number of clusters when a prior is put on α, in the special case where the true number of components in the data-generating mechanism is equal to 1 and the discount parameter σ is a fixed constant. When considering the space of partitions induced by BNP mixture models, point estimators such as the maximum a posteriori (MAP) partition are commonly used to summarise the posterior clustering structure, which alone can be complex and difficult to interpret. We prove consistency of the MAP partition for DP mixture models when the concentration parameter α goes deterministically to zero and when the true partition is made of a single cluster.
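
    Relating to the ABC setting discussed in the first part of this abstract, here is a minimal, generic ABC rejection sampler (an illustrative sketch; the thesis concerns its asymptotics, not this particular implementation), applied to a toy Gaussian model whose summary statistic is the sample mean:

        import numpy as np

        def abc_rejection(y_obs, simulate, summary, prior_sample, n_draws=20000,
                          quantile=0.01, seed=0):
            """Basic ABC rejection: keep the prior draws whose simulated summary
            statistics fall closest to the observed summary (tolerance chosen as an
            acceptance quantile of the simulated distances)."""
            rng = np.random.default_rng(seed)
            s_obs = summary(y_obs)
            thetas = np.array([prior_sample(rng) for _ in range(n_draws)])
            dists = np.array([abs(summary(simulate(t, rng)) - s_obs) for t in thetas])
            eps = np.quantile(dists, quantile)
            return thetas[dists <= eps]

        if __name__ == "__main__":
            # Toy model: theta ~ N(0, 10), y | theta ~ N(theta, 1), summary = sample mean.
            rng = np.random.default_rng(1)
            y_obs = rng.normal(2.0, 1.0, size=100)
            posterior = abc_rejection(
                y_obs,
                simulate=lambda t, r: r.normal(t, 1.0, size=100),
                summary=np.mean,
                prior_sample=lambda r: r.normal(0.0, np.sqrt(10.0)),
            )
            print("ABC posterior mean:", posterior.mean(), "accepted draws:", len(posterior))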

    Streaming, Distributed Variational Inference for Bayesian Nonparametrics

    This paper presents a methodology for creating streaming, distributed inference algorithms for Bayesian nonparametric (BNP) models. In the proposed framework, processing nodes receive a sequence of data minibatches, compute a variational posterior for each, and make asynchronous streaming updates to a central model. In contrast to previous algorithms, the proposed framework is truly streaming, distributed, asynchronous, learning-rate-free, and truncation-free. The key challenge in developing the framework, arising from the fact that BNP models do not impose an inherent ordering on their components, is finding the correspondence between minibatch and central BNP posterior components before performing each update. To address this, the paper develops a combinatorial optimization problem over component correspondences and provides an efficient solution technique. The paper concludes with an application of the methodology to the DP mixture model, with experimental results demonstrating its practical scalability and performance.

    Comment: This paper was presented at NIPS 2015. Please use the following BibTeX citation:

        @inproceedings{Campbell15_NIPS,
          Author = {Trevor Campbell and Julian Straub and John W. {Fisher III} and Jonathan P. How},
          Title = {Streaming, Distributed Variational Inference for Bayesian Nonparametrics},
          Booktitle = {Advances in Neural Information Processing Systems (NIPS)},
          Year = {2015}
        }
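
    The correspondence step described in the abstract can be viewed as an assignment problem. Below is a minimal sketch; the squared-distance cost between component means and the fixed "new component" cost are illustrative assumptions rather than the paper's actual objective, and the Hungarian algorithm (via scipy) stands in for the paper's solution technique.

        import numpy as np
        from scipy.optimize import linear_sum_assignment

        def match_components(mini_means, central_means, new_cost=5.0):
            """Match minibatch components to central components before a streaming update.

            The cost here is the squared distance between component means; each
            minibatch component may instead be assigned to a 'new' slot at cost
            new_cost, in which case it would spawn a fresh central component.
            Returns (minibatch index, central index or None-for-new) pairs.
            """
            m, k = len(mini_means), len(central_means)
            dist = ((mini_means[:, None, :] - central_means[None, :, :]) ** 2).sum(-1)
            # append m "new component" columns so any minibatch component can go unmatched
            cost = np.hstack([dist, np.full((m, m), new_cost)])
            rows, cols = linear_sum_assignment(cost)
            return [(int(r), int(c) if c < k else None) for r, c in zip(rows, cols)]

        if __name__ == "__main__":
            central = np.array([[0.0, 0.0], [5.0, 5.0]])
            minibatch = np.array([[0.2, -0.1], [5.1, 4.8], [10.0, 10.0]])
            print(match_components(minibatch, central))
            # expected: [(0, 0), (1, 1), (2, None)] -- the third minibatch component
            # is far from both central components and would become a new one.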