Nested Hierarchical Dirichlet Processes
We develop a nested hierarchical Dirichlet process (nHDP) for hierarchical
topic modeling. The nHDP is a generalization of the nested Chinese restaurant
process (nCRP) that allows each word to follow its own path to a topic node
according to a document-specific distribution on a shared tree. This alleviates
the rigid, single-path formulation of the nCRP, allowing a document to more
easily express thematic borrowings as a random effect. We derive a stochastic
variational inference algorithm for the model, in addition to a greedy subtree
selection method for each document, which allows for efficient inference using
massive collections of text documents. We demonstrate our algorithm on 1.8
million documents from The New York Times and 3.3 million documents from
Wikipedia.

Comment: To appear in IEEE Transactions on Pattern Analysis and Machine
Intelligence, Special Issue on Bayesian Nonparametrics
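The nested CRP that the nHDP generalizes can be illustrated with a minimal sketch: at each level of the tree, a path picks an existing child with probability proportional to its usage count, or opens a new child with probability proportional to a concentration parameter gamma. This is an illustration of the nCRP tree-growing mechanism only, not the authors' nHDP or its stochastic variational inference; the function names `crp_child` and `sample_path` are ours.

```python
import random

def crp_child(counts, gamma):
    """Chinese-restaurant choice at one tree level: child j with
    probability counts[j]/(n + gamma), a new child with gamma/(n + gamma)."""
    n = sum(counts)
    r = random.random() * (n + gamma)
    acc = 0.0
    for j, c in enumerate(counts):
        acc += c
        if r < acc:
            return j
    return len(counts)  # open a brand-new child node

def sample_path(tree, depth, gamma):
    """Follow a root-to-leaf path of the given depth, growing the shared
    tree as needed. `tree` maps a path prefix (tuple) to child counts."""
    path = ()
    for _ in range(depth):
        counts = tree.setdefault(path, [])
        j = crp_child(counts, gamma)
        if j == len(counts):
            counts.append(0)
        counts[j] += 1
        path = path + (j,)
    return path

random.seed(0)
tree = {}
paths = [sample_path(tree, depth=3, gamma=1.0) for _ in range(20)]
```

In the nCRP every word of a document is forced down a single shared path; the nHDP's relaxation lets each word draw its own path from a document-specific distribution over this shared tree.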
msBP: An R package to perform Bayesian nonparametric inference using multiscale Bernstein polynomials mixtures
msBP is an R package that implements a new method to perform Bayesian multiscale nonparametric inference introduced by Canale and Dunson (2016). The method, based on mixtures of multiscale beta dictionary densities, overcomes the drawbacks of Pólya trees and inherits many of the advantages of Dirichlet process mixture models. The key idea is that an infinitely-deep binary tree is introduced, with a beta dictionary density assigned to each node of the tree. Using a multiscale stick-breaking characterization, stochastically decreasing weights are assigned to each node. The result is an infinite mixture model. The package msBP implements a series of basic functions to deal with this family of priors such as random densities and numbers generation, creation and manipulation of binary tree objects, and generic functions to plot and print the results. In addition, it implements the Gibbs samplers for posterior computation to perform multiscale density estimation and multiscale testing of group differences described in Canale and Dunson (2016).
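The multiscale stick-breaking construction can be sketched as follows: each node draws a Beta stopping probability S and a Beta right-descent probability R, and its weight is the probability of reaching it and stopping there. This is our own simplified illustration of the weight construction (truncated at a finite depth, with hypothetical parameter choices), not the msBP package's implementation.

```python
import random

def msbp_weights(a, b, max_depth, rng):
    """Assign a weight to each node (s, h) of a binary tree truncated at
    max_depth: pi = S * prod over ancestors of (1 - S) * (R or 1 - R),
    with S ~ Beta(1, a) stopping probs and R ~ Beta(b, b) descent probs."""
    weights = {}

    def recurse(s, h, mass):
        S = rng.betavariate(1.0, a)
        if s == max_depth:
            S = 1.0  # truncation: absorb all remaining mass at deepest scale
        weights[(s, h)] = mass * S
        if s < max_depth:
            R = rng.betavariate(b, b)
            recurse(s + 1, 2 * h, mass * (1 - S) * (1 - R))      # left child
            recurse(s + 1, 2 * h + 1, mass * (1 - S) * R)        # right child

    recurse(0, 0, 1.0)
    return weights

rng = random.Random(1)
w = msbp_weights(a=5.0, b=1.0, max_depth=6, rng=rng)
total = sum(w.values())
```

Because the un-stopped mass shrinks multiplicatively with depth, node weights are stochastically decreasing in scale, which is what makes the infinite mixture well defined.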
Scalable Bayesian nonparametric regression via a Plackett-Luce model for conditional ranks
We present a novel Bayesian nonparametric regression model for covariates X
and continuous, real response variable Y. The model is parametrized in terms of
marginal distributions for Y and X and a regression function which tunes the
stochastic ordering of the conditional distributions F(y|x). By adopting an
approximate composite likelihood approach, we show that the resulting posterior
inference can be decoupled for the separate components of the model. This
procedure can scale to very large datasets and allows for the use of standard,
existing, software from Bayesian nonparametric density estimation and
Plackett-Luce ranking estimation to be applied. As an illustration, we show an
application of our approach to a US Census dataset, with over 1,300,000 data
points and more than 100 covariates.
Particle Learning for General Mixtures
This paper develops particle learning (PL) methods for the estimation of general mixture models. The approach is distinguished from alternative particle filtering methods in two major ways. First, each iteration begins by resampling particles according to posterior predictive probability, leading to a more efficient set for propagation. Second, each particle tracks only the "essential state vector", thus leading to reduced-dimensional inference. In addition, we describe how the approach will apply to more general mixture models of current interest in the literature; it is hoped that this will inspire a greater number of researchers to adopt sequential Monte Carlo methods for fitting their sophisticated mixture-based models. Finally, we show that PL leads to straightforward tools for marginal likelihood calculation and posterior cluster allocation.
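The resample-then-propagate idea can be sketched on a toy problem: a mixture of two known Gaussians with a Dirichlet prior on the weights, where each particle's essential state is just the vector of cluster counts. This is a minimal illustration of the resample/propagate pattern under our own assumed toy model, not the paper's general algorithm; all names and parameter values here are ours.

```python
import math
import random

def normal_pdf(y, mu, sigma):
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def pl_step(particles, y, mus, sigma, alpha, rng):
    """One PL iteration: (1) resample particles with weights proportional
    to the posterior predictive p(y | counts), (2) propagate each particle
    by sampling the new observation's cluster allocation."""
    K = len(mus)
    preds = []
    for counts in particles:
        n = sum(counts)
        pred = sum((counts[k] + alpha) / (n + alpha * K) * normal_pdf(y, mus[k], sigma)
                   for k in range(K))
        preds.append(pred)
    particles = rng.choices(particles, weights=preds, k=len(particles))
    propagated = []
    for counts in particles:
        probs = [(counts[k] + alpha) * normal_pdf(y, mus[k], sigma) for k in range(K)]
        k = rng.choices(range(K), weights=probs)[0]
        c = list(counts)
        c[k] += 1
        propagated.append(c)
    return propagated

rng = random.Random(0)
mus, sigma, alpha = [-2.0, 2.0], 1.0, 1.0
particles = [[0, 0] for _ in range(100)]
data = [rng.gauss(2.0, 1.0) for _ in range(30)]
for y in data:
    particles = pl_step(particles, y, mus, sigma, alpha, rng)
```

Resampling before propagation concentrates the particle set on states that explain the incoming observation, and tracking only the counts keeps each particle low-dimensional.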
A Tutorial on Bayesian Nonparametric Models
A key problem in statistical modeling is model selection: how to choose a
model at an appropriate level of complexity. This problem appears in many
settings, most prominently in choosing the number of clusters in mixture models
or the number of factors in factor analysis. In this tutorial we describe
Bayesian nonparametric methods, a class of methods that side-steps this issue
by allowing the data to determine the complexity of the model. This tutorial is
a high-level introduction to Bayesian nonparametric methods and contains
several examples of their application.

Comment: 28 pages, 8 figures
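The way Bayesian nonparametric methods let the data determine complexity can be seen in the Chinese restaurant process, a standard construction covered by such tutorials: customers join an existing table with probability proportional to its size, or start a new table with probability proportional to alpha, so the number of occupied tables (clusters) grows with the data instead of being fixed in advance. A minimal simulation:

```python
import random

def crp_partition(n, alpha, rng):
    """Seat n customers by the Chinese restaurant process and return the
    table sizes; the number of tables is not fixed in advance."""
    tables = []
    for i in range(n):
        r = rng.random() * (i + alpha)
        acc = 0.0
        for t in range(len(tables)):
            acc += tables[t]
            if r < acc:
                tables[t] += 1  # join an existing table
                break
        else:
            tables.append(1)    # start a new table
    return tables

rng = random.Random(42)
sizes = crp_partition(1000, alpha=2.0, rng=rng)
```

The expected number of tables grows only logarithmically in n, so the implied clustering stays parsimonious while remaining unbounded.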