4,601 research outputs found

    Gamma Processes, Stick-Breaking, and Variational Inference

    Full text link
    While most Bayesian nonparametric models in machine learning have focused on the Dirichlet process, the beta process, or their variants, the gamma process has recently emerged as a useful nonparametric prior in its own right. Current inference schemes for models involving the gamma process are restricted to MCMC-based methods, which limits their scalability. In this paper, we present a variational inference framework for models involving gamma process priors. Our approach is based on a novel stick-breaking constructive definition of the gamma process. We prove correctness of this stick-breaking process by using the characterization of the gamma process as a completely random measure (CRM), and we explicitly derive the rate measure of our construction using Poisson process machinery. We also derive error bounds on the truncation of the infinite process required for variational inference, similar to the truncation analyses for other nonparametric models based on the Dirichlet and beta processes. Our representation is then used to derive a variational inference algorithm for a particular Bayesian nonparametric latent structure formulation known as the infinite Gamma-Poisson model, where the latent variables are drawn from a gamma process prior with Poisson likelihoods. Finally, we present results for our algorithms on nonnegative matrix factorization tasks on document corpora, and show that we compare favorably to both sampling-based techniques and variational approaches based on beta-Bernoulli priors

    A Tutorial on Bayesian Nonparametric Models

    Full text link
    A key problem in statistical modeling is model selection, how to choose a model at an appropriate level of complexity. This problem appears in many settings, most prominently in choosing the number ofclusters in mixture models or the number of factors in factor analysis. In this tutorial we describe Bayesian nonparametric methods, a class of methods that side-steps this issue by allowing the data to determine the complexity of the model. This tutorial is a high-level introduction to Bayesian nonparametric methods and contains several examples of their application.Comment: 28 pages, 8 figure

    Priors for Random Count Matrices Derived from a Family of Negative Binomial Processes

    Full text link
    We define a family of probability distributions for random count matrices with a potentially unbounded number of rows and columns. The three distributions we consider are derived from the gamma-Poisson, gamma-negative binomial, and beta-negative binomial processes. Because the models lead to closed-form Gibbs sampling update equations, they are natural candidates for nonparametric Bayesian priors over count matrices. A key aspect of our analysis is the recognition that, although the random count matrices within the family are defined by a row-wise construction, their columns can be shown to be i.i.d. This fact is used to derive explicit formulas for drawing all the columns at once. Moreover, by analyzing these matrices' combinatorial structure, we describe how to sequentially construct a column-i.i.d. random count matrix one row at a time, and derive the predictive distribution of a new row count vector with previously unseen features. We describe the similarities and differences between the three priors, and argue that the greater flexibility of the gamma- and beta- negative binomial processes, especially their ability to model over-dispersed, heavy-tailed count data, makes these well suited to a wide variety of real-world applications. As an example of our framework, we construct a naive-Bayes text classifier to categorize a count vector to one of several existing random count matrices of different categories. The classifier supports an unbounded number of features, and unlike most existing methods, it does not require a predefined finite vocabulary to be shared by all the categories, and needs neither feature selection nor parameter tuning. Both the gamma- and beta- negative binomial processes are shown to significantly outperform the gamma-Poisson process for document categorization, with comparable performance to other state-of-the-art supervised text classification algorithms.Comment: To appear in Journal of the American Statistical Association (Theory and Methods). 31 pages + 11 page supplement, 5 figure

    Modeling for seasonal marked point processes: An analysis of evolving hurricane occurrences

    Full text link
    Seasonal point processes refer to stochastic models for random events which are only observed in a given season. We develop nonparametric Bayesian methodology to study the dynamic evolution of a seasonal marked point process intensity. We assume the point process is a nonhomogeneous Poisson process and propose a nonparametric mixture of beta densities to model dynamically evolving temporal Poisson process intensities. Dependence structure is built through a dependent Dirichlet process prior for the seasonally-varying mixing distributions. We extend the nonparametric model to incorporate time-varying marks, resulting in flexible inference for both the seasonal point process intensity and for the conditional mark distribution. The motivating application involves the analysis of hurricane landfalls with reported damages along the U.S. Gulf and Atlantic coasts from 1900 to 2010. We focus on studying the evolution of the intensity of the process of hurricane landfall occurrences, and the respective maximum wind speed and associated damages. Our results indicate an increase in the number of hurricane landfall occurrences and a decrease in the median maximum wind speed at the peak of the season. Introducing standardized damage as a mark, such that reported damages are comparable both in time and space, we find that there is no significant rising trend in hurricane damages over time.Comment: Published at http://dx.doi.org/10.1214/14-AOAS796 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org
    corecore