Priors for Random Count Matrices Derived from a Family of Negative Binomial Processes
We define a family of probability distributions for random count matrices
with a potentially unbounded number of rows and columns. The three
distributions we consider are derived from the gamma-Poisson, gamma-negative
binomial, and beta-negative binomial processes. Because the models lead to
closed-form Gibbs sampling update equations, they are natural candidates for
nonparametric Bayesian priors over count matrices. A key aspect of our analysis
is the recognition that, although the random count matrices within the family
are defined by a row-wise construction, their columns can be shown to be i.i.d.
This fact is used to derive explicit formulas for drawing all the columns at
once. Moreover, by analyzing these matrices' combinatorial structure, we
describe how to sequentially construct a column-i.i.d. random count matrix one
row at a time, and derive the predictive distribution of a new row count vector
with previously unseen features. We describe the similarities and differences
between the three priors, and argue that the greater flexibility of the gamma-
and beta-negative binomial processes, especially their ability to model
over-dispersed, heavy-tailed count data, makes these well suited to a wide
variety of real-world applications. As an example of our framework, we
construct a naive-Bayes text classifier to categorize a count vector to one of
several existing random count matrices of different categories. The classifier
supports an unbounded number of features, and unlike most existing methods, it
does not require a predefined finite vocabulary to be shared by all the
categories, and needs neither feature selection nor parameter tuning. Both the
gamma- and beta-negative binomial processes are shown to significantly
outperform the gamma-Poisson process for document categorization, with
comparable performance to other state-of-the-art supervised text classification
algorithms.
Comment: To appear in Journal of the American Statistical Association (Theory
and Methods). 31 pages + 11 page supplement, 5 figures
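The over-dispersion mentioned in the abstract above can be illustrated with a standard fact: a Poisson count whose rate is gamma distributed is marginally negative binomial, with variance larger than its mean. The following is a minimal sketch of that single-count mixture only, not the paper's random count matrix construction; the function name `sample_gamma_poisson` and the parameter values `r` and `p` are assumptions chosen for illustration.

```python
import numpy as np

def sample_gamma_poisson(r, p, size, rng):
    """Draw counts n with lambda ~ Gamma(shape=r, scale=p/(1-p)) and
    n | lambda ~ Poisson(lambda); marginally n is negative binomial."""
    rates = rng.gamma(shape=r, scale=p / (1.0 - p), size=size)
    return rng.poisson(rates)

# Illustrative (assumed) parameter values.
rng = np.random.default_rng(0)
r, p = 2.0, 0.5
counts = sample_gamma_poisson(r, p, 200_000, rng)

sample_mean = counts.mean()
sample_var = counts.var()

# Moments of the negative binomial under this parameterization.
nb_mean = r * p / (1 - p)        # r p / (1 - p)
nb_var = r * p / (1 - p) ** 2    # r p / (1 - p)^2, which exceeds the mean

print(sample_mean, nb_mean)
print(sample_var, nb_var)
```

A plain Poisson count would have variance equal to its mean; here the variance exceeds the mean by the factor 1/(1 - p).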
A unifying representation for a class of dependent random measures
We present a general construction for dependent random measures based on
thinning Poisson processes on an augmented space. The framework is not
restricted to dependent versions of a specific nonparametric model, but can be
applied to all models that can be represented using completely random measures.
Several existing dependent random measures can be seen as specific cases of
this framework. Interesting properties of the resulting measures are derived
and the efficacy of the framework is demonstrated by constructing a
covariate-dependent latent feature model and topic model that obtain superior
predictive performance.
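The basic thinning operation that the abstract above builds on can be sketched as follows: each point of a Poisson process is kept independently with a location-dependent probability, and the kept points again form a Poisson process whose rate is the original rate multiplied by that probability. This is only a one-dimensional illustration on the unit interval under assumed settings; the augmented space and covariate dependence of the paper are not modelled, and `thin_poisson_process` and `keep_prob` are hypothetical names.

```python
import numpy as np

def thin_poisson_process(rate, keep_prob, rng):
    """Simulate a homogeneous Poisson process on [0, 1] with the given rate,
    then keep each point x independently with probability keep_prob(x)."""
    n = rng.poisson(rate)                       # number of points on [0, 1]
    points = rng.uniform(0.0, 1.0, size=n)      # locations, i.i.d. uniform given n
    keep = rng.uniform(size=n) < keep_prob(points)
    return points, points[keep]

rng = np.random.default_rng(1)
# Assumed example: rate 1000 and keep probability p(x) = x, so the thinned
# process has intensity 1000 * x and an expected 500 points on [0, 1].
all_points, kept_points = thin_poisson_process(1000.0, lambda x: x, rng)

print(len(all_points), len(kept_points))
```

Using different keep probabilities on the same underlying points is one simple way to obtain point sets that are dependent on each other.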
BNP-Seq: Bayesian Nonparametric Differential Expression Analysis of Sequencing Count Data
We perform differential expression analysis of high-throughput sequencing
count data under a Bayesian nonparametric framework, removing sophisticated
ad-hoc pre-processing steps commonly required in existing algorithms. We
propose to use the gamma (beta) negative binomial process, which takes into
account different sequencing depths using sample-specific negative binomial
probability (dispersion) parameters, to detect differentially expressed genes
by comparing the posterior distributions of gene-specific negative binomial
dispersion (probability) parameters. These model parameters are inferred by
borrowing statistical strength across both the genes and samples. Extensive
experiments on both simulated and real-world RNA sequencing count data show
that the proposed differential expression analysis algorithms clearly
outperform previously proposed ones in terms of the areas under both the
receiver operating characteristic and precision-recall curves.
Comment: To appear in Journal of the American Statistical Association
A geoadditive Bayesian latent variable model for Poisson indicators
We introduce a new latent variable model with count variable indicators, where usual linear parametric effects of covariates, nonparametric effects of continuous covariates and spatial effects on the continuous latent variables are modelled through a geoadditive predictor. Bayesian modelling of nonparametric functions and spatial effects is based on penalized spline and Markov random field priors. Full Bayesian inference is performed via an auxiliary variable Gibbs sampling technique, using a recent suggestion of Frühwirth-Schnatter and Wagner (2006). As an advantage, our Poisson indicator latent variable model can be combined with semiparametric latent variable models for mixed binary, ordinal and continuous indicator variables within a unified and coherent framework for modelling and inference. A simulation study investigates performance, and an application to post-war human security in Cambodia illustrates the approach.