Topic Models Conditioned on Arbitrary Features with Dirichlet-multinomial Regression
Although fully generative models have been successfully used to model the
contents of text documents, they are often awkward to apply to combinations of
text data and document metadata. In this paper we propose a
Dirichlet-multinomial regression (DMR) topic model that includes a log-linear
prior on document-topic distributions that is a function of observed features
of the document, such as author, publication venue, references, and dates. We
show that by selecting appropriate features, DMR topic models can meet or
exceed the performance of several previously published topic models designed
for specific data.
Comment: Appears in Proceedings of the Twenty-Fourth Conference on Uncertainty
in Artificial Intelligence (UAI2008)
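A log-linear prior of the kind the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the Dirichlet parameter for document d and topic k is exp(x_d · λ_k), and the names `dmr_topic_prior`, `features`, and `weights` are hypothetical, as are the toy feature values.

```python
import numpy as np

def dmr_topic_prior(features, weights):
    """Return document-specific Dirichlet parameters.

    features: (D, F) array of observed document features (assumed encoding).
    weights:  (K, F) array of per-topic regression weights (assumed shape).
    Result:   (D, K) array with alpha[d, k] = exp(features[d] . weights[k]).
    """
    return np.exp(features @ weights.T)

# Toy data: 2 documents, 3 features (first column acts as an intercept).
features = np.array([[1.0, 0.0, 1.0],
                     [1.0, 1.0, 0.0]])
weights = np.zeros((4, 3))  # 4 topics; zero weights give a symmetric prior

alpha = dmr_topic_prior(features, weights)

# Draw one document-topic distribution per document from its own prior.
rng = np.random.default_rng(0)
theta = np.array([rng.dirichlet(a) for a in alpha])
```

With all weights at zero every document gets the same symmetric Dirichlet prior; nonzero weights shift prior mass toward topics associated with a document's features.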
Bayesian semiparametric inference for multivariate doubly-interval-censored data
Based on a data set obtained in a dental longitudinal study, conducted in
Flanders (Belgium), the joint time to caries distribution of permanent first
molars was modeled as a function of covariates. This involves an analysis of
multivariate continuous doubly-interval-censored data since: (i) the emergence
time of a tooth and the time it experiences caries were recorded yearly, and
(ii) events on teeth of the same child are dependent. To model the joint
distribution of the emergence times and the times to caries, we propose a
dependent Bayesian semiparametric model. A major feature of the proposed
approach is that survival curves can be estimated without imposing assumptions
such as proportional hazards, additive hazards, proportional odds or
accelerated failure time.
Comment: Published in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) at http://dx.doi.org/10.1214/10-AOAS368 by the
Institute of Mathematical Statistics (http://www.imstat.org)
A Tutorial on Bayesian Nonparametric Models
A key problem in statistical modeling is model selection: how to choose a
model at an appropriate level of complexity. This problem appears in many
settings, most prominently in choosing the number of clusters in mixture models
or the number of factors in factor analysis. In this tutorial we describe
Bayesian nonparametric methods, a class of methods that side-steps this issue
by allowing the data to determine the complexity of the model. This tutorial is
a high-level introduction to Bayesian nonparametric methods and contains
several examples of their application.
Comment: 28 pages, 8 figures
Scalable Bayesian nonparametric regression via a Plackett-Luce model for conditional ranks
We present a novel Bayesian nonparametric regression model for covariates X
and a continuous, real-valued response variable Y. The model is parametrized in terms of
marginal distributions for Y and X and a regression function which tunes the
stochastic ordering of the conditional distributions F(y|x). By adopting an
approximate composite likelihood approach, we show that the resulting posterior
inference can be decoupled for the separate components of the model. This
procedure can scale to very large datasets and allows standard, existing
software for Bayesian nonparametric density estimation and Plackett-Luce
ranking estimation to be applied. As an illustration, we show an
application of our approach to a US Census dataset with over 1,300,000 data
points and more than 100 covariates.
Distance Dependent Chinese Restaurant Processes
We develop the distance dependent Chinese restaurant process (CRP), a
flexible class of distributions over partitions that allows for
non-exchangeability. This class can be used to model many kinds of dependencies
between data in infinite clustering models, including dependencies across time
or space. We examine the properties of the distance dependent CRP, discuss its
connections to Bayesian nonparametric mixture models, and derive a Gibbs
sampler for both observed and mixture settings. We study its performance with
three text corpora. We show that relaxing the assumption of exchangeability
with distance dependent CRPs can provide a better fit to sequential data. We
also show that its alternative formulation of the traditional CRP leads to a
faster-mixing Gibbs sampling algorithm than the one based on the original
formulation.
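To make the idea of a distance-dependent prior over partitions concrete, the sketch below draws one partition by letting each data point link to another point with probability proportional to a decay function of their distance, or to itself with weight alpha; clusters are the connected components of the resulting links. This is an illustrative reading of the construction, not the paper's sampler: the exponential decay, the 1-D positions, and the name `sample_ddcrp_partition` are assumptions.

```python
import numpy as np

def sample_ddcrp_partition(distances, alpha, decay, rng):
    """Draw links and cluster labels from a distance-dependent CRP-style prior.

    distances: (n, n) array of pairwise distances.
    alpha:     weight given to a self-link (starting a new cluster).
    decay:     function mapping an array of distances to nonnegative weights.
    """
    n = distances.shape[0]
    links = np.empty(n, dtype=int)
    for i in range(n):
        weights = decay(distances[i]).astype(float)
        weights[i] = alpha  # self-link weight replaces decay(0)
        links[i] = rng.choice(n, p=weights / weights.sum())

    # Clusters are connected components of the link graph (union-find).
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in enumerate(links):
        parent[find(i)] = find(int(j))

    relabel = {}
    labels = np.array([relabel.setdefault(find(i), len(relabel))
                       for i in range(n)])
    return links, labels

# Toy sequential data: 10 points on a line, exponential decay (assumed).
rng = np.random.default_rng(0)
positions = np.arange(10.0)
distances = np.abs(positions[:, None] - positions[None, :])
links, labels = sample_ddcrp_partition(
    distances, alpha=1.0, decay=lambda d: np.exp(-d), rng=rng)
```

Because the link probabilities depend on pairwise distances rather than only on cluster sizes, the resulting distribution over partitions is in general not exchangeable.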
A Fully Nonparametric Modelling Approach to Binary Regression
We propose a general nonparametric Bayesian framework for binary regression,
built by modeling the joint response-covariate distribution. The
observed binary responses are assumed to arise from underlying continuous
random variables through discretization, and we model the joint distribution of
these latent responses and the covariates using a Dirichlet process mixture of
multivariate normals. We show that the kernel of the induced mixture model for
the observed data is identifiable upon a restriction on the latent variables.
To allow for appropriate dependence structure while facilitating
identifiability, we use a square-root-free Cholesky decomposition of the
covariance matrix in the normal mixture kernel. In addition to allowing for the
necessary restriction, this modeling strategy provides substantial
simplifications in implementation of Markov chain Monte Carlo posterior
simulation. We present two data examples taken from areas for which the
methodology is especially well suited. In particular, the first example
involves estimation of relationships between environmental variables, and the
second develops inference for natural selection surfaces in evolutionary
biology. Finally, we discuss extensions to regression settings with
multivariate ordinal responses.