    Topic Models Conditioned on Arbitrary Features with Dirichlet-multinomial Regression

    Although fully generative models have been successfully used to model the contents of text documents, they are often awkward to apply to combinations of text data and document metadata. In this paper we propose a Dirichlet-multinomial regression (DMR) topic model that includes a log-linear prior on document-topic distributions, whose parameters are a function of observed features of the document such as author, publication venue, references, and dates. We show that, by selecting appropriate features, DMR topic models can meet or exceed the performance of several previously published topic models designed for specific data. Comment: Appears in Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI2008)
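    The log-linear prior can be made concrete with a small sketch. Assuming the usual DMR form, in which document d with feature vector x_d receives Dirichlet parameters alpha[d, k] = exp(x_d . lambda_k) for topic k, the NumPy code below computes those parameters and draws per-document topic proportions from them. The feature names, the random stand-in weights, and the helper dmr_alpha are hypothetical; fitting lambda jointly with topic inference is not shown.

```python
import numpy as np

def dmr_alpha(features, lam):
    """Per-document Dirichlet parameters alpha[d, k] = exp(x_d . lambda_k).

    features: (D, F) observed document features; column 0 is a constant 1 so
              every document also gets a baseline (intercept) prior.
    lam:      (K, F) regression weights, one row per topic.
    """
    return np.exp(features @ lam.T)  # shape (D, K), always positive

# Hypothetical toy features: [intercept, written_by_author_A, published_after_2005]
X = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
rng = np.random.default_rng(0)
Lam = rng.normal(scale=0.5, size=(4, 3))  # 4 topics; random stand-in for fitted weights

alpha = dmr_alpha(X, Lam)
theta = np.array([rng.dirichlet(a) for a in alpha])  # each row: one document's topic mixture
print(np.round(alpha, 3))
print(np.round(theta, 3))
```

    Documents with identical feature values share the same prior, so a feature such as author shifts which topics are a priori likely without changing the topic-word distributions.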

    Bayesian semiparametric inference for multivariate doubly-interval-censored data

    Based on a data set obtained in a longitudinal dental study conducted in Flanders (Belgium), the joint distribution of the times to caries of permanent first molars was modeled as a function of covariates. This requires an analysis of multivariate, continuous, doubly-interval-censored data, since (i) both the emergence time of a tooth and the time at which it develops caries were recorded only yearly, and (ii) events on teeth of the same child are dependent. To model the joint distribution of the emergence times and the times to caries, we propose a dependent Bayesian semiparametric model. A major feature of the proposed approach is that survival curves can be estimated without imposing assumptions such as proportional hazards, additive hazards, proportional odds, or accelerated failure time. Comment: Published at http://dx.doi.org/10.1214/10-AOAS368 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
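    The term "doubly interval-censored" can be unpacked with a small example. If emergence is only known to lie between two yearly visits, and caries onset likewise, then the time to caries (onset minus emergence) is itself only bounded. The Python sketch below computes those bounds, assuming caries cannot precede emergence; the Interval type and time_to_event_bounds helper are hypothetical, and this illustrates only the censoring structure, not the proposed Bayesian semiparametric model.

```python
from dataclasses import dataclass

@dataclass
class Interval:
    lo: float  # last examination at which the event had NOT yet occurred
    hi: float  # first examination at which the event was observed

def time_to_event_bounds(onset: Interval, event: Interval) -> Interval:
    """Bounds on the duration from onset (tooth emergence) to event (caries)
    when both times are only known to lie in intervals."""
    shortest = max(0.0, event.lo - onset.hi)  # latest emergence, earliest caries
    longest = event.hi - onset.lo             # earliest emergence, latest caries
    return Interval(shortest, longest)

# Hypothetical child: tooth emerged between ages 6 and 7,
# caries first detected between ages 9 and 10.
print(time_to_event_bounds(Interval(6.0, 7.0), Interval(9.0, 10.0)))
# -> Interval(lo=2.0, hi=4.0)
```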

    A Tutorial on Bayesian Nonparametric Models

    A key problem in statistical modeling is model selection: how to choose a model at an appropriate level of complexity. This problem appears in many settings, most prominently in choosing the number of clusters in mixture models or the number of factors in factor analysis. In this tutorial we describe Bayesian nonparametric methods, a class of methods that sidesteps this issue by allowing the data to determine the complexity of the model. This tutorial is a high-level introduction to Bayesian nonparametric methods and contains several examples of their application. Comment: 28 pages, 8 figures
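    Letting the data determine model complexity is often illustrated with the Chinese restaurant process (CRP), the partition distribution underlying Dirichlet process mixtures. In the Python sketch below, each new customer joins an existing table with probability proportional to its occupancy, or starts a new table with probability proportional to a concentration parameter alpha; the number of occupied tables is not fixed in advance and grows roughly like alpha * log(n). The function name and the choice alpha = 1 are illustrative assumptions.

```python
import random

def sample_crp(n, alpha, seed=0):
    """Draw table assignments for n customers from a Chinese restaurant
    process with concentration alpha. Returns a list of table indices."""
    rng = random.Random(seed)
    counts = []       # counts[k] = number of customers already at table k
    assignments = []
    for _ in range(n):
        # Existing table k has weight counts[k]; a new table has weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new table
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments

for n in (10, 100, 1000, 10000):
    z = sample_crp(n, alpha=1.0)
    print(n, "customers ->", len(set(z)), "clusters")
```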

    Scalable Bayesian nonparametric regression via a Plackett-Luce model for conditional ranks

    We present a novel Bayesian nonparametric regression model for covariates X and a continuous, real-valued response variable Y. The model is parametrized in terms of the marginal distributions of Y and X and a regression function that tunes the stochastic ordering of the conditional distributions F(y|x). By adopting an approximate composite likelihood approach, we show that the resulting posterior inference can be decoupled across the separate components of the model. This procedure can scale to very large datasets and allows standard, existing software for Bayesian nonparametric density estimation and Plackett-Luce ranking estimation to be applied. As an illustration, we apply our approach to a US Census dataset with over 1,300,000 data points and more than 100 covariates.
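    Plackett-Luce is a standard model for rankings: items are placed one position at a time, each chosen with probability proportional to its positive score among the items not yet placed. The Python sketch below evaluates that log-likelihood for a single ranking. How the paper maps covariates to scores and combines this term with the marginal density estimates in its composite likelihood is not shown; the function name and toy scores are assumptions.

```python
import math
from itertools import permutations

def plackett_luce_log_prob(ranking, scores):
    """Log-probability of a ranking (best first) under Plackett-Luce:
    P(r) = prod_j  s[r_j] / sum_{m >= j} s[r_m]."""
    logp = 0.0
    remaining = sum(scores[item] for item in ranking)  # total score not yet placed
    for item in ranking:
        logp += math.log(scores[item]) - math.log(remaining)
        remaining -= scores[item]
    return logp

scores = {"a": 2.0, "b": 1.0, "c": 1.0}
print(math.exp(plackett_luce_log_prob(["a", "b", "c"], scores)))
# 2/4 * 1/2 * 1/1 = 0.25, up to floating-point rounding

# Probabilities over all orderings of the three items sum to 1.
print(sum(math.exp(plackett_luce_log_prob(list(p), scores)) for p in permutations(scores)))
```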

    Distance Dependent Chinese Restaurant Processes

    We develop the distance dependent Chinese restaurant process (CRP), a flexible class of distributions over partitions that allows for non-exchangeability. This class can be used to model many kinds of dependencies between data in infinite clustering models, including dependencies across time or space. We examine the properties of the distance dependent CRP, discuss its connections to Bayesian nonparametric mixture models, and derive a Gibbs sampler for both observed and mixture settings. We study its performance on three text corpora and show that relaxing the assumption of exchangeability with distance dependent CRPs can provide a better fit to sequential data. We also show that its alternative formulation of the traditional CRP leads to a faster-mixing Gibbs sampling algorithm than one based on the original formulation.
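    The distance dependent CRP replaces table assignments with customer links: customer i links to another customer j with probability proportional to a decay function f(d_ij) of their distance, or to itself with probability proportional to alpha, and clusters are the connected components of the resulting link graph. The NumPy sketch below samples from this prior for sequential data, where setting the distance to any later customer to infinity (so that an exponential decay gives it weight zero) allows links only backward in time. The decay function, its scale of 2.0, and the helper name sample_ddcrp are illustrative assumptions, and the paper's Gibbs sampler is not shown.

```python
import numpy as np

def sample_ddcrp(dist, alpha, decay, rng):
    """Draw one partition from a distance dependent CRP prior.

    dist:  (n, n) distances d_ij (np.inf where linking is disallowed)
    alpha: self-link weight
    decay: vectorized decay function f with f(np.inf) == 0
    Returns (links, labels): links[i] is the customer i links to, and
    labels[i] is i's cluster (connected component of the link graph).
    """
    n = dist.shape[0]
    links = np.empty(n, dtype=int)
    for i in range(n):
        w = np.asarray(decay(dist[i]), dtype=float)
        w[i] = alpha  # self-link weight replaces f(d_ii)
        links[i] = rng.choice(n, p=w / w.sum())

    # Union-find: customers joined by a link (in either direction) share a cluster.
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a
    for i, j in enumerate(links):
        parent[find(i)] = find(int(j))
    roots = [find(i) for i in range(n)]
    relabel = {r: k for k, r in enumerate(dict.fromkeys(roots))}
    return links, np.array([relabel[r] for r in roots])

# Sequential data: customer i may link only to customers j <= i.
t = np.arange(8.0)
diff = t[:, None] - t[None, :]
dist = np.where(diff >= 0, diff, np.inf)
rng = np.random.default_rng(1)
links, z = sample_ddcrp(dist, alpha=1.0, decay=lambda d: np.exp(-d / 2.0), rng=rng)
print(links, z)
```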

    A Fully Nonparametric Modelling Approach to Binary Regression

    We propose a general nonparametric Bayesian framework for binary regression, built by modeling the joint response-covariate distribution. The observed binary responses are assumed to arise from underlying continuous random variables through discretization, and we model the joint distribution of these latent responses and the covariates using a Dirichlet process mixture of multivariate normals. We show that the kernel of the induced mixture model for the observed data is identifiable under a restriction on the latent variables. To allow for an appropriate dependence structure while facilitating identifiability, we use a square-root-free Cholesky decomposition of the covariance matrix in the normal mixture kernel. In addition to accommodating the necessary restriction, this modeling strategy substantially simplifies the implementation of Markov chain Monte Carlo posterior simulation. We present two data examples from areas for which the methodology is especially well suited: the first involves estimating relationships between environmental variables, and the second develops inference for natural selection surfaces in evolutionary biology. Finally, we discuss extensions to regression settings with multivariate ordinal responses.
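    Two ingredients of this construction can be shown in isolation: binary responses obtained by thresholding a latent continuous variable, and a covariance matrix written in square-root-free Cholesky form Sigma = L D L^T, with L unit lower triangular and D diagonal. Because the first diagonal entry of L D L^T equals D[0, 0], fixing that single entry pins the latent response's variance while the other entries stay free and Sigma stays positive definite. The NumPy sketch below assumes the latent response is ordered first, the threshold is zero, and the restriction is unit latent variance; these choices, the helper names, and all numbers are illustrative, and it builds one normal kernel rather than the full Dirichlet process mixture.

```python
import numpy as np

def ldl_covariance(l_lower, d):
    """Sigma = L diag(d) L^T, with L unit lower triangular (strictly-lower
    entries given row by row) and d a vector of positive variances."""
    p = len(d)
    L = np.eye(p)
    L[np.tril_indices(p, -1)] = l_lower
    return L @ np.diag(d) @ L.T

def ldl_factor(Sigma):
    """Recover (L, d) from a positive-definite Sigma via its ordinary
    Cholesky factor C = L diag(sqrt(d))."""
    C = np.linalg.cholesky(Sigma)
    s = np.diag(C)
    return C / s, s ** 2  # dividing column j by s[j] gives a unit diagonal

# One normal kernel over (latent response z, covariate x1, covariate x2).
# d[0] = 1 fixes Var(z) = Sigma[0, 0] = 1 (assumed identifiability restriction).
Sigma = ldl_covariance(np.array([0.3, -0.7, 0.4]), np.array([1.0, 2.0, 0.5]))
mu = np.array([0.2, 0.0, 1.0])

rng = np.random.default_rng(0)
draws = rng.multivariate_normal(mu, Sigma, size=5)
z, x = draws[:, 0], draws[:, 1:]
y = (z > 0).astype(int)  # observed binary response: 1 when the latent z exceeds 0
print(Sigma[0, 0], y)
```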