7,435 research outputs found
The supervised hierarchical Dirichlet process
We propose the supervised hierarchical Dirichlet process (sHDP), a
nonparametric generative model for the joint distribution of a group of
observations and a response variable directly associated with that whole group.
We compare the sHDP with another leading method for regression on grouped data,
the supervised latent Dirichlet allocation (sLDA) model. We evaluate our method
on two real-world classification problems and two real-world regression
problems. Bayesian nonparametric regression models based on the Dirichlet
process, such as the Dirichlet process-generalised linear models (DP-GLM) have
previously been explored; these models allow flexibility in modelling nonlinear
relationships. However, until now, Hierarchical Dirichlet Process (HDP)
mixtures have not seen significant use in supervised problems with grouped data
since a straightforward application of the HDP on the grouped data results in
learnt clusters that are not predictive of the responses. The sHDP solves this
problem by allowing for clusters to be learnt jointly from the group structure
and from the label assigned to each group.Comment: 14 page
Generative Supervised Classification Using Dirichlet Process Priors.
Choosing the appropriate parameter prior distributions associated to a given Bayesian model is a challenging problem. Conjugate priors can be selected for simplicity motivations. However, conjugate priors can be too restrictive to accurately model the available prior information. This paper studies a new generative supervised classifier which assumes that the parameter prior distributions conditioned on each class are mixtures of Dirichlet processes. The motivations for using mixtures of Dirichlet processes is their known ability to model accurately a large class of probability distributions. A Monte Carlo method allowing one to sample according to the resulting class-conditional posterior distributions is then studied. The parameters appearing in the class-conditional densities can then be estimated using these generated samples (following Bayesian learning). The proposed supervised classifier is applied to the classification of altimetric waveforms backscattered from different surfaces (oceans, ices, forests, and deserts). This classification is a first step before developing tools allowing for the extraction of useful geophysical information from altimetric waveforms backscattered from nonoceanic surfaces
A Tutorial on Bayesian Nonparametric Models
A key problem in statistical modeling is model selection, how to choose a
model at an appropriate level of complexity. This problem appears in many
settings, most prominently in choosing the number ofclusters in mixture models
or the number of factors in factor analysis. In this tutorial we describe
Bayesian nonparametric methods, a class of methods that side-steps this issue
by allowing the data to determine the complexity of the model. This tutorial is
a high-level introduction to Bayesian nonparametric methods and contains
several examples of their application.Comment: 28 pages, 8 figure
- …