Bayesian nonparametric dependent model for partially replicated data: the influence of fuel spills on species diversity
We introduce a dependent Bayesian nonparametric model for the probabilistic
modeling of membership of subgroups in a community based on partially
replicated data. The focus here is on species-by-site data, i.e. community data
where observations at different sites are classified in distinct species. Our
aim is to study the impact of additional covariates, for instance environmental
variables, on the data structure, and in particular on the community diversity.
To that purpose, we introduce dependence a priori across the covariates, and
show that it improves posterior inference. We use a dependent version of the
Griffiths-Engen-McCloskey distribution defined via the stick-breaking
construction. This distribution is obtained by transforming a Gaussian process
whose covariance function controls the desired dependence. The resulting
posterior distribution is sampled by Markov chain Monte Carlo. We illustrate
the application of our model to a soil microbial dataset acquired across a
hydrocarbon contamination gradient at the site of a fuel spill in Antarctica.
This method allows for inference on a number of quantities of interest in
ecotoxicology, such as diversity or effective concentrations, and is broadly
applicable to the general problem of community response to environmental
variables. Comment: Main Paper: 22 pages, 6 figures. Supplementary Material: 11 pages, 1
figure
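The stick-breaking construction named in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the squared-exponential covariance, the function names, and the parameters theta and length_scale are assumptions made for the example. Each stick fraction is obtained by pushing a Gaussian process draw through the standard normal CDF and then through the inverse CDF of a Beta(1, theta) distribution, so that the weights at nearby covariate values are correlated while each site keeps a GEM(theta) marginal.

```python
import math
import numpy as np

def stick_breaking_weights(fractions):
    """Map stick fractions in (0, 1) to weights; sticks run along axis 0."""
    fractions = np.asarray(fractions, dtype=float)
    # Stick length remaining before break j: prod_{l<j} (1 - fraction_l).
    first = np.ones((1,) + fractions.shape[1:])
    remaining = np.concatenate(
        (first, np.cumprod(1.0 - fractions[:-1], axis=0)), axis=0
    )
    return fractions * remaining

def dependent_gem_weights(x, theta, n_sticks, length_scale, rng):
    """Truncated dependent GEM weights at covariate values x (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    # Assumed squared-exponential covariance; a small jitter keeps it well conditioned.
    diff = x[:, None] - x[None, :]
    cov = np.exp(-0.5 * (diff / length_scale) ** 2) + 1e-9 * np.eye(len(x))
    # One independent Gaussian process draw per stick, shape (n_sticks, n_sites).
    z = rng.multivariate_normal(np.zeros(len(x)), cov, size=n_sticks)
    # Standard normal CDF gives uniform marginals.
    normal_cdf = np.vectorize(lambda v: 0.5 * (1.0 + math.erf(v / math.sqrt(2.0))))
    u = normal_cdf(z)
    # Inverse CDF of Beta(1, theta) gives the stick fractions.
    fractions = 1.0 - (1.0 - u) ** (1.0 / theta)
    return stick_breaking_weights(fractions)

rng = np.random.default_rng(0)
weights = dependent_gem_weights(
    x=np.linspace(0.0, 1.0, 5), theta=2.0, n_sticks=30, length_scale=0.3, rng=rng
)
```

Each column of weights is a truncated vector of species proportions at one covariate value; the mass not covered by the first 30 sticks is one minus the column sum.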
Partially-observed models for classifying minerals on Mars
The identification of phyllosilicates by NASA's CRISM (Compact Reconnaissance Imaging Spectrometer for Mars) strongly suggests the presence of water-related geological processes. A variety of water-bearing phyllosilicate minerals have already been identified by several research groups utilizing spectral enrichment techniques and matching phyllosilicate-rich regions on the Martian surface to known spectra of minerals found on Earth. However, fully automated analysis of the CRISM data remains a challenge for two main reasons. First, there is significant variability in the spectral signature of the same mineral obtained from different regions on the Martian surface. Second, the list of minerals confirmed to date, which constitutes the set of training classes, is not exhaustive. Thus, when classifying new regions with a classifier trained on selected minerals and chemicals, one must consider the potential presence of unknown materials not represented in the training library. We made an initial attempt to study these problems in the context of our recent work on partially-observed classification models and present results that show the utility of such models in identifying spectra of unknown minerals while simultaneously recognizing spectra of known minerals.
Bayesian Nonparametric Multilevel Clustering with Group-Level Contexts
We present a Bayesian nonparametric framework for multilevel clustering which
utilizes group-level context information to simultaneously discover
low-dimensional structures of the group contents and partitions groups into
clusters. Using the Dirichlet process as the building block, our model
constructs a product base-measure with a nested structure to accommodate
content and context observations at multiple levels. The proposed model
possesses properties that link the nested Dirichlet processes (nDP) and the
Dirichlet process mixture models (DPM) in an interesting way: integrating out
all contents results in the DPM over contexts, whereas integrating out
group-specific contexts results in the nDP mixture over content variables. We
provide a Polya-urn view of the model and an efficient collapsed Gibbs
inference procedure. Extensive experiments on real-world datasets demonstrate
the advantage of utilizing context information via our model in both text and
image domains. Comment: Full version of ICML 201
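For orientation, the Polya-urn view of a single Dirichlet process, the building block named in the abstract above, can be sketched as a sequential sampler (the Chinese restaurant process). This is only the plain Dirichlet process case, not the nested content-and-context model of the paper; the function name and the concentration parameter alpha are assumptions made for the example.

```python
import numpy as np

def sample_crp_assignments(n_items, alpha, rng):
    """Draw cluster labels for n_items from a Polya urn with concentration alpha."""
    assignments = []
    counts = []  # counts[k] = number of items already in cluster k
    for i in range(n_items):
        # Item i joins cluster k with probability counts[k] / (i + alpha)
        # and opens a new cluster with probability alpha / (i + alpha).
        probs = np.array(counts + [alpha], dtype=float) / (i + alpha)
        k = int(rng.choice(len(probs), p=probs))
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments, counts

rng = np.random.default_rng(1)
assignments, counts = sample_crp_assignments(n_items=100, alpha=1.5, rng=rng)
```

A collapsed Gibbs sampler reuses the same urn probabilities, multiplied by a likelihood term, to resample one assignment at a time given all the others.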
Mixed membership stochastic blockmodels
Observations consisting of measurements on relationships for pairs of objects
arise in many settings, such as protein interaction and gene regulatory
networks, collections of author-recipient email, and social networks. Analyzing
such data with probabilistic models can be delicate because the simple
exchangeability assumptions underlying many boilerplate models no longer hold.
In this paper, we describe a latent variable model of such data called the
mixed membership stochastic blockmodel. This model extends blockmodels for
relational data to ones which capture mixed membership latent relational
structure, thus providing an object-specific low-dimensional representation. We
develop a general variational inference algorithm for fast approximate
posterior inference. We explore applications to social and protein interaction
networks. Comment: 46 pages, 14 figures, 3 tables
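A minimal sketch of a generative process of the mixed membership kind described above is given here. The abstract does not spell out the details, so the specifics are assumptions for illustration: each node draws a membership vector from a Dirichlet distribution, each directed pair draws a sender block and a receiver block from the two nodes' membership vectors, and the edge is a Bernoulli draw with probability taken from a block interaction matrix. All names are hypothetical.

```python
import numpy as np

def sample_mmsb(n_nodes, alpha, block_matrix, rng):
    """Sample node memberships and a directed adjacency matrix."""
    block_matrix = np.asarray(block_matrix, dtype=float)
    n_blocks = block_matrix.shape[0]
    # One mixed-membership vector per node (rows sum to one).
    memberships = rng.dirichlet(alpha, size=n_nodes)
    adjacency = np.zeros((n_nodes, n_nodes), dtype=int)
    for p in range(n_nodes):
        for q in range(n_nodes):
            if p == q:
                continue  # no self-loops in this sketch
            # Each node picks the block it acts in for this particular pair.
            sender_block = rng.choice(n_blocks, p=memberships[p])
            receiver_block = rng.choice(n_blocks, p=memberships[q])
            edge_prob = block_matrix[sender_block, receiver_block]
            adjacency[p, q] = rng.binomial(1, edge_prob)
    return memberships, adjacency

rng = np.random.default_rng(2)
block_matrix = [[0.9, 0.05], [0.05, 0.8]]  # assumed: dense within blocks, sparse between
memberships, adjacency = sample_mmsb(
    n_nodes=12, alpha=[0.3, 0.3], block_matrix=block_matrix, rng=rng
)
```

The rows of memberships are the object-specific low-dimensional representations; posterior inference, for example the variational algorithm mentioned in the abstract, runs this process in reverse given an observed adjacency matrix.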
Centered Partition Process: Informative Priors for Clustering
There is a very rich literature proposing Bayesian approaches for clustering
starting with a prior probability distribution on partitions. Most approaches
assume exchangeability, leading to simple representations in terms of
Exchangeable Partition Probability Functions (EPPF). Gibbs-type priors
encompass a broad class of such cases, including Dirichlet and Pitman-Yor
processes. Even though there have been some proposals to relax the
exchangeability assumption, allowing covariate-dependence and partial
exchangeability, limited consideration has been given to how to include
concrete prior knowledge on the partition. For example, we are motivated by an
epidemiological application, in which we wish to cluster birth defects into
groups and we have prior knowledge of an initial clustering provided by
experts. As a general approach for including such prior knowledge, we propose a
Centered Partition (CP) process that modifies the EPPF to favor partitions
close to an initial one. Some properties of the CP prior are described, a
general algorithm for posterior computation is developed, and we illustrate the
methodology through simulation examples and an application to the motivating
epidemiology study of birth defects.
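One way to make "modifies the EPPF to favor partitions close to an initial one" concrete is sketched below. The abstract does not give the functional form, so this example assumes one: the Dirichlet process EPPF is multiplied by exp(-psi * d(c, c0)), where d is the Variation of Information distance to the initial partition c0 and psi is a penalty strength. The distance, the names, and the brute-force enumeration (feasible only for a handful of items) are all choices made for illustration.

```python
import math
from collections import Counter

def set_partitions(n):
    """Yield every partition of n items as a tuple of canonical block labels."""
    def extend(prefix, n_blocks):
        if len(prefix) == n:
            yield tuple(prefix)
            return
        # An item may join any existing block or open the next new one.
        for label in range(n_blocks + 1):
            yield from extend(prefix + [label], max(n_blocks, label + 1))
    yield from extend([], 0)

def dp_eppf(labels, alpha):
    """EPPF of a Dirichlet process with concentration alpha."""
    sizes = list(Counter(labels).values())
    numerator = alpha ** len(sizes) * math.prod(math.factorial(s - 1) for s in sizes)
    denominator = math.prod(alpha + i for i in range(len(labels)))
    return numerator / denominator

def entropy(counts, n):
    return -sum((c / n) * math.log(c / n) for c in counts)

def variation_of_information(a, b):
    """VI(a, b) = 2 H(a, b) - H(a) - H(b); zero when the partitions coincide."""
    n = len(a)
    joint = entropy(Counter(zip(a, b)).values(), n)
    return 2.0 * joint - entropy(Counter(a).values(), n) - entropy(Counter(b).values(), n)

def centered_partition_prior(c0, alpha, psi):
    """Assumed form: EPPF(c) * exp(-psi * VI(c, c0)), normalized by enumeration."""
    weights = {
        c: dp_eppf(c, alpha) * math.exp(-psi * variation_of_information(c, c0))
        for c in set_partitions(len(c0))
    }
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

c0 = (0, 0, 1, 1)  # hypothetical expert clustering of four items
baseline = centered_partition_prior(c0, alpha=1.0, psi=0.0)
centered = centered_partition_prior(c0, alpha=1.0, psi=5.0)
```

With psi = 0 the prior reduces to the exchangeable baseline; increasing psi shifts probability mass toward c0 and the partitions nearest to it.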
The supervised hierarchical Dirichlet process
We propose the supervised hierarchical Dirichlet process (sHDP), a
nonparametric generative model for the joint distribution of a group of
observations and a response variable directly associated with that whole group.
We compare the sHDP with another leading method for regression on grouped data,
the supervised latent Dirichlet allocation (sLDA) model. We evaluate our method
on two real-world classification problems and two real-world regression
problems. Bayesian nonparametric regression models based on the Dirichlet
process, such as the Dirichlet process-generalised linear models (DP-GLM) have
previously been explored; these models allow flexibility in modelling nonlinear
relationships. However, until now, Hierarchical Dirichlet Process (HDP)
mixtures have not seen significant use in supervised problems with grouped data
since a straightforward application of the HDP on the grouped data results in
learnt clusters that are not predictive of the responses. The sHDP solves this
problem by allowing for clusters to be learnt jointly from the group structure
and from the label assigned to each group. Comment: 14 pages