A Tutorial on Bayesian Nonparametric Models
A key problem in statistical modeling is model selection: how to choose a
model at an appropriate level of complexity. This problem appears in many
settings, most prominently in choosing the number of clusters in mixture models
or the number of factors in factor analysis. In this tutorial we describe
Bayesian nonparametric methods, a class of methods that side-steps this issue
by allowing the data to determine the complexity of the model. This tutorial is
a high-level introduction to Bayesian nonparametric methods and contains
several examples of their application. Comment: 28 pages, 8 figures
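To illustrate how a Bayesian nonparametric prior can let the data determine the number of clusters, here is a minimal sketch of sampling cluster assignments from a Chinese restaurant process, one standard construction related to the Dirichlet process. It is not taken from the tutorial itself; the function name `crp_assignments` and the parameters `alpha` and `seed` are assumptions made for illustration.

```python
import random


def crp_assignments(n_points, alpha, seed=0):
    """Sample cluster labels for n_points from a Chinese restaurant process.

    alpha (> 0) is the concentration parameter: larger values tend to
    produce more clusters. Both names are illustrative assumptions.
    """
    rng = random.Random(seed)
    counts = []  # counts[k] = number of points currently in cluster k
    labels = []
    for _ in range(n_points):
        # Join existing cluster k with weight counts[k],
        # or open a new cluster with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # a new cluster is created
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts


labels, counts = crp_assignments(n_points=200, alpha=1.0)
print("clusters used:", len(counts))
```

The number of clusters is not fixed in advance: it is a random quantity that tends to grow slowly as more points are added.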
Spatial Joint Species Distribution Modeling using Dirichlet Processes
Species distribution models usually attempt to explain presence-absence or
abundance of a species at a site in terms of the environmental features
(so-called abiotic features) present at the site. Historically, such models have
considered species individually. However, it is well-established that species
interact to influence presence-absence and abundance (envisioned as biotic
factors). As a result, there has been substantial recent interest in joint
species distribution models with various types of response, e.g.,
presence-absence, continuous and ordinal data. Such models incorporate
dependence between species response as a surrogate for interaction.
The challenge we focus on here is how to address such modeling in the context
of a large number of species (e.g., order 10^2) across sites numbering in the
order of 10^2 or 10^3 when, in practice, only a few species are found at any
observed site. Again, there is some recent literature to address this; we adopt
a dimension reduction approach. The novel wrinkle we add here is spatial
dependence. That is, we have a collection of sites over a relatively small
spatial region so it is anticipated that species distribution at a given site
would be similar to that at a nearby site. Specifically, we handle dimension
reduction through Dirichlet processes joined with spatial dependence through
Gaussian processes.
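As a rough illustration of the spatial-dependence idea, the sketch below builds a covariance matrix over site coordinates in which nearby sites are more strongly correlated than distant ones. The exponential kernel, the function name `exponential_covariance`, and the parameters `variance` and `length_scale` are assumptions chosen for illustration; the abstract does not state which covariance function the authors use.

```python
import math


def exponential_covariance(sites, variance=1.0, length_scale=1.0):
    """Covariance matrix for a Gaussian process over 2-D site coordinates.

    Uses an exponential kernel (an assumed choice): covariance decays
    with the Euclidean distance between sites.
    """
    n = len(sites)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = math.dist(sites[i], sites[j])
            cov[i][j] = variance * math.exp(-d / length_scale)
    return cov


# Hypothetical coordinates: two nearby sites and one distant site.
sites = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0)]
cov = exponential_covariance(sites)
print(cov[0][1] > cov[0][2])  # nearby pair is more strongly correlated
```

Under such a kernel, latent effects at neighboring sites are drawn to be similar, which is one way to encode the expectation that species distribution at a site resembles that at a nearby site.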
We use both simulated data and a plant communities dataset for the Cape
Floristic Region (CFR) of South Africa to demonstrate our approach. The latter
consists of presence-absence measurements for 639 tree species on 662
locations. Through both data examples we are able to demonstrate improved
predictive performance using the foregoing specification.
Temporal Topic Analysis with Endogenous and Exogenous Processes
We consider the problem of modeling temporal textual data taking endogenous
and exogenous processes into account. Such text documents arise in real world
applications, including job advertisements and economic news articles, which
are influenced by the fluctuations of the general economy. We propose a
hierarchical Bayesian topic model which imposes a "group-correlated"
hierarchical structure on the evolution of topics over time incorporating both
processes, and show that this model can be estimated from Markov chain Monte
Carlo sampling methods. We further demonstrate that this model captures the
intrinsic relationships between the topic distribution and the time-dependent
factors, and compare its performance with latent Dirichlet allocation (LDA) and
two other related models. The model is applied to two collections of documents
to illustrate its empirical performance: online job advertisements from
DirectEmployers Association and journalists' postings on BusinessInsider.com.