Generalized structured additive regression based on Bayesian P-splines
Generalized additive models (GAMs) for modelling nonlinear effects of continuous covariates are now well established tools for the applied statistician. In this paper we develop Bayesian GAMs and extensions to generalized structured additive regression based on one- or two-dimensional P-splines as the main building block. The approach extends previous work by Lang and Brezger (2003) for Gaussian responses. Inference relies on Markov chain Monte Carlo (MCMC) simulation techniques, and is either based on iteratively weighted least squares (IWLS) proposals or on latent utility representations of (multi)categorical regression models. Our approach covers the most common univariate response distributions, e.g. the Binomial, Poisson or Gamma distribution, as well as multicategorical responses. For the first time, we present Bayesian semiparametric inference for the widely used multinomial logit models. As we will demonstrate through two applications on the forest health status of trees and a space-time analysis of health insurance data, the approach allows realistic modelling of complex problems. We consider the enormous flexibility and extendability of our approach as a main advantage of Bayesian inference based on MCMC techniques compared to more traditional approaches. Software for the methodology presented in the paper is provided within the public domain package BayesX.
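As a rough illustration of the P-spline building block mentioned in this abstract: in the usual formulation, roughness is penalised through squared differences of adjacent B-spline coefficients, and the Bayesian analogue is a random-walk prior whose precision structure is K = DᵀD for a difference matrix D. The sketch below only builds D and K in plain Python. The function names are hypothetical and are not taken from the paper or from BayesX.

```python
def difference_matrix(n_coef, order=2):
    """Return the (n_coef - order) x n_coef matrix of order-th differences."""
    # Start from the identity matrix and difference its rows repeatedly.
    rows = [[1.0 if i == j else 0.0 for j in range(n_coef)] for i in range(n_coef)]
    for _ in range(order):
        rows = [
            [b - a for a, b in zip(rows[i], rows[i + 1])]
            for i in range(len(rows) - 1)
        ]
    return rows


def penalty_matrix(n_coef, order=2):
    """Return K = D^T D, the penalty (prior precision) structure."""
    d = difference_matrix(n_coef, order)
    return [
        [sum(row[i] * row[j] for row in d) for j in range(n_coef)]
        for i in range(n_coef)
    ]


def roughness(beta, order=2):
    """Quadratic form beta^T K beta, i.e. the sum of squared differences."""
    k = penalty_matrix(len(beta), order)
    n = len(beta)
    return sum(beta[i] * k[i][j] * beta[j] for i in range(n) for j in range(n))
```

With a second-order penalty, a coefficient sequence that lies on a straight line incurs no penalty, which is why such a prior shrinks the fitted function towards a linear effect rather than towards zero.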
Modeling large scale species abundance with latent spatial processes
Modeling species abundance patterns using local environmental features is an
important, current problem in ecology. The Cape Floristic Region (CFR) in South
Africa is a global hot spot of diversity and endemism, and provides a rich
class of species abundance data for such modeling. Here, we propose a
multi-stage Bayesian hierarchical model for explaining species abundance over
this region. Our model is specified at areal level, where the CFR is divided
into roughly one minute grid cells; species abundance is observed at
some locations within some cells. The abundance values are ordinally
categorized. Environmental and soil-type factors, likely to influence the
abundance pattern, are included in the model. We formulate the empirical
abundance pattern as a degraded version of the potential pattern, with the
degradation effect accomplished in two stages. First, we adjust for land use
transformation and then we adjust for measurement error, hence
misclassification error, to yield the observed abundance classifications. An
important point in this analysis is that only a fraction of the grid cells have been
sampled and that, for sampled grid cells, the number of sampled locations
ranges from one to more than one hundred. Still, we are able to develop
potential and transformed abundance surfaces over the entire region. In the
hierarchical framework, categorical abundance classifications are induced by
continuous latent surfaces. The degradation model above is built on the latent
scale. On this scale, an areal level spatial regression model was used for
modeling the dependence of species abundance on the environmental factors.
Comment: Published at http://dx.doi.org/10.1214/10-AOAS335 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
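The abstract above states that categorical abundance classifications are induced by continuous latent surfaces. A minimal sketch of that idea follows, assuming a set of ordered cut points on the latent scale. The cut-point values and the function name are hypothetical; the paper's actual thresholds and degradation stages are not reproduced here.

```python
from bisect import bisect_right


def ordinal_category(latent_value, cut_points):
    """Map a latent value to an ordinal category 0..len(cut_points).

    cut_points must be sorted in increasing order; a value equal to a
    cut point is assigned to the higher category.
    """
    return bisect_right(cut_points, latent_value)


# Hypothetical cut points defining four ordered abundance classes.
cuts = [0.0, 1.0, 2.5]
classes = [ordinal_category(z, cuts) for z in (-0.3, 0.4, 1.7, 3.0)]
```

Any adjustment made on the latent scale, such as a shift for land use transformation, changes the observed class only when it moves the latent value across a cut point.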
Locally Adaptive Dynamic Networks
Our focus is on realistically modeling and forecasting dynamic networks of
face-to-face contacts among individuals. Important aspects of such data that
lead to problems with current methods include the tendency of the contacts to
move between periods of slow and rapid changes, and the dynamic heterogeneity
in the actors' connectivity behaviors. Motivated by this application, we
develop a novel method for Locally Adaptive DYnamic (LADY) network inference.
The proposed model relies on a dynamic latent space representation in which
each actor's position evolves in time via stochastic differential equations.
Using a state space representation for these stochastic processes and
Pólya-gamma data augmentation, we develop an efficient MCMC algorithm for
posterior inference along with tractable procedures for online updating and
forecasting of future networks. We evaluate performance in simulation studies,
and consider an application to face-to-face contacts among individuals in a
primary school.
High-dimensional Structured Additive Regression Models: Bayesian Regularisation, Smoothing and Predictive Performance
Data structures in modern applications frequently combine the necessity of flexible regression techniques such as nonlinear and spatial effects with high-dimensional covariate vectors. While estimation of the former is typically achieved by supplementing the likelihood with a suitable smoothness penalty, the latter are usually assigned shrinkage penalties that enforce sparse models.
In this paper, we consider a Bayesian unifying perspective, where conditionally Gaussian priors can be assigned to all types of regression effects. Suitable hyperprior assumptions on the variances of the Gaussian distributions then induce the desired smoothness or sparseness properties. As a major advantage, general Markov chain Monte Carlo simulation algorithms can be developed that allow for the joint estimation of smooth and spatial effects and regularised coefficient vectors. Two applications demonstrate the usefulness of the proposed procedure: a geoadditive regression model for data from the Munich rental guide and an additive probit model for the prediction of consumer credit defaults. In both cases, high-dimensional vectors of categorical covariates will be included in the regression models. The predictive ability of the resulting high-dimensional structured additive regression models compared to expert models will be of particular relevance and will be evaluated on cross-validation test data.
Spike-and-Slab Priors for Function Selection in Structured Additive Regression Models
Structured additive regression provides a general framework for complex
Gaussian and non-Gaussian regression models, with predictors comprising
arbitrary combinations of nonlinear functions and surfaces, spatial effects,
varying coefficients, random effects and further regression terms. The large
flexibility of structured additive regression makes function selection a
challenging and important task, aiming at (1) selecting the relevant
covariates, (2) choosing an appropriate and parsimonious representation of the
impact of covariates on the predictor and (3) determining the required
interactions. We propose a spike-and-slab prior structure for function
selection that allows including or excluding single coefficients as well as
blocks of coefficients representing specific model terms. A novel
multiplicative parameter expansion is required to obtain good mixing and
convergence properties in a Markov chain Monte Carlo simulation approach and is
shown to induce desirable shrinkage properties. In simulation studies and with
(real) benchmark classification data, we investigate sensitivity to
hyperparameter settings and compare performance to competitors. The flexibility
and applicability of our approach are demonstrated in an additive piecewise
exponential model with time-varying effects for right-censored survival times
of intensive care patients with sepsis. Geoadditive and additive mixed logit
model applications are discussed in an extensive appendix.
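To make the include-or-exclude mechanism of a spike-and-slab prior concrete, here is a minimal sketch for a single coefficient. It assumes the textbook setting y ~ N(beta, noise_variance) with a point-mass spike at zero and a Gaussian slab, which is a simplification and not the parameter-expanded prior proposed in the paper. All function and parameter names are hypothetical.

```python
import math


def normal_pdf(x, variance):
    """Density of a zero-mean normal distribution with the given variance."""
    return math.exp(-0.5 * x * x / variance) / math.sqrt(2.0 * math.pi * variance)


def inclusion_probability(y, prior_weight, slab_variance, noise_variance=1.0):
    """Posterior probability that the coefficient comes from the slab.

    Model (assumed for illustration):
      beta = 0 with probability 1 - prior_weight (spike),
      beta ~ N(0, slab_variance) with probability prior_weight (slab),
      y | beta ~ N(beta, noise_variance).
    """
    # Marginal density of y under each component, weighted by its prior mass.
    slab = prior_weight * normal_pdf(y, noise_variance + slab_variance)
    spike = (1.0 - prior_weight) * normal_pdf(y, noise_variance)
    return slab / (slab + spike)
```

An observation near zero favours the spike, while an observation far from zero is explained almost entirely by the slab, so the posterior inclusion probability acts as a soft selection indicator.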
Calibration of Computational Models with Categorical Parameters and Correlated Outputs via Bayesian Smoothing Spline ANOVA
It has become commonplace to use complex computer models to predict outcomes
in regions where data does not exist. Typically these models need to be
calibrated and validated using some experimental data, which often consists of
multiple correlated outcomes. In addition, some of the model parameters may be
categorical in nature, such as a pointer variable to alternate models (or
submodels) for some of the physics of the system. Here we present a general
approach for calibration in such situations where an emulator of the
computationally demanding models and a discrepancy term from the model to
reality are represented within a Bayesian Smoothing Spline (BSS) ANOVA
framework. The BSS-ANOVA framework has several advantages over the traditional
Gaussian Process, including ease of handling categorical inputs and correlated
outputs, and improved computational efficiency. Finally, this framework is
applied to the problem that motivated its design: a calibration of a
computational fluid dynamics model of a bubbling fluidized bed which is used as an
absorber in a CO2 capture system.
Geoadditive Regression Modeling of Stream Biological Condition
Indices of biotic integrity (IBI) have become an established tool to quantify the condition of small non-tidal streams and their watersheds. To investigate the effects of watershed characteristics on stream biological condition, we present a new technique for regressing IBIs on watershed-specific explanatory variables. Since IBIs are typically evaluated on an ordinal scale, our method is based on the proportional odds model for ordinal outcomes. To avoid overfitting, we do not use classical maximum likelihood estimation but a component-wise functional gradient boosting approach. Because component-wise gradient boosting has an intrinsic mechanism for variable selection and model choice, determinants of biotic integrity can be identified. In addition, the method offers a relatively simple way to account for spatial correlation in ecological data. An analysis of the Maryland Biological Streams Survey shows that nonlinear effects of predictor variables on stream condition can be quantified while, in addition, accurate predictions of biological condition at unsurveyed locations are obtained.
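For readers unfamiliar with the proportional odds model named in this abstract, the following sketch computes category probabilities from a linear predictor. It assumes the common parameterisation P(Y <= k) = logistic(theta_k - eta) with increasing thresholds theta_k. The threshold values and function names are hypothetical, and the boosting-based estimation described in the abstract is not shown.

```python
import math


def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probabilities(linear_predictor, thresholds):
    """Return P(Y = k) for k = 0..len(thresholds) under proportional odds.

    thresholds must be sorted in increasing order. Cumulative probabilities
    are logistic(theta_k - eta), with the last one fixed at 1.
    """
    cumulative = [logistic(t - linear_predictor) for t in thresholds] + [1.0]
    probabilities = []
    previous = 0.0
    for c in cumulative:
        probabilities.append(c - previous)
        previous = c
    return probabilities


# Hypothetical thresholds for a three-level ordinal index.
probs_at_zero = category_probabilities(0.0, [-1.0, 1.0])
probs_shifted = category_probabilities(2.0, [-1.0, 1.0])
```

Under this sign convention, a larger linear predictor moves probability mass towards the higher categories, while the thresholds stay the same for every observation.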