6,877 research outputs found

    Generalized structured additive regression based on Bayesian P-splines

    Get PDF
    Generalized additive models (GAM) for modelling nonlinear effects of continuous covariates are now well established tools for the applied statistician. In this paper we develop Bayesian GAM's and extensions to generalized structured additive regression based on one or two dimensional P-splines as the main building block. The approach extends previous work by Lang und Brezger (2003) for Gaussian responses. Inference relies on Markov chain Monte Carlo (MCMC) simulation techniques, and is either based on iteratively weighted least squares (IWLS) proposals or on latent utility representations of (multi)categorical regression models. Our approach covers the most common univariate response distributions, e.g. the Binomial, Poisson or Gamma distribution, as well as multicategorical responses. For the first time, we present Bayesian semiparametric inference for the widely used multinomial logit models. As we will demonstrate through two applications on the forest health status of trees and a space-time analysis of health insurance data, the approach allows realistic modelling of complex problems. We consider the enormous flexibility and extendability of our approach as a main advantage of Bayesian inference based on MCMC techniques compared to more traditional approaches. Software for the methodology presented in the paper is provided within the public domain package BayesX

    Modeling large scale species abundance with latent spatial processes

    Full text link
    Modeling species abundance patterns using local environmental features is an important, current problem in ecology. The Cape Floristic Region (CFR) in South Africa is a global hot spot of diversity and endemism, and provides a rich class of species abundance data for such modeling. Here, we propose a multi-stage Bayesian hierarchical model for explaining species abundance over this region. Our model is specified at areal level, where the CFR is divided into roughly 37,00037{,}000 one minute grid cells; species abundance is observed at some locations within some cells. The abundance values are ordinally categorized. Environmental and soil-type factors, likely to influence the abundance pattern, are included in the model. We formulate the empirical abundance pattern as a degraded version of the potential pattern, with the degradation effect accomplished in two stages. First, we adjust for land use transformation and then we adjust for measurement error, hence misclassification error, to yield the observed abundance classifications. An important point in this analysis is that only 2828% of the grid cells have been sampled and that, for sampled grid cells, the number of sampled locations ranges from one to more than one hundred. Still, we are able to develop potential and transformed abundance surfaces over the entire region. In the hierarchical framework, categorical abundance classifications are induced by continuous latent surfaces. The degradation model above is built on the latent scale. On this scale, an areal level spatial regression model was used for modeling the dependence of species abundance on the environmental factors.Comment: Published in at http://dx.doi.org/10.1214/10-AOAS335 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Locally Adaptive Dynamic Networks

    Full text link
    Our focus is on realistically modeling and forecasting dynamic networks of face-to-face contacts among individuals. Important aspects of such data that lead to problems with current methods include the tendency of the contacts to move between periods of slow and rapid changes, and the dynamic heterogeneity in the actors' connectivity behaviors. Motivated by this application, we develop a novel method for Locally Adaptive DYnamic (LADY) network inference. The proposed model relies on a dynamic latent space representation in which each actor's position evolves in time via stochastic differential equations. Using a state space representation for these stochastic processes and P\'olya-gamma data augmentation, we develop an efficient MCMC algorithm for posterior inference along with tractable procedures for online updating and forecasting of future networks. We evaluate performance in simulation studies, and consider an application to face-to-face contacts among individuals in a primary school

    High-dimensional Structured Additive Regression Models: Bayesian Regularisation, Smoothing and Predictive Performance

    Get PDF
    Data structures in modern applications frequently combine the necessity of flexible regression techniques such as nonlinear and spatial effects with high-dimensional covariate vectors. While estimation of the former is typically achieved by supplementing the likelihood with a suitable smoothness penalty, the latter are usually assigned shrinkage penalties that enforce sparse models. In this paper, we consider a Bayesian unifying perspective, where conditionally Gaussian priors can be assigned to all types of regression effects. Suitable hyperprior assumptions on the variances of the Gaussian distributions then induce the desired smoothness or sparseness properties. As a major advantage, general Markov chain Monte Carlo simulation algorithms can be developed that allow for the joint estimation of smooth and spatial effects and regularised coefficient vectors. Two applications demonstrate the usefulness of the proposed procedure: A geoadditive regression model for data from the Munich rental guide and an additive probit model for the prediction of consumer credit defaults. In both cases, high-dimensional vectors of categorical covariates will be included in the regression models. The predictive ability of the resulting high-dimensional structure additive regression models compared to expert models will be of particular relevance and will be evaluated on cross-validation test data

    Spike-and-Slab Priors for Function Selection in Structured Additive Regression Models

    Full text link
    Structured additive regression provides a general framework for complex Gaussian and non-Gaussian regression models, with predictors comprising arbitrary combinations of nonlinear functions and surfaces, spatial effects, varying coefficients, random effects and further regression terms. The large flexibility of structured additive regression makes function selection a challenging and important task, aiming at (1) selecting the relevant covariates, (2) choosing an appropriate and parsimonious representation of the impact of covariates on the predictor and (3) determining the required interactions. We propose a spike-and-slab prior structure for function selection that allows to include or exclude single coefficients as well as blocks of coefficients representing specific model terms. A novel multiplicative parameter expansion is required to obtain good mixing and convergence properties in a Markov chain Monte Carlo simulation approach and is shown to induce desirable shrinkage properties. In simulation studies and with (real) benchmark classification data, we investigate sensitivity to hyperparameter settings and compare performance to competitors. The flexibility and applicability of our approach are demonstrated in an additive piecewise exponential model with time-varying effects for right-censored survival times of intensive care patients with sepsis. Geoadditive and additive mixed logit model applications are discussed in an extensive appendix

    Calibration of Computational Models with Categorical Parameters and Correlated Outputs via Bayesian Smoothing Spline ANOVA

    Full text link
    It has become commonplace to use complex computer models to predict outcomes in regions where data does not exist. Typically these models need to be calibrated and validated using some experimental data, which often consists of multiple correlated outcomes. In addition, some of the model parameters may be categorical in nature, such as a pointer variable to alternate models (or submodels) for some of the physics of the system. Here we present a general approach for calibration in such situations where an emulator of the computationally demanding models and a discrepancy term from the model to reality are represented within a Bayesian Smoothing Spline (BSS) ANOVA framework. The BSS-ANOVA framework has several advantages over the traditional Gaussian Process, including ease of handling categorical inputs and correlated outputs, and improved computational efficiency. Finally this framework is then applied to the problem that motivated its design; a calibration of a computational fluid dynamics model of a bubbling fluidized which is used as an absorber in a CO2 capture system

    Geoadditive Regression Modeling of Stream Biological Condition

    Get PDF
    Indices of biotic integrity (IBI) have become an established tool to quantify the condition of small non-tidal streams and their watersheds. To investigate the effects of watershed characteristics on stream biological condition, we present a new technique for regressing IBIs on watershed-specific explanatory variables. Since IBIs are typically evaluated on anordinal scale, our method is based on the proportional odds model for ordinal outcomes. To avoid overfitting, we do not use classical maximum likelihood estimation but a component-wise functional gradient boosting approach. Because component-wise gradient boosting has an intrinsic mechanism for variable selection and model choice, determinants of biotic integrity can be identified. In addition, the method offers a relatively simple way to account for spatial correlation in ecological data. An analysis of the Maryland Biological Streams Survey shows that nonlinear effects of predictor variables on stream condition can be quantified while, in addition, accurate predictions of biological condition at unsurveyed locations are obtained
    corecore