175 research outputs found

    The Importance of Scale for Spatial-Confounding Bias and Precision of Spatial Regression Estimators

    Get PDF
    Residuals in regression models are often spatially correlated. Prominent examples include studies in environmental epidemiology to understand the chronic health effects of pollutants. I consider the effects of residual spatial structure on the bias and precision of regression coefficients, developing a simple framework in which to understand the key issues and derive informative analytic results. When unmeasured confounding introduces spatial structure into the residuals, regression models with spatial random effects and closely-related models such as kriging and penalized splines are biased, even when the residual variance components are known. Analytic and simulation results show how the bias depends on the spatial scales of the covariate and the residual: one can reduce bias by fitting a spatial model only when there is variation in the covariate at a scale smaller than the scale of the unmeasured confounding. I also discuss how the scales of the residual and the covariate affect efficiency and uncertainty estimation when the residuals are independent of the covariate. In an application on the association between black carbon particulate matter air pollution and birth weight, controlling for large-scale spatial variation appears to reduce bias from unmeasured confounders, while increasing uncertainty in the estimated pollution effect.Comment: Published in at http://dx.doi.org/10.1214/10-STS326 the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Bayesian Smoothing with Gaussian Processes Using Fourier Basis Functions in the spectralGP Package

    Get PDF
    The spectral representation of stationary Gaussian processes via the Fourier basis provides a computationally efficient specification of spatial surfaces and nonparametric regression functions for use in various statistical models. I describe the representation in detail and introduce the spectralGP package in R for computations. Because of the large number of basis coefficients, some form of shrinkage is necessary; I focus on a natural Bayesian approach via a particular parameterized prior structure that approximates stationary Gaussian processes on a regular grid. I review several models from the literature for data that do not lie on a grid, suggest a simple model modification, and provide example code demonstrating MCMC sampling using the spectralGP package. I describe reasons that mixing can be slow in certain situations and provide some suggestions for MCMC techniques to improve mixing, also with example code, and some general recommendations grounded in experience.

    Bayesian Smoothing with Gaussian Processes Using Fourier Basis Functions in the spectralGP Package

    Get PDF
    The spectral representation of stationary Gaussian processes via the Fourier basis provides a computationally efficient specification of spatial surfaces and nonparametric regression functions for use in various statistical models. I describe the representation in detail and introduce the spectralGP package in R for computations. Because of the large number of basis coefficients, some form of shrinkage is necessary; I focus on a natural Bayesian approach via a particular parameterized prior structure that approximates stationary Gaussian processes on a regular grid. I review several models from the literature for data that do not lie on a grid, suggest a simple model modification, and provide example code demonstrating MCMC sampling using the spectralGP package. I describe reasons that mixing can be slow in certain situations and provide some suggestions for MCMC techniques to improve mixing, also with example code, and some general recommendations grounded in experience

    Bayesian Smoothing of Irregularly-spaced Data Using Fourier Basis Functions

    Get PDF

    Quantile-based bias correction and uncertainty quantification of extreme event attribution statements

    Get PDF
    Extreme event attribution characterizes how anthropogenic climate change may have influenced the probability and magnitude of selected individual extreme weather and climate events. Attribution statements often involve quantification of the fraction of attributable risk (FAR) or the risk ratio (RR) and associated confidence intervals. Many such analyses use climate model output to characterize extreme event behavior with and without anthropogenic influence. However, such climate models may have biases in their representation of extreme events. To account for discrepancies in the probabilities of extreme events between observational datasets and model datasets, we demonstrate an appropriate rescaling of the model output based on the quantiles of the datasets to estimate an adjusted risk ratio. Our methodology accounts for various components of uncertainty in estimation of the risk ratio. In particular, we present an approach to construct a one-sided confidence interval on the lower bound of the risk ratio when the estimated risk ratio is infinity. We demonstrate the methodology using the summer 2011 central US heatwave and output from the Community Earth System Model. In this example, we find that the lower bound of the risk ratio is relatively insensitive to the magnitude and probability of the actual event.Comment: 28 pages, 4 figures, 3 table

    Quantifying statistical uncertainty in the attribution of human influence on severe weather

    Get PDF
    Event attribution in the context of climate change seeks to understand the role of anthropogenic greenhouse gas emissions on extreme weather events, either specific events or classes of events. A common approach to event attribution uses climate model output under factual (real-world) and counterfactual (world that might have been without anthropogenic greenhouse gas emissions) scenarios to estimate the probabilities of the event of interest under the two scenarios. Event attribution is then quantified by the ratio of the two probabilities. While this approach has been applied many times in the last 15 years, the statistical techniques used to estimate the risk ratio based on climate model ensembles have not drawn on the full set of methods available in the statistical literature and have in some cases used and interpreted the bootstrap method in non-standard ways. We present a precise frequentist statistical framework for quantifying the effect of sampling uncertainty on estimation of the risk ratio, propose the use of statistical methods that are new to event attribution, and evaluate a variety of methods using statistical simulations. We conclude that existing statistical methods not yet in use for event attribution have several advantages over the widely-used bootstrap, including better statistical performance in repeated samples and robustness to small estimated probabilities. Software for using the methods is available through the climextRemes package available for R or Python. While we focus on frequentist statistical methods, Bayesian methods are likely to be particularly useful when considering sources of uncertainty beyond sampling uncertainty.Comment: 41 pages, 11 figures, 1 tabl

    Computational Techniques for Spatial Logistic Regression with Large Datasets

    Get PDF
    In epidemiological work, outcomes are frequently non-normal, sample sizes may be large, and effects are often small. To relate health outcomes to geographic risk factors, fast and powerful methods for fitting spatial models, particularly for non-normal data, are required. We focus on binary outcomes, with the risk surface a smooth function of space. We compare penalized likelihood models, including the penalized quasi-likelihood (PQL) approach, and Bayesian models based on fit, speed, and ease of implementation. A Bayesian model using a spectral basis representation of the spatial surface provides the best tradeoff of sensitivity and specificity in simulations, detecting real spatial features while limiting overfitting and being more efficient computationally than other Bayesian approaches. One of the contributions of this work is further development of this underused representation. The spectral basis model outperforms the penalized likelihood methods, which are prone to overfitting, but is slower to fit and not as easily implemented. Conclusions based on a real dataset of cancer cases in Taiwan are similar albeit less conclusive with respect to comparing the approaches. The success of the spectral basis with binary data and similar results with count data suggest that it may be generally useful in spatial models and more complicated hierarchical models
    corecore