The Importance of Scale for Spatial-Confounding Bias and Precision of Spatial Regression Estimators
Residuals in regression models are often spatially correlated. Prominent
examples include studies in environmental epidemiology to understand the
chronic health effects of pollutants. I consider the effects of residual
spatial structure on the bias and precision of regression coefficients,
developing a simple framework in which to understand the key issues and derive
informative analytic results. When unmeasured confounding introduces spatial
structure into the residuals, regression models with spatial random effects and
closely-related models such as kriging and penalized splines are biased, even
when the residual variance components are known. Analytic and simulation
results show how the bias depends on the spatial scales of the covariate and
the residual: one can reduce bias by fitting a spatial model only when there is
variation in the covariate at a scale smaller than the scale of the unmeasured
confounding. I also discuss how the scales of the residual and the covariate
affect efficiency and uncertainty estimation when the residuals are independent
of the covariate. In an application on the association between black carbon
particulate matter air pollution and birth weight, controlling for large-scale
spatial variation appears to reduce bias from unmeasured confounders, while
increasing uncertainty in the estimated pollution effect.
Comment: Published in Statistical Science (http://www.imstat.org/sts/) at http://dx.doi.org/10.1214/10-STS326 by the Institute of Mathematical Statistics (http://www.imstat.org)
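The scale argument can be made concrete with a small simulation: when an unmeasured confounder varies only at a large spatial scale and the covariate also has fine-scale variation, adding a smooth spatial term recovers the covariate effect that a naive regression misstates. The sketch below is illustrative only, not the paper's simulation study; it assumes the mgcv package for the spatial smooth, and all variable names, sample sizes, and data-generating values are chosen for the example.

    ## Illustrative sketch of bias from large-scale unmeasured confounding (assumed setup)
    library(mgcv)
    set.seed(1)
    n   <- 400
    loc <- runif(n)                         # a single spatial coordinate, for simplicity
    u   <- sin(2 * pi * loc)                # unmeasured confounder varying at a large scale
    x   <- u + rnorm(n, sd = 0.5)           # covariate with additional fine-scale variation
    y   <- 1 * x + 2 * u + rnorm(n)         # true coefficient on x is 1
    coef(lm(y ~ x))["x"]                    # naive regression: biased upward by the confounder
    coef(gam(y ~ x + s(loc, k = 20)))["x"]  # smooth spatial term absorbs the large-scale confounding

Because x retains variation at scales finer than u, the spline-adjusted estimate moves back toward the true value; consistent with the abstract, the adjustment helps only because such fine-scale variation in the covariate exists, and it comes at the cost of increased uncertainty.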
Bayesian Smoothing with Gaussian Processes Using Fourier Basis Functions in the spectralGP Package
The spectral representation of stationary Gaussian processes via the Fourier basis provides a computationally efficient specification of spatial surfaces and nonparametric regression functions for use in various statistical models. I describe the representation in detail and introduce the spectralGP package in R for computations. Because of the large number of basis coefficients, some form of shrinkage is necessary; I focus on a natural Bayesian approach via a particular parameterized prior structure that approximates stationary Gaussian processes on a regular grid. I review several models from the literature for data that do not lie on a grid, suggest a simple model modification, and provide example code demonstrating MCMC sampling using the spectralGP package. I describe reasons that mixing can be slow in certain situations and provide some suggestions for MCMC techniques to improve mixing, also with example code, and some general recommendations grounded in experience.
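A minimal, package-free sketch of the underlying idea on a regular grid follows; it does not call the spectralGP API, and the grid size, range parameter, and spectral-density form (squared exponential, up to a normalizing constant) are assumptions made for illustration. Fourier coefficients are drawn with variances proportional to the spectral density and transformed back with the FFT to give an approximate draw from a stationary Gaussian process.

    ## Sketch of the Fourier-basis (spectral) representation; not the spectralGP interface
    set.seed(1)
    m     <- 256                                  # regular grid size (a power of 2 speeds the FFT)
    freq  <- c(0:(m / 2), -(m / 2 - 1):-1)        # Fourier frequencies associated with the grid
    rho   <- 0.1                                  # assumed correlation range parameter
    sdens <- exp(-(2 * pi * freq * rho)^2 / 2)    # squared-exponential spectral density, up to a constant
    a <- rnorm(m, sd = sqrt(sdens / 2))           # real parts of the basis coefficients
    b <- rnorm(m, sd = sqrt(sdens / 2))           # imaginary parts
    z <- Re(fft(complex(real = a, imaginary = b), inverse = TRUE)) / sqrt(m)
    plot(seq(0, 1, length.out = m), z, type = "l",
         xlab = "location", ylab = "approximate stationary GP draw")

In a Bayesian fit, the same coefficients become parameters with Gaussian priors whose variances follow the spectral density, which is the kind of shrinkage the abstract describes.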
Quantile-based bias correction and uncertainty quantification of extreme event attribution statements
Extreme event attribution characterizes how anthropogenic climate change may
have influenced the probability and magnitude of selected individual extreme
weather and climate events. Attribution statements often involve quantification
of the fraction of attributable risk (FAR) or the risk ratio (RR) and
associated confidence intervals. Many such analyses use climate model output to
characterize extreme event behavior with and without anthropogenic influence.
However, such climate models may have biases in their representation of extreme
events. To account for discrepancies in the probabilities of extreme events
between observational datasets and model datasets, we demonstrate an
appropriate rescaling of the model output based on the quantiles of the
datasets to estimate an adjusted risk ratio. Our methodology accounts for
various components of uncertainty in estimation of the risk ratio. In
particular, we present an approach to construct a one-sided confidence interval
on the lower bound of the risk ratio when the estimated risk ratio is infinity.
We demonstrate the methodology using the summer 2011 central US heatwave and
output from the Community Earth System Model. In this example, we find that the
lower bound of the risk ratio is relatively insensitive to the magnitude and
probability of the actual event.
Comment: 28 pages, 4 figures, 3 tables
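A hedged sketch of the quantile-based rescaling idea: the event threshold is defined on the observations, mapped to the model scale by matching quantiles, and the risk ratio is then computed from the factual and counterfactual model output at the rescaled threshold. All data, thresholds, and variable names below are stand-ins for illustration; this is not the paper's estimator or its uncertainty quantification.

    ## Illustrative quantile mapping of an event threshold (assumed form)
    set.seed(1)
    obs         <- rnorm(60,  mean = 0.0, sd = 1.0)      # observational series (stand-in data)
    mod_factual <- rnorm(400, mean = 0.8, sd = 1.2)      # model output, factual scenario
    mod_counter <- rnorm(400, mean = 0.2, sd = 1.2)      # model output, counterfactual scenario
    thresh_obs  <- 1.5                                   # event threshold defined on the observations
    p_obs       <- mean(obs <= thresh_obs)               # non-exceedance probability in the observations
    thresh_mod  <- quantile(mod_factual, probs = p_obs)  # matching quantile on the model scale
    p1 <- mean(mod_factual > thresh_mod)                 # event probability, factual
    p0 <- mean(mod_counter > thresh_mod)                 # event probability, counterfactual
    p1 / p0                                              # adjusted risk ratio estimate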
Quantifying statistical uncertainty in the attribution of human influence on severe weather
Event attribution in the context of climate change seeks to understand the
role of anthropogenic greenhouse gas emissions on extreme weather events,
either specific events or classes of events. A common approach to event
attribution uses climate model output under factual (real-world) and
counterfactual (world that might have been without anthropogenic greenhouse gas
emissions) scenarios to estimate the probabilities of the event of interest
under the two scenarios. Event attribution is then quantified by the ratio of
the two probabilities. While this approach has been applied many times in the
last 15 years, the statistical techniques used to estimate the risk ratio based
on climate model ensembles have not drawn on the full set of methods available
in the statistical literature and have in some cases used and interpreted the
bootstrap method in non-standard ways. We present a precise frequentist
statistical framework for quantifying the effect of sampling uncertainty on
estimation of the risk ratio, propose the use of statistical methods that are
new to event attribution, and evaluate a variety of methods using statistical
simulations. We conclude that existing statistical methods not yet in use for
event attribution have several advantages over the widely-used bootstrap,
including better statistical performance in repeated samples and robustness to
small estimated probabilities. Software for using the methods is available
through the climextRemes package available for R or Python. While we focus on
frequentist statistical methods, Bayesian methods are likely to be particularly
useful when considering sources of uncertainty beyond sampling uncertainty.
Comment: 41 pages, 11 figures, 1 table
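As a point of reference for the quantities involved, a minimal frequentist sketch follows: the risk ratio is estimated from binomial event counts in the two ensembles, and a normal-approximation (Wald) interval is formed on the log scale. This is deliberately the simplest textbook construction, not the climextRemes implementation or the methods the paper recommends; the function name, arguments, and counts are invented for the example. Note that it breaks down when the counterfactual count is zero, which is exactly the boundary case the paper addresses.

    ## Simple log-scale Wald interval for a risk ratio (illustrative, not climextRemes)
    riskRatioCI <- function(y1, n1, y0, n0, conf = 0.95) {
      # y1/n1: event count and ensemble size under the factual scenario
      # y0/n0: event count and ensemble size under the counterfactual scenario
      p1 <- y1 / n1
      p0 <- y0 / n0
      logRR <- log(p1) - log(p0)
      se    <- sqrt((1 - p1) / (n1 * p1) + (1 - p0) / (n0 * p0))  # delta-method standard error
      z     <- qnorm(1 - (1 - conf) / 2)
      c(rr = exp(logRR), lower = exp(logRR - z * se), upper = exp(logRR + z * se))
    }
    riskRatioCI(y1 = 40, n1 = 400, y0 = 8, n0 = 400)   # made-up ensemble counts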
Computational Techniques for Spatial Logistic Regression with Large Datasets
In epidemiological work, outcomes are frequently non-normal, sample sizes may be large, and effects are often small. To relate health outcomes to geographic risk factors, fast and powerful methods for fitting spatial models, particularly for non-normal data, are required. We focus on binary outcomes, with the risk surface a smooth function of space. We compare penalized likelihood models, including the penalized quasi-likelihood (PQL) approach, and Bayesian models based on fit, speed, and ease of implementation.
A Bayesian model using a spectral basis representation of the spatial surface provides the best tradeoff of sensitivity and specificity in simulations, detecting real spatial features while limiting overfitting and being more efficient computationally than other Bayesian approaches. One of the contributions of this work is further development of this underused representation. The spectral basis model outperforms the penalized likelihood methods, which are prone to overfitting, but is slower to fit and not as easily implemented. Conclusions based on a real dataset of cancer cases in Taiwan are similar albeit less conclusive with respect to comparing the approaches.
The success of the spectral basis with binary data and similar results with count data suggest that it may be generally useful in spatial models and more complicated hierarchical models.
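To make the modeling setup concrete, a minimal penalized-likelihood sketch follows: binary outcomes with a smooth risk surface over space, fit with a spline smoother. It assumes the mgcv package and a simulated dataset; it is in the spirit of the penalized-likelihood approaches compared in the paper, not the spectral basis model itself or the Taiwan cancer analysis.

    ## Penalized-likelihood spatial logistic regression on simulated data (illustrative)
    library(mgcv)
    set.seed(1)
    n    <- 2000
    lon  <- runif(n)
    lat  <- runif(n)
    risk <- plogis(-2 + 1.5 * exp(-((lon - 0.5)^2 + (lat - 0.5)^2) / 0.05))  # smooth risk surface
    case <- rbinom(n, size = 1, prob = risk)                                 # binary outcomes
    fit  <- gam(case ~ s(lon, lat, k = 50), family = binomial)               # penalized smooth fit
    summary(fit)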