Model-robust regression and a Bayesian ``sandwich'' estimator
We present a new Bayesian approach to model-robust linear regression that
leads to uncertainty estimates with the same robustness properties as the
Huber--White sandwich estimator. The sandwich estimator is known to provide
asymptotically correct frequentist inference, even when standard modeling
assumptions such as linearity and homoscedasticity in the data-generating
mechanism are violated. Our derivation provides a compelling Bayesian
justification for using this simple and popular tool, and it also clarifies
what is being estimated when the data-generating mechanism is not linear. We
demonstrate the applicability of our approach using a simulation study and
health care cost data from an evaluation of the Washington State Basic Health
Plan.
Comment: Published at http://dx.doi.org/10.1214/10-AOAS362 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
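For readers unfamiliar with the frequentist tool this abstract refers to, the following is a minimal sketch of the classical Huber-White sandwich estimator (the basic "HC0" form) for least-squares regression. It is not the Bayesian derivation from the paper; the function name `ols_with_sandwich` and the simulated heteroscedastic data are assumptions made purely for illustration.

```python
import numpy as np


def ols_with_sandwich(X, y):
    """Least-squares fit with model-based and Huber-White (HC0) standard errors.

    Hypothetical helper for illustration; not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # "Bread": inverse of X'X, shared by both covariance estimates.
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta

    # Model-based covariance: assumes a single constant error variance.
    sigma2 = resid @ resid / (n - p)
    cov_model = sigma2 * XtX_inv

    # "Meat": sum over observations of e_i^2 * x_i x_i'.
    meat = (X * resid[:, None] ** 2).T @ X
    cov_sandwich = XtX_inv @ meat @ XtX_inv

    se_model = np.sqrt(np.diag(cov_model))
    se_sandwich = np.sqrt(np.diag(cov_sandwich))
    return beta, se_model, se_sandwich


# Simulated data (an assumption): error spread grows with x, so the
# constant-variance assumption behind the model-based errors is violated.
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(0.0, 3.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.2 + x, size=n)
X = np.column_stack([np.ones(n), x])

beta, se_model, se_sandwich = ols_with_sandwich(X, y)
print("coefficients:      ", beta)
print("model-based SEs:   ", se_model)
print("sandwich (HC0) SEs:", se_sandwich)
```

The point estimates are the same under both approaches; only the uncertainty estimates differ, which is the property the abstract describes.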
Accounting for Errors from Predicting Exposures in Environmental Epidemiology and Environmental Statistics
PLEASE NOTE THAT AN UPDATED VERSION OF THIS RESEARCH IS AVAILABLE AS WORKING PAPER 350 IN THE UNIVERSITY OF WASHINGTON BIOSTATISTICS WORKING PAPER SERIES (http://www.bepress.com/uwbiostat/paper350).
In environmental epidemiology and related problems in environmental statistics, it is typically not practical to directly measure the exposure for each subject. Environmental monitoring is employed with a statistical model to assign exposures to individuals. The result is a form of exposure misspecification that can result in complicated errors in the health effect estimates if the exposure is naively treated as known. The exposure error is neither “classical” nor “Berkson”, so standard regression calibration methods do not apply. We decompose the health effect estimation error into three components. First, the standard errors are too small if the exposure field is correlated, independent of variability in estimating the exposure field parameters. Second, the standard errors are too small because they do not account for variability in estimating the exposure field parameters. Third, there is a bias from using approximate exposure field parameters in place of the unobserved true ones. We outline a three-stage correction procedure to account separately for each of these errors. A key insight is that we can account for the second part of the error (sampling variability in estimating the exposure) by averaging over simulations from the part of the posterior exposure surface that is informative for the outcome. This amounts to averaging over samples of the posterior exposure model parameters, a procedure that we call “parameter simulation”. One implication is that it is preferable to use a parametric correlation model (e.g., kriging) rather than a semi-parametric approximation. While the latter approach has been found to be effective in estimating mean exposure fields, it does not provide the needed decomposition of the posterior into informative and non-informative components. We illustrate the properties of our corrected estimators in a simulation study and present an example from environmental statistics. 
The focus of this paper is on linear health effect models with uncorrelated outcomes, but extensions to generalized linear models and correlated outcomes are possible.
Efficient Measurement Error Correction with Spatially Misaligned Data
Association studies in environmental statistics often involve exposure and outcome data that are misaligned in space. A common strategy is to employ a spatial model such as universal kriging to predict exposures at locations with outcome data and then estimate a regression parameter of interest using the predicted exposures. This results in measurement error because the predicted exposures do not correspond exactly to the true values. We characterize the measurement error by decomposing it into Berkson-like and classical-like components. One correction approach is the parametric bootstrap, which is effective but computationally intensive since it requires solving a nonlinear optimization problem for the exposure model parameters in each bootstrap sample. We propose a less computationally intensive alternative termed the ``parameter bootstrap'' that only requires solving one nonlinear optimization problem, and we also compare bootstrap methods to other recently proposed methods. We illustrate our methodology in simulations and with publicly available data from the Environmental Protection Agency.
Model-Robust Bayesian Regression and the Sandwich Estimator
PLEASE NOTE THAT AN UPDATED VERSION OF THIS RESEARCH IS AVAILABLE AS WORKING PAPER 338 IN THE UNIVERSITY OF WASHINGTON BIOSTATISTICS WORKING PAPER SERIES (http://www.bepress.com/uwbiostat/paper338).
In applied regression problems there is often sufficient data for accurate estimation, but standard parametric models do not accurately describe the source of the data, so associated uncertainty estimates are not reliable. We describe a simple Bayesian approach to inference in linear regression that recovers least-squares point estimates while providing correct uncertainty bounds by explicitly recognizing that standard modeling assumptions need not be valid. Our model-robust development parallels frequentist estimating equations and leads to intervals with the same robustness properties as the ‘sandwich’ estimator.
Trading Bias for Precision: Decision Theory for Intervals and Sets
Interval- and set-valued decisions are an essential part of statistical inference. Despite this, the justification behind them is often unclear, leading in practice to a great deal of confusion about exactly what is being presented. In this paper we review and attempt to unify several competing methods of interval construction within a formal decision-theoretic framework. The result is a new emphasis on interval estimation as a distinct goal, and not as an afterthought to point estimation. We also see that representing intervals as trade-offs between measures of precision and bias unifies many existing approaches -- as well as suggesting interpretable criteria to calibrate this trade-off. The novel statistical arguments produced allow many extensions, and we apply these to resolve several outstanding areas of disagreement between Bayesians and frequentists.
Reduced-rank spatio-temporal modeling of air pollution concentrations in the Multi-Ethnic Study of Atherosclerosis and Air Pollution
There is growing evidence in the epidemiologic literature of the relationship
between air pollution and adverse health outcomes. Prediction of individual air
pollution exposure in the Environmental Protection Agency (EPA) funded
Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air) study
relies on a flexible spatio-temporal prediction model that integrates land-use
regression with kriging to account for spatial dependence in pollutant
concentrations. Temporal variability is captured using temporal trends
estimated via modified singular value decomposition and temporally varying
spatial residuals. This model utilizes monitoring data from existing regulatory
networks and supplementary MESA Air monitoring data to predict concentrations
for individual cohort members. In general, spatio-temporal models are limited
in their efficacy for large data sets due to computational intractability. We
develop reduced-rank versions of the MESA Air spatio-temporal model. To do so,
we apply low-rank kriging to account for spatial variation in the mean process
and discuss the limitations of this approach. As an alternative, we represent
spatial variation using thin plate regression splines. We compare the
performance of the outlined models using EPA and MESA Air monitoring data for
predicting concentrations of oxides of nitrogen (NOx), a pollutant of primary
interest in MESA Air, in the Los Angeles metropolitan area via cross-validated
R^2. Our findings suggest that use of reduced-rank models can improve
computational efficiency in certain cases. Low-rank kriging and thin plate
regression splines were competitive across the formulations considered,
although TPRS appeared to be more robust in some settings.
Comment: Published at http://dx.doi.org/10.1214/14-AOAS786 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).