197 research outputs found

Model-robust regression and a Bayesian "sandwich" estimator

Author: Lumley Thomas
Rice Kenneth M.
Szpiro Adam A.
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 20/05/2010

We present a new Bayesian approach to model-robust linear regression that leads to uncertainty estimates with the same robustness properties as the Huber-White sandwich estimator. The sandwich estimator is known to provide asymptotically correct frequentist inference, even when standard modeling assumptions such as linearity and homoscedasticity in the data-generating mechanism are violated. Our derivation provides a compelling Bayesian justification for using this simple and popular tool, and it also clarifies what is being estimated when the data-generating mechanism is not linear. We demonstrate the applicability of our approach using a simulation study and health care cost data from an evaluation of the Washington State Basic Health Plan. Comment: Published at http://dx.doi.org/10.1214/10-AOAS362 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

Crossref

Collection Of Biostatistics Research Archive
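
The abstract above refers to the Huber-White sandwich estimator. As an illustration only, the following is a minimal sketch of the standard frequentist (HC0) form of that estimator for ordinary least squares, compared with the usual model-based covariance. It is not the Bayesian derivation from the paper; the function name `ols_sandwich` and the simulated data are assumptions made for this example.

```python
import numpy as np

def ols_sandwich(X, y):
    """Return OLS estimates plus model-based and HC0 sandwich covariances.

    Hypothetical helper for illustration; not taken from the paper.
    """
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)            # the "bread"
    beta = XtX_inv @ X.T @ y                    # least-squares point estimate
    resid = y - X @ beta
    # Model-based covariance: assumes a constant error variance.
    sigma2 = resid @ resid / (n - p)
    cov_model = sigma2 * XtX_inv
    # The "meat": sum over i of resid_i^2 * x_i x_i^T.
    meat = (X * resid[:, None] ** 2).T @ X
    cov_sandwich = XtX_inv @ meat @ XtX_inv
    return beta, cov_model, cov_sandwich

# Simulated heteroscedastic data (assumed setup): noise grows with x.
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(0.0, 3.0, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.2 + x, size=n)

beta, cov_model, cov_sandwich = ols_sandwich(X, y)
se_model = np.sqrt(np.diag(cov_model))
se_sandwich = np.sqrt(np.diag(cov_sandwich))
print("estimates:     ", beta)
print("model-based SE:", se_model)
print("sandwich SE:   ", se_sandwich)
```

The point estimates are the same under both approaches; only the covariance changes, which is why the sandwich form is often described as a robust replacement for the model-based standard errors.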

Accounting for Errors from Predicting Exposures in Environmental Epidemiology and Environmental Statistics

Author: Lumley Thomas
Sheppard Lianne
Szpiro Adam A
Publication venue: Collection of Biostatistics Research Archive
Publication date: 19/06/2008

PLEASE NOTE THAT AN UPDATED VERSION OF THIS RESEARCH IS AVAILABLE AS WORKING PAPER 350 IN THE UNIVERSITY OF WASHINGTON BIOSTATISTICS WORKING PAPER SERIES (http://www.bepress.com/uwbiostat/paper350). In environmental epidemiology and related problems in environmental statistics, it is typically not practical to directly measure the exposure for each subject. Environmental monitoring is employed with a statistical model to assign exposures to individuals. The result is a form of exposure misspecification that can result in complicated errors in the health effect estimates if the exposure is naively treated as known. The exposure error is neither “classical” nor “Berkson”, so standard regression calibration methods do not apply. We decompose the health effect estimation error into three components. First, the standard errors are too small if the exposure field is correlated, independent of variability in estimating the exposure field parameters. Second, the standard errors are too small because they do not account for variability in estimating the exposure field parameters. Third, there is a bias from using approximate exposure field parameters in place of the unobserved true ones. We outline a three-stage correction procedure to account separately for each of these errors. A key insight is that we can account for the second part of the error (sampling variability in estimating the exposure) by averaging over simulations from the part of the posterior exposure surface that is informative for the outcome. This amounts to averaging over samples of the posterior exposure model parameters, a procedure that we call “parameter simulation”. One implication is that it is preferable to use a parametric correlation model (e.g., kriging) rather than a semi-parametric approximation. 
While the latter approach has been found to be effective in estimating mean exposure fields, it does not provide the needed decomposition of the posterior into informative and non-informative components. We illustrate the properties of our corrected estimators in a simulation study and present an example from environmental statistics. The focus of this paper is on linear health effect models with uncorrelated outcomes, but extensions to generalized linear models and correlated outcomes are possible

Collection Of Biostatistics Research Archive

Model-Robust Bayesian Regression and the Sandwich Estimator

Author: Lumley Thomas
Rice Kenneth M
Szpiro Adam A
Publication venue: Collection of Biostatistics Research Archive
Publication date: 26/12/2007

PLEASE NOTE THAT AN UPDATED VERSION OF THIS RESEARCH IS AVAILABLE AS WORKING PAPER 338 IN THE UNIVERSITY OF WASHINGTON BIOSTATISTICS WORKING PAPER SERIES (http://www.bepress.com/uwbiostat/paper338). In applied regression problems there is often sufficient data for accurate estimation, but standard parametric models do not accurately describe the source of the data, so associated uncertainty estimates are not reliable. We describe a simple Bayesian approach to inference in linear regression that recovers least-squares point estimates while providing correct uncertainty bounds by explicitly recognizing that standard modeling assumptions need not be valid. Our model-robust development parallels frequentist estimating equations and leads to intervals with the same robustness properties as the ‘sandwich’ estimator

Collection Of Biostatistics Research Archive

Trading Bias for Precision: Decision Theory for Intervals and Sets

Author: Lumley Thomas
Rice Kenneth M
Szpiro Adam A
Publication venue: Collection of Biostatistics Research Archive
Publication date: 11/08/2008

Interval- and set-valued decisions are an essential part of statistical inference. Despite this, the justification behind them is often unclear, leading in practice to a great deal of confusion about exactly what is being presented. In this paper we review and attempt to unify several competing methods of interval-construction, within a formal decision-theoretic framework. The result is a new emphasis on interval-estimation as a distinct goal, and not as an afterthought to point estimation. We also see that representing intervals as trade-offs between measures of precision and bias unifies many existing approaches -- as well as suggesting interpretable criteria to calibrate this trade-off. The novel statistical arguments produced allow many extensions, and we apply these to resolve several outstanding areas of disagreement between Bayesians and frequentists

Collection Of Biostatistics Research Archive

Efficient Measurement Error Correction with Spatially Misaligned Data

Author: Lumley Thomas
Sheppard Lianne
Szpiro Adam A
Publication venue: Collection of Biostatistics Research Archive
Publication date: 17/12/2010

Association studies in environmental statistics often involve exposure and outcome data that are misaligned in space. A common strategy is to employ a spatial model such as universal kriging to predict exposures at locations with outcome data and then estimate a regression parameter of interest using the predicted exposures. This results in measurement error because the predicted exposures do not correspond exactly to the true values. We characterize the measurement error by decomposing it into Berkson-like and classical-like components. One correction approach is the parametric bootstrap, which is effective but computationally intensive since it requires solving a nonlinear optimization problem for the exposure model parameters in each bootstrap sample. We propose a less computationally intensive alternative termed the “parameter bootstrap” that only requires solving one nonlinear optimization problem, and we also compare bootstrap methods to other recently proposed methods. We illustrate our methodology in simulations and with publicly available data from the Environmental Protection Agency

Crossref

PubMed Central

Collection Of Biostatistics Research Archive

Reduced-rank spatio-temporal modeling of air pollution concentrations in the Multi-Ethnic Study of Atherosclerosis and Air Pollution

Author: Kaufman Joel D.
Lindström Johan
Olives Casey
Sampson Paul D.
Sheppard Lianne
Szpiro Adam A.
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2014

There is growing evidence in the epidemiologic literature of the relationship between air pollution and adverse health outcomes. Prediction of individual air pollution exposure in the Environmental Protection Agency (EPA) funded Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air) study relies on a flexible spatio-temporal prediction model that integrates land-use regression with kriging to account for spatial dependence in pollutant concentrations. Temporal variability is captured using temporal trends estimated via modified singular value decomposition and temporally varying spatial residuals. This model utilizes monitoring data from existing regulatory networks and supplementary MESA Air monitoring data to predict concentrations for individual cohort members. In general, spatio-temporal models are limited in their efficacy for large data sets due to computational intractability. We develop reduced-rank versions of the MESA Air spatio-temporal model. To do so, we apply low-rank kriging to account for spatial variation in the mean process and discuss the limitations of this approach. As an alternative, we represent spatial variation using thin plate regression splines. We compare the performance of the outlined models using EPA and MESA Air monitoring data for predicting concentrations of oxides of nitrogen (NOx), a pollutant of primary interest in MESA Air, in the Los Angeles metropolitan area via cross-validated R^2. Our findings suggest that use of reduced-rank models can improve computational efficiency in certain cases. Low-rank kriging and thin plate regression splines were competitive across the formulations considered, although TPRS appeared to be more robust in some settings. Comment: Published at http://dx.doi.org/10.1214/14-AOAS786 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

Crossref

Lund University Publications

PubMed Central

Pragmatic Estimation of a Spatio-Temporal Air Quality Model With Irregular Monitoring Data

Author: Kaufman Joel D
Lindström Johan
Sampson Paul D
Sheppard Lianne
Szpiro Adam A
Publication venue: Collection of Biostatistics Research Archive
Publication date: 01/01/2009

Statistical analyses of the health effects of air pollution have increasingly used GIS-based covariates for prediction of ambient air quality in “land-use” regression models. More recently these regression models have accounted for spatial correlation structure in combining monitoring data with land-use covariates. The current paper builds on these concepts to address spatio-temporal prediction of ambient concentrations of particulate matter with aerodynamic diameter less than 2.5 μm (PM2.5) on the basis of a model representing spatially varying seasonal trends and spatial correlation structures. Our hierarchical methodology provides a pragmatic approach that fully exploits regulatory and other supplemental monitoring data which jointly define a complex spatio-temporal monitoring design. We explain the elements of the computational approach, including estimation of smoothed empirical orthogonal functions (SEOFs) as basis functions for temporal trend, spatial (“land use”) regression by Partial Least Squares (PLS), modeling of spatio-temporal correlation structure, and generalized universal kriging prediction of ambient exposure for subjects in the Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air) project. Analyses are demonstrated in detail for the South California study area of the MESA Air project using AQS monitoring data from 2000 to 2006 and supplemental MESA Air monitoring data beginning in 2005. Results of application of the modeling and estimation methodology are presented also for five other MESA Air metropolitan study areas across the country with comments on current and future research developments

Lund University Publications

Collection Of Biostatistics Research Archive

Predicting Intra-Urban Variation in Air Pollution Concentrations with Complex Spatio-Temporal Interactions

Author: Adar Sara D
Kaufman Joel
Lumley Thomas
Sampson Paul D
Sheppard Lianne
Szpiro Adam A
Publication venue: Collection of Biostatistics Research Archive
Publication date: 20/11/2008

We describe a methodology for assigning individual estimates of long-term average air pollution concentrations that accounts for a complex spatio-temporal correlation structure and can accommodate unbalanced observations. This methodology has been developed as part of the Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air), a prospective cohort study funded by the U.S. EPA to investigate the relationship between chronic exposure to air pollution and cardiovascular disease. Our hierarchical model decomposes the space-time field into a “mean” that includes dependence on covariates and spatially varying seasonal and long-term trends and a “residual” that accounts for spatially correlated deviations from the mean model. The model accommodates complex spatio-temporal patterns by characterizing the temporal trend at each location as a linear combination of empirically derived temporal basis functions, and embedding the spatial fields of coefficients for the basis functions in separate linear regression models with spatially correlated residuals (universal kriging). This approach allows us to implement a scalable single-stage estimation procedure that easily accommodates a significant number of missing observations at some monitoring locations. We apply the model to predict long-term average concentrations of oxides of nitrogen (NOx) from 2005-2007 in the Los Angeles area, based on data from 18 EPA Air Quality System regulatory monitors. The cross-validated R2 is 0.67. The MESA Air study is also collecting additional concentration data as part of a supplementary monitoring campaign. We describe the sampling plan and demonstrate in a simulation study that the additional data will contribute to improved predictions of long-term average concentrations

Collection Of Biostatistics Research Archive
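
Several abstracts above report prediction accuracy as a cross-validated R2. As a rough sketch of what such a figure measures, the code below computes a leave-one-out cross-validated R2 for a plain linear regression, using one common definition (one minus the cross-validated mean squared error divided by the variance of the observations). The papers' spatio-temporal models and their exact cross-validation scheme are not reproduced here; the function name `loo_cv_r2`, the single covariate, and the simulated 18-monitor data set are assumptions for illustration.

```python
import numpy as np

def loo_cv_r2(X, y):
    """Leave-one-out cross-validated R2 for a linear model (illustrative helper)."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                 # hold out observation i
        coef, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        preds[i] = X[i] @ coef                   # predict the held-out value
    mse = np.mean((y - preds) ** 2)
    return 1.0 - mse / np.var(y)

# Simulated stand-in for long-term averages at 18 monitors (assumed data).
rng = np.random.default_rng(1)
n = 18
covariate = rng.normal(size=n)
X = np.column_stack([np.ones(n), covariate])
y = 3.0 + 1.5 * covariate + rng.normal(scale=0.5, size=n)

r2 = loo_cv_r2(X, y)
print("cross-validated R2:", r2)
```

Because each prediction is made without the held-out observation, this quantity can be much lower than the in-sample R2 and can even be negative when the model predicts worse than the overall mean.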