Reduced-rank spatio-temporal modeling of air pollution concentrations in the Multi-Ethnic Study of Atherosclerosis and Air Pollution
There is growing evidence in the epidemiologic literature of the relationship
between air pollution and adverse health outcomes. Prediction of individual air
pollution exposure in the Environmental Protection Agency (EPA) funded
Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air) study
relies on a flexible spatio-temporal prediction model that integrates land-use
regression with kriging to account for spatial dependence in pollutant
concentrations. Temporal variability is captured using temporal trends
estimated via modified singular value decomposition and temporally varying
spatial residuals. This model utilizes monitoring data from existing regulatory
networks and supplementary MESA Air monitoring data to predict concentrations
for individual cohort members. In general, spatio-temporal models are limited
in their efficacy for large data sets due to computational intractability. We
develop reduced-rank versions of the MESA Air spatio-temporal model. To do so,
we apply low-rank kriging to account for spatial variation in the mean process
and discuss the limitations of this approach. As an alternative, we represent
spatial variation using thin plate regression splines (TPRS). We compare the
performance of the outlined models using EPA and MESA Air monitoring data for
predicting concentrations of oxides of nitrogen (NOx), a pollutant of primary
interest in MESA Air, in the Los Angeles metropolitan area via cross-validated
R2. Our findings suggest that use of reduced-rank models can improve
computational efficiency in certain cases. Low-rank kriging and thin plate
regression splines were competitive across the formulations considered,
although TPRS appeared to be more robust in some settings.
Comment: Published in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) at http://dx.doi.org/10.1214/14-AOAS786 by the
Institute of Mathematical Statistics (http://www.imstat.org)
Measurement error in a multi-level analysis of air pollution and health: a simulation study.
BACKGROUND: Spatio-temporal models are increasingly being used to predict exposure to ambient outdoor air pollution at high spatial resolution for inclusion in epidemiological analyses of air pollution and health. Measurement error in these predictions can nevertheless have impacts on health effect estimation. Using statistical simulation we aim to investigate the effects of such error within a multi-level model analysis of long and short-term pollutant exposure and health.
METHODS: Our study was based on a theoretical sample of 1000 geographical sites within Greater London. Simulations of "true" site-specific daily mean and 5-year mean NO2 and PM10 concentrations, incorporating both temporal variation and spatial covariance, were informed by an analysis of daily measurements over the period 2009-2013 from fixed location urban background monitors in the London area. In the context of a multi-level single-pollutant Poisson regression analysis of mortality, we investigated scenarios in which we specified: the Pearson correlation between modelled and "true" data and the ratio of their variances (model versus "true") and assumed these parameters were the same spatially and temporally.
RESULTS: In general, health effect estimates associated with both long and short-term exposure were biased towards the null with the level of bias increasing to over 60% as the correlation coefficient decreased from 0.9 to 0.5 and the variance ratio increased from 0.5 to 2. However, for a combination of high correlation (0.9) and small variance ratio (0.5) non-trivial bias (> 25%) away from the null was observed. Standard errors of health effect estimates, though unaffected by changes in the correlation coefficient, appeared to be attenuated for variance ratios > 1 but inflated for variance ratios < 1.
CONCLUSION: While our findings suggest that in most cases modelling errors result in attenuation of the effect estimate towards the null, in some situations a non-trivial bias away from the null may occur. The magnitude and direction of bias appear to depend on the relationship between modelled and "true" data in terms of their correlation and the ratio of their variances. These factors should be taken into account when assessing the validity of modelled air pollution predictions for use in complex epidemiological models.
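The dependence of the bias on the correlation and the variance ratio can be illustrated with a simple linear-regression approximation. This is only a sketch under stated assumptions, not the multi-level Poisson simulation used in the study: if the outcome depends linearly on the "true" exposure and the error in the modelled exposure is unrelated to the outcome, the expected ratio of the estimated slope to the true slope is Cov(modelled, true) / Var(modelled) = correlation / sqrt(variance ratio). The function names below are hypothetical.

```python
import math


def expected_slope_ratio(correlation: float, variance_ratio: float) -> float:
    """Expected (estimated slope) / (true slope) in a linear model.

    variance_ratio is Var(modelled) / Var(true), as in the abstract.
    Assumption: linear outcome model with non-differential exposure error.
    """
    if variance_ratio <= 0:
        raise ValueError("variance_ratio must be positive")
    return correlation / math.sqrt(variance_ratio)


def percent_bias(correlation: float, variance_ratio: float) -> float:
    """Percent bias of the slope; negative values mean bias towards the null."""
    return 100.0 * (expected_slope_ratio(correlation, variance_ratio) - 1.0)


# Low correlation with a large variance ratio: strong attenuation (about -65%).
print(round(percent_bias(0.5, 2.0), 1))
# High correlation with a small variance ratio: bias away from the null (about +27%).
print(round(percent_bias(0.9, 0.5), 1))
```

Under this approximation the two extreme scenarios give values of the same sign and broadly similar size to those reported in the RESULTS, although the study's own figures come from a different (Poisson, multi-level) model.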
Pragmatic Estimation of a Spatio-Temporal Air Quality Model With Irregular Monitoring Data
Statistical analyses of the health effects of air pollution have increasingly used GIS-based covariates for prediction of ambient air quality in “land-use” regression models. More recently these regression models have accounted for spatial correlation structure in combining monitoring data with land-use covariates. The current paper builds on these concepts to address spatio-temporal prediction of ambient concentrations of particulate matter with aerodynamic diameter less than 2.5 μm (PM2.5) on the basis of a model representing spatially varying seasonal trends and spatial correlation structures. Our hierarchical methodology provides a pragmatic approach that fully exploits regulatory and other supplemental monitoring data which jointly define a complex spatio-temporal monitoring design. We explain the elements of the computational approach, including estimation of smoothed empirical orthogonal functions (SEOFs) as basis functions for temporal trend, spatial (“land use”) regression by Partial Least Squares (PLS), modeling of spatio-temporal correlation structure, and generalized universal kriging prediction of ambient exposure for subjects in the Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air) project. Analyses are demonstrated in detail for the Southern California study area of the MESA Air project using AQS monitoring data from 2000 to 2006 and supplemental MESA Air monitoring data beginning in 2005. Results of applying the modeling and estimation methodology are also presented for five other MESA Air metropolitan study areas across the country, with comments on current and future research developments.
Predicting Intra-Urban Variation in Air Pollution Concentrations with Complex Spatio-Temporal Interactions
We describe a methodology for assigning individual estimates of long-term average air pollution concentrations that accounts for a complex spatio-temporal correlation structure and can accommodate unbalanced observations. This methodology has been developed as part of the Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air), a prospective cohort study funded by the U.S. EPA to investigate the relationship between chronic exposure to air pollution and cardiovascular disease. Our hierarchical model decomposes the space-time field into a “mean” that includes dependence on covariates and spatially varying seasonal and long-term trends and a “residual” that accounts for spatially correlated deviations from the mean model. The model accommodates complex spatio-temporal patterns by characterizing the temporal trend at each location as a linear combination of empirically derived temporal basis functions, and embedding the spatial fields of coefficients for the basis functions in separate linear regression models with spatially correlated residuals (universal kriging). This approach allows us to implement a scalable single-stage estimation procedure that easily accommodates a significant number of missing observations at some monitoring locations. We apply the model to predict long-term average concentrations of oxides of nitrogen (NOx) from 2005-2007 in the Los Angeles area, based on data from 18 EPA Air Quality System regulatory monitors. The cross-validated R2 is 0.67. The MESA Air study is also collecting additional concentration data as part of a supplementary monitoring campaign. We describe the sampling plan and demonstrate in a simulation study that the additional data will contribute to improved predictions of long-term average concentrations
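The decomposition described in this abstract can be written compactly as follows. The notation is an assumption introduced here for illustration (the abstract gives no symbols): y(s,t) is the concentration at location s and time t, f_i(t) are the empirically derived temporal basis functions, β_i(s) are their spatially varying coefficients, X_i(s) are covariates, and ν(s,t) is the spatially correlated residual.

```latex
\begin{align}
  y(s,t)     &= \mu(s,t) + \nu(s,t), \\
  \mu(s,t)   &= \sum_{i=1}^{m} \beta_i(s)\, f_i(t), \\
  \beta_i(s) &= X_i(s)\,\alpha_i + \varepsilon_i(s), \qquad i = 1,\dots,m,
\end{align}
```

Here each ε_i(s) is a spatially correlated residual field, so that every coefficient field β_i(s) follows a universal kriging model, as stated in the abstract.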
A Flexible Spatio-Temporal Model for Air Pollution: Allowing for Spatio-Temporal Covariates
Given the increasing interest in the association between exposure to air pollution and adverse health outcomes, the development of models that provide accurate spatio-temporal predictions of air pollution concentrations at small spatial scales is of great importance when assessing potential health effects of air pollution. The methodology presented here has been developed as part of the Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air), a prospective cohort study funded by the US EPA to investigate the relationship between chronic exposure to air pollution and cardiovascular disease. We present a spatio-temporal framework that models and predicts ambient air pollution by combining data from several different monitoring networks with the output from deterministic air pollution model(s). The model can accommodate arbitrarily missing observations and allows for a complex spatio-temporal correlation structure.
We apply the model to predict long-term average concentrations of gaseous oxides of nitrogen (NOx), one of the primary pollutants of interest in the MESA Air study, during a ten-year period in the Los Angeles area, based on measurements from the EPA Air Quality System and MESA Air monitoring. The measurements are augmented by a spatio-temporal covariate based on the output from a source dispersion model for traffic-related air pollution (Caline3QHC), and the model is evaluated using cross-validation. The predictive ability of the model is good, with a cross-validated R2 of approximately 0.7 at subject sites.
The incorporation of a dispersion model output into the overall prediction model was feasible, but the particular implementation of Caline3QHC used here did not improve predictions in a model that also includes road information. However, when road information is excluded, including the model output improves predictions, and we find some evidence that the source dispersion model can replace road covariates.
The model presented in this paper has been implemented in an R package, SpatioTemporal, which will be available on CRAN shortly.
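Several of the abstracts in this list report a cross-validated R2. A minimal sketch of one common way to compute it is given below, assuming held-out predictions are already available for every site. The abstracts do not state the exact definition used, so the MSE-based form truncated at zero is an assumption here, and the function name and example numbers are hypothetical.

```python
from statistics import fmean, pvariance


def cross_validated_r2(observed, predicted):
    """R2 = max(0, 1 - MSE / Var(observed)) from held-out predictions.

    Assumption: predicted[i] was produced by a model fitted without
    observed[i] (for example in k-fold cross-validation).
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    mse = fmean((o - p) ** 2 for o, p in zip(observed, predicted))
    return max(0.0, 1.0 - mse / pvariance(observed))


# Illustrative (made-up) values, not data from any of the studies listed here.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
print(cross_validated_r2(observed, predicted))
```

Unlike a squared correlation, this form penalises systematic over- or under-prediction, which is why it is often preferred for judging exposure predictions.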
Concentrations of criteria pollutants in the contiguous U.S., 1979 – 2015: Role of model parsimony in integrated empirical geographic regression
BACKGROUND: National- or regional-scale prediction models that estimate individual-level air pollution concentrations commonly include hundreds of geographic variables. However, this many variables may not be necessary, and a parsimonious approach that includes a small number of variables may achieve sufficient predictive ability. This parsimonious approach can also be applied to most criteria pollutants. This approach will be powerful when generating publicly available datasets of model predictions that support research in environmental health and other fields.
OBJECTIVES: We aim to (1) build annual-average integrated empirical geographic (IEG) regression models for the contiguous U.S. for six criteria pollutants, for all years with regulatory monitoring data during 1979 – 2015; (2) explore the impact of model parsimony on model performance by comparing performance across the number of variables offered to a model; and (3) provide publicly available model predictions.
METHODS: We compute annual-average concentrations from regulatory monitoring data for PM10, PM2.5, NO2, SO2, CO, and ozone at all monitoring sites for 1979-2015. We also compute ~900 geographic characteristics at each location including measures of traffic, land use, and satellite-based estimates of air pollution and landcover. We then develop IEG models, employing universal kriging and summary factors estimated by partial least squares (PLS) of independent variables. For all pollutants and years, we compare three approaches for choosing variables to include in the model: (1) no variables (kriging only), (2) a limited number of variables chosen by forward selection, and (3) all variables. We evaluate model performance using 10-fold cross-validation (CV) with both conventional randomly-selected and spatially-clustered test data.
RESULTS: Models using 3 to 30 variables generally have the best performance across all pollutants and years (median R2 conventional [clustered] CV: 0.66 [0.47]) compared to models with no (0.37 [0]) or all variables (0.64 [0.27]). Using the best models mostly including 3-30 variables, we predicted annual-average concentrations of six criteria pollutants for all Census Blocks in the contiguous U.S.
DISCUSSION: Our findings suggest that national prediction models can be built on only a small number (30 or fewer) of important variables and provide robust concentration estimates. Model estimates are freely available online.
Exposure measurement error in PM2.5 health effects studies: A pooled analysis of eight personal exposure validation studies
Background: Exposure measurement error is a concern in long-term PM2.5 health studies using ambient concentrations as exposures. We assessed error magnitude by estimating calibration coefficients as the association between personal PM2.5 exposures from validation studies and typically available surrogate exposures. Methods: Daily personal and ambient PM2.5, and when available sulfate, measurements were compiled from nine cities, over 2 to 12 days. True exposure was defined as personal exposure to PM2.5 of ambient origin. Since PM2.5 of ambient origin could only be determined for five cities, personal exposure to total PM2.5 was also considered. Surrogate exposures were estimated as ambient PM2.5 at the nearest monitor or predicted outside subjects’ homes. We estimated calibration coefficients by regressing true on surrogate exposures in random effects models. Results: When monthly-averaged personal PM2.5 of ambient origin was used as the true exposure, calibration coefficients equaled 0.31 (95% CI:0.14, 0.47) for nearest monitor and 0.54 (95% CI:0.42, 0.65) for outdoor home predictions. Between-city heterogeneity was not found for outdoor home PM2.5 for either true exposure. Heterogeneity was significant for nearest monitor PM2.5, for both true exposures, but not after adjusting for city-average motor vehicle number for total personal PM2.5. Conclusions: Calibration coefficients were <1, consistent with previously reported chronic health risks using nearest monitor exposures being under-estimated when ambient concentrations are the exposure of interest. Calibration coefficients were closer to 1 for outdoor home predictions, likely reflecting less spatial error. Further research is needed to determine how our findings can be incorporated in future health studies
Forecasting confined spatiotemporal chaos with genetic algorithms
A technique to forecast spatiotemporal time series is presented. It uses a
Proper Orthogonal or Karhunen-Loève Decomposition to encode large
spatiotemporal data sets in a few time-series, and Genetic Algorithms to
efficiently extract dynamical rules from the data. The method works very well
for confined systems displaying spatiotemporal chaos, as exemplified here by
forecasting the evolution of the one-dimensional complex Ginzburg-Landau
equation in a finite domain.
Comment: 4 pages, 5 figures
Modeling the Residential Infiltration of Outdoor PM2.5 in the Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air)
Background: Epidemiologic studies of fine particulate matter [aerodynamic diameter ≤ 2.5 μm (PM2.5)] typically use outdoor concentrations as exposure surrogates. Failure to account for variation in residential infiltration efficiencies (Finf) will affect epidemiologic study results