There is growing evidence in the epidemiologic literature of the relationship
between air pollution and adverse health outcomes. Prediction of individual air
pollution exposure in the Environmental Protection Agency (EPA)-funded
Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air)
relies on a flexible spatio-temporal prediction model that integrates land-use
regression with kriging to account for spatial dependence in pollutant
concentrations. Temporal variability is captured using temporal trends
estimated via modified singular value decomposition and temporally varying
spatial residuals. This model utilizes monitoring data from existing regulatory
networks and supplementary MESA Air monitoring data to predict concentrations
for individual cohort members. In general, spatio-temporal models are limited
in their efficacy for large data sets due to computational intractability. We
develop reduced-rank versions of the MESA Air spatio-temporal model. To do so,
we apply low-rank kriging to account for spatial variation in the mean process
and discuss the limitations of this approach. As an alternative, we represent
spatial variation using thin plate regression splines. We compare the
performance of the outlined models using EPA and MESA Air monitoring data for
predicting concentrations of oxides of nitrogen (NOx), a pollutant of primary
interest in MESA Air, in the Los Angeles metropolitan area via cross-validated
$R^2$. Our findings suggest that use of reduced-rank models can improve
computational efficiency in certain cases. Low-rank kriging and thin plate
regression splines (TPRS) were competitive across the formulations considered,
although TPRS appeared to be more robust in some settings.

Comment: Published at http://dx.doi.org/10.1214/14-AOAS786 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
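
The model comparison above rests on cross-validated R2. As a minimal sketch of how such a score can be computed, the Python below holds out each fold in turn, predicts it from the remaining data, and scores the pooled held-out predictions. It assumes the common definition of one minus the ratio of mean squared prediction error to the variance of the observations, which may differ in detail from the definition used in the study. The function names `cv_r2` and `kfold_predictions` are hypothetical, and ordinary least squares stands in for the spatio-temporal model purely for illustration.

```python
import numpy as np


def cv_r2(observed, predicted):
    """Cross-validated R^2, assumed here to be 1 - MSE / Var(observed)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mse = np.mean((observed - predicted) ** 2)
    return 1.0 - mse / np.var(observed)


def kfold_predictions(X, y, n_folds=5, seed=0):
    """Out-of-sample predictions for every observation via k-fold CV.

    Ordinary least squares is a stand-in predictor (an assumption of this
    sketch), not the spatio-temporal model described in the text.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    preds = np.empty_like(y)
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(y)), held_out)
        # Fit intercept + covariates on the training sites only.
        A = np.column_stack([np.ones(len(train)), X[train]])
        coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        # Predict the held-out sites from the fitted coefficients.
        B = np.column_stack([np.ones(len(held_out)), X[held_out]])
        preds[held_out] = B @ coef
    return preds


# Synthetic data for illustration only (not monitoring data).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(40, 2))
y_demo = 2.0 + X_demo @ np.array([3.0, -1.0])  # noise-free linear signal
preds_demo = kfold_predictions(X_demo, y_demo)
score_demo = cv_r2(y_demo, preds_demo)
```

Because every prediction is made for an observation left out of the fit, the score reflects out-of-sample accuracy rather than goodness of fit, which is why it suits comparing full-rank and reduced-rank formulations.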