Integrating Augmented <i>In Situ</i> Measurements
and a Spatiotemporal Machine Learning Model To Back Extrapolate Historical
Particulate Matter Pollution over the United Kingdom: 1980–2019
Historical PM2.5 data are essential for assessing the
health effects of air pollution exposure across the life course or
in early life. However, a lack of high-quality data sources, such as
satellite-based aerosol optical depth before 2000, has resulted in
a gap in spatiotemporally resolved PM2.5 data for historical
periods. Taking the United Kingdom as an example, we leveraged the
light gradient boosting model to capture the spatiotemporal association
between PM2.5 concentrations and multi-source geospatial
predictors. Augmenting the PM2.5 record with values derived from PM10 measurements
expanded the spatiotemporal representativeness of the ground measurements.
Observations from 2010 to 2019 were used to train the models, and
observations from 1998 to 2009 were used to test them. Our model showed fair prediction accuracy from
2010 to 2019 [coefficients of determination (R2) for the grid-based cross-validation range from 0.71 to 0.85]
and commendable back extrapolation performance from 1998 to 2009 (R2
for the independent external
testing ranges from 0.32 to 0.65) at the daily level. The pollution episodes
in the 1980s and pollution levels in the 1990s were also reproduced
by our model. The 4-decade PM2.5 estimates demonstrated
that most regions in England witnessed significant downward trends
in PM2.5 pollution. The methods developed in this study
are generalizable to other data-rich regions for historical air pollution
exposure assessment.
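
The evaluation design described above (training on recent years, then scoring back-extrapolated predictions against earlier observations with R2) can be illustrated with a minimal sketch. It is not the study's implementation: the data are synthetic, the single `predictor` column and the helper names `r_squared` and `split_by_year` are hypothetical, and an ordinary least-squares fit stands in for the light gradient boosting model so that the example needs only NumPy.

```python
import numpy as np


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def split_by_year(years, last_test_year=2009):
    """Return (train_mask, test_mask): train after last_test_year, test on or before it."""
    years = np.asarray(years)
    test_mask = years <= last_test_year
    return ~test_mask, test_mask


# Synthetic data for illustration only (assumption, not values from the study).
rng = np.random.default_rng(0)
n = 500
years = rng.integers(1998, 2020, size=n)   # years 1998..2019 inclusive
predictor = rng.normal(size=n)             # hypothetical geospatial predictor
pm25 = 10.0 + 3.0 * predictor + rng.normal(scale=1.0, size=n)

train, test = split_by_year(years)

# Stand-in model (assumption): least squares instead of gradient boosting.
X_train = np.column_stack([np.ones(train.sum()), predictor[train]])
coef, *_ = np.linalg.lstsq(X_train, pm25[train], rcond=None)

# Back extrapolation: predict the earlier period and score it against observations.
X_test = np.column_stack([np.ones(test.sum()), predictor[test]])
r2_external = r_squared(pm25[test], X_test @ coef)
print(f"external-test R2: {r2_external:.2f}")
```

Splitting strictly by year, rather than at random, keeps every test observation outside the training period, which is what makes the resulting R2 a measure of back extrapolation rather than of interpolation within the training years.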