
    Concentrations of criteria pollutants in the contiguous U.S., 1979 – 2015: Role of model parsimony in integrated empirical geographic regression

    BACKGROUND: National- or regional-scale prediction models that estimate individual-level air pollution concentrations commonly include hundreds of geographic variables. However, these many variables may not be necessary, and a parsimonious approach including a small number of variables may achieve sufficient prediction ability. This parsimonious approach can also be applied to most criteria pollutants. This approach will be powerful when generating publicly available datasets of model predictions that support research in environmental health and other fields. OBJECTIVES: We aim to (1) build annual-average integrated empirical geographic (IEG) regression models for the contiguous U.S. for six criteria pollutants, for all years with regulatory monitoring data during 1979 – 2015; (2) explore the impact of model parsimony on model performance by comparing performance across the number of variables offered to a model; and (3) provide publicly available model predictions. METHODS: We compute annual-average concentrations from regulatory monitoring data for PM10, PM2.5, NO2, SO2, CO, and ozone at all monitoring sites for 1979-2015. We also compute ~900 geographic characteristics at each location including measures of traffic, land use, and satellite-based estimates of air pollution and landcover. We then develop IEG models, employing universal kriging and summary factors estimated by partial least squares (PLS) of independent variables. For all pollutants and years, we compare three approaches for choosing variables to include in the model: (1) no variables (kriging only), (2) a limited number of variables chosen by forward selection, and (3) all variables. We evaluate model performance using 10-fold cross-validation (CV) with conventional (randomly selected) and spatially clustered test data. 
RESULTS: Models using 3 to 30 variables generally have the best performance across all pollutants and years (median R2 conventional [clustered] CV: 0.66 [0.47]) compared to models with no (0.37 [0]) or all variables (0.64 [0.27]). Using the best models mostly including 3-30 variables, we predicted annual-average concentrations of six criteria pollutants for all Census Blocks in the contiguous U.S. DISCUSSION: Our findings suggest that national prediction models can be built on only a small number (30 or fewer) of important variables and provide robust concentration estimates. Model estimates are freely available online
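
    The two cross-validation schemes named in the abstract above (conventional randomly selected folds and spatially clustered folds) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the clustering method (plain k-means on site coordinates) is an assumption made for illustration, and the R2 definition shown (1 - SSE/SST) may differ from the one used in the paper.

```python
import numpy as np

def random_folds(n_sites, k=10, seed=0):
    """Conventional CV: assign each site to one of k folds at random."""
    rng = np.random.default_rng(seed)
    folds = np.arange(n_sites) % k  # near-equal fold sizes
    rng.shuffle(folds)
    return folds

def clustered_folds(coords, k=10, n_iter=25, seed=0):
    """Spatially clustered CV: nearby sites share a fold.

    Uses plain k-means on the coordinates (an illustrative assumption;
    the abstract does not say how the clusters were formed).
    """
    rng = np.random.default_rng(seed)
    coords = np.asarray(coords, dtype=float)
    centers = coords[rng.choice(len(coords), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Distance from every site to every cluster centre.
        d = np.linalg.norm(coords[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = coords[labels == j]
            if len(members):  # keep the old centre if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels

def cv_r2(y_obs, y_pred):
    """R2 of held-out predictions, computed as 1 - SSE/SST (assumed definition)."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    sse = np.sum((y_obs - y_pred) ** 2)
    sst = np.sum((y_obs - y_obs.mean()) ** 2)
    return 1.0 - sse / sst
```

    Holding out a whole cluster forces the model to predict at sites with no nearby training monitors, which is one plausible reason the clustered CV R2 values reported above are lower than the conventional ones.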

    Development of West-European PM2.5 and NO2 land use regression models incorporating satellite-derived and chemical transport modelling data

    Satellite-derived (SAT) and chemical transport model (CTM) estimates of PM2.5 and NO2 are increasingly used in combination with Land Use Regression (LUR) models. We aimed to compare the contribution of SAT and CTM data to the performance of LUR PM2.5 and NO2 models for Europe. Four sets of models, all including local traffic and land use variables, were compared (LUR without SAT or CTM, with SAT only, with CTM only, and with both SAT and CTM). LUR models were developed using two monitoring data sets: PM2.5 and NO2 ground level measurements from the European Study of Cohorts for Air Pollution Effects (ESCAPE) and from the European AIRBASE network. LUR PM2.5 models including SAT and SAT+CTM explained ~60% of spatial variation in measured PM2.5 concentrations, substantially more than the LUR model without SAT and CTM (adjusted R2: 0.33-0.38). For NO2, CTM improved prediction modestly (adjusted R2: 0.58) compared to models without SAT and CTM (adjusted R2: 0.47-0.51). Both monitoring networks are capable of producing models explaining the spatial variance over a large study area. SAT and CTM estimates of PM2.5 and NO2 significantly improved the performance of high spatial resolution LUR models at the European scale for use in large epidemiological studies

    Using Remote Sensing to Understand Urban Air Quality Exposures and Inequities

    Thesis (Ph.D.)--University of Washington, 2021. Outdoor air pollution is one of the leading causes of morbidity and mortality in the United States and around the world, but these impacts are not distributed equally. Countries, communities, and households that are socially and economically deprived often experience higher levels of air pollution. Yet too often these locations remain unmonitored or insufficiently monitored by traditional ground-based measurements. In this dissertation I employ satellite-based remote sensing of nitrogen dioxide (NO2), a major contributor to urban air pollution and a proxy for a toxic mix of pollutants associated with traffic and combustion emissions, to explore air pollution levels globally and within the US. Within the last two decades, satellite air pollution measurements have considerably expanded the capability to measure air pollution in previously unmonitored locations and across administrative boundaries. Cities serve as focal points, concentrating social and economic opportunities, but may also concentrate hazards, including air pollution. Strategic, compact urban design may be a way to improve a city's air quality, yet global empirical evidence has historically been limited by data availability and consistency. Here I use satellite-based measurements of NO2 and built-up land area to explore the relationship between city-wide NO2 levels and urban form characteristics (i.e., contiguity, circularity, percent impervious surfaces, percent vegetation coverage) for a global sample of 1,274 cities. Three of the urban form metrics (contiguity, circularity, and vegetation) have a small, but statistically significant relationship with city NO2 levels; however, the combined effect of these three attributes could be sizeable. For example, a city at the 75th percentile for all three metrics could accommodate, on average, twice the population of a city at the 25th percentile, while maintaining similar air quality. 
This work also shows that country level factors such as economic conditions and environmental policies may impact the urban form – air pollution relationships. Moreover, the impact of urban form on air quality may be larger for small cities, an important finding given the large portion of current and projected future population that lives in small cities. Satellite air pollution measurements are limited by their spatial resolution. For example, they are well suited for exploring NO2 levels between cities, as described above; however, alone they typically cannot capture the fine-scale spatial variability needed to characterize population exposure to air pollution. Satellite-based empirical models combine the regional concentrations from satellite measurements with ground-based measurements and local land use and land cover information to predict air pollution concentrations with high spatial resolution (typically 1 km or less). These models have become ubiquitous, yet few studies have investigated how satellite and other regional air pollution covariates impact these models. In this dissertation, I address this gap by exploring the effect of several regional NO2 covariates in an empirical model for annual average NO2 over the contiguous US and find that inclusion of a regional covariate improves model predictive power, yet choice of covariate has limited impact. Additionally, empirical models can be data and computationally intensive, and are often limited to long-term averages and a small number of years. Here, I address these issues by developing a straightforward and easy to implement spatiotemporal scaling technique to extend the temporal coverage of a year-2006 annual NO2 model to over a decade (2000-2010) of monthly NO2 estimates. The resulting estimates are publicly available online. 
The spatiotemporal scaling technique and these data have since been used in several publications exploring health effects and residential exposure disparities associated with outdoor NO2 levels. Residential air pollution disparities in the contiguous US have become a topic of recent interest. Children are a particularly vulnerable population and disparities in their air pollution exposure could have lasting impacts. Despite this, little has been done to track outdoor air pollution levels at schools throughout the US. In this dissertation, I add to this body of work by exploring a criteria pollutant, NO2, and by considering home and school locations to better understand the role of public schools in students’ total exposure. I find that, on average, racial and ethnic minority students live in and attend schools in areas with higher NO2 levels than their non-Hispanic, white peers, and that impoverished students (defined here as those eligible for school lunch programs) attend, on average, schools with higher NO2 levels than their non-impoverished peers. Minority students are much more likely than their white peers to live in areas above the World Health Organization’s annual outdoor NO2 guideline, and this likelihood is larger at schools than at home locations, particularly when comparing predominately minority schools to predominately white schools. This finding -- that public schools may exacerbate disparities -- has important implications for addressing childhood inequities. Notably, strategies that do not address school exposure inequities may fail to address overall exposure inequities. Moreover, strategies to reduce school segregation or to identify and mitigate NO2 levels at the most at-risk schools could have a significant impact on children’s overall NO2 inequities. 
This work also shows that race and income are intertwined; independently, more impoverished schools and schools with more minority students tend to be in areas with higher NO2 levels than more well-off schools and schools with fewer minority students. Schools in large urban areas exhibit disparities by race/ethnicity alone, even when controlling for school-level income. This work highlights NO2 disparities at public schools throughout the contiguous US. Those national disparities are driven largely by disparities in the 50 largest urban areas, which provides motivation for additional exploration and tracking of air pollution levels at these locations. In summary, in this dissertation I have demonstrated how satellite measurements and empirical models that incorporate satellite measurements vastly improve the capability of uncovering and monitoring air pollution exposure disparities for a global and US-wide analysis. Recently launched and soon to be launched satellite-borne sensors promise higher spatial and temporal resolution air pollution measurements. Those measurements will allow for better understanding of concentrations and emission sources, as well as improve satellite-based empirical models, facilitating further tracking and characterization of exposures and exposure disparities from global to local scales

    A national satellite-based land-use regression model for air pollution exposure assessment in Australia

    Land-use regression (LUR) is a technique that can improve the accuracy of air pollution exposure assessment in epidemiological studies. Most LUR models are developed for single cities, which places limitations on their applicability to other locations. We sought to develop a model to predict nitrogen dioxide (NO2) concentrations with national coverage of Australia by using satellite observations of tropospheric NO2 columns combined with other predictor variables. We used a generalised estimating equation (GEE) model to predict annual and monthly average ambient NO2 concentrations measured by a national monitoring network from 2006 through 2011. The best annual model explained 81% of spatial variation in NO2 (absolute RMS error=1.4 ppb), while the best monthly model explained 76% (absolute RMS error=1.9 ppb). We applied our models to predict NO2 concentrations at the ~350,000 census mesh blocks across the country (a mesh block is the smallest spatial unit in the Australian census). National population-weighted average concentrations ranged from 7.3 ppb (2006) to 6.3 ppb (2011). We found that a simple approach using tropospheric NO2 column data yielded models with slightly better predictive ability than those produced using a more involved approach that required simulation of surface-to-column ratios. The models were capable of capturing within-urban variability in NO2, and offer the ability to estimate ambient NO2 concentrations at monthly and annual time scales across Australia from 2006–2011. We are making our model predictions freely available for research

    National Spatiotemporal Exposure Surface for NO2: Monthly Scaling of a Satellite-Derived Land-Use Regression, 2000–2010

    Land-use regression (LUR) is widely used for estimating within-urban variability in air pollution. While LUR has recently been extended to national and continental scales, these models are typically for long-term averages. Here we present NO2 surfaces for the continental United States with excellent spatial resolution (∼100 m) and monthly average concentrations for one decade. We investigate multiple potential data sources (e.g., satellite column and surface estimates, high- and standard-resolution satellite data, and a mechanistic model [WRF-Chem]), approaches to model building (e.g., one model for the whole country versus having separate models for urban and rural areas, monthly LURs versus temporal scaling of a spatial LUR), and spatial interpolation methods for temporal scaling factors (e.g., kriging versus inverse distance weighted). Our core approach uses NO2 measurements from U.S. EPA monitors (2000–2010) to build a spatial LUR and to calculate spatially varying temporal scaling factors. The model captures 82% of the spatial and 76% of the temporal variability (population-weighted average) of monthly mean NO2 concentrations from U.S. EPA monitors with low average bias (21%) and error (2.4 ppb). Model performance in absolute terms is similar near versus far from monitors, and in urban, suburban, and rural locations (mean absolute error 2–3 ppb); since low-density locations generally experience lower concentrations, model performance in relative terms is better near monitors than far from monitors (mean bias 3% versus 40%) and is better for urban and suburban locations (1–6%) than for rural locations (78%, reflecting the relatively clean conditions in many rural areas). During 2000–2010, population-weighted mean NO2 exposure decreased 42% (1.0 ppb [∼5.2%] per year), from 23.2 ppb (year 2000) to 13.5 ppb (year 2010). We apply our approach to all U.S. Census blocks in the contiguous United States to provide 132 months of publicly available, high-resolution NO2 concentration estimates
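
    The idea of temporally scaling a spatial LUR, as described in the abstract above, can be sketched as follows. This is a hedged illustration only: the abstract does not define the scaling factor, so the ratio used here (monthly monitor mean divided by that monitor's annual mean) is an assumption, as are all function names. Inverse distance weighting is one of the interpolation options the abstract mentions; kriging is the other.

```python
import numpy as np

def monthly_scaling_factors(monthly_obs):
    """Ratio of each monthly mean to the monitor's annual mean.

    monthly_obs has shape (n_monitors, 12). The ratio definition is an
    assumption made for illustration.
    """
    monthly_obs = np.asarray(monthly_obs, dtype=float)
    annual = monthly_obs.mean(axis=1, keepdims=True)
    return monthly_obs / annual

def idw(monitor_xy, values, target_xy, power=2.0, eps=1e-12):
    """Inverse-distance-weighted interpolation from monitors to targets."""
    monitor_xy = np.asarray(monitor_xy, dtype=float)
    target_xy = np.asarray(target_xy, dtype=float)
    values = np.asarray(values, dtype=float)
    d = np.linalg.norm(target_xy[:, None, :] - monitor_xy[None, :, :], axis=2)
    w = 1.0 / np.maximum(d, eps) ** power  # eps avoids division by zero
    w /= w.sum(axis=1, keepdims=True)      # weights sum to 1 per target
    return w @ values

def scale_surface(annual_pred, monitor_xy, monthly_obs, target_xy):
    """Monthly estimates = annual LUR prediction x interpolated factor."""
    factors = monthly_scaling_factors(monthly_obs)        # (n_monitors, 12)
    factors_at_targets = idw(monitor_xy, factors, target_xy)  # (n_targets, 12)
    annual_pred = np.asarray(annual_pred, dtype=float)
    return annual_pred[:, None] * factors_at_targets
```

    The design keeps the expensive spatial model fixed and moves all temporal information into a small set of interpolated factors, which is what makes extending one annual surface to many months inexpensive.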

    Concentrations of criteria pollutants in the contiguous U.S., 1979 - 2015: Role of prediction model parsimony in integrated empirical geographic regression.

    National-scale empirical models for air pollution can include hundreds of geographic variables. The impact of model parsimony (i.e., how model performance differs for a large versus small number of covariates) has not been systematically explored. We aim to (1) build annual-average integrated empirical geographic (IEG) regression models for the contiguous U.S. for six criteria pollutants during 1979-2015; (2) explore systematically the impact on model performance of the number of variables selected for inclusion in a model; and (3) provide publicly available model predictions. We compute annual-average concentrations from regulatory monitoring data for PM10, PM2.5, NO2, SO2, CO, and ozone at all monitoring sites for 1979-2015. We also use ~350 geographic characteristics at each location including measures of traffic, land use, land cover, and satellite-based estimates of air pollution. We then develop IEG models, employing universal kriging and summary factors estimated by partial least squares (PLS) of geographic variables. For all pollutants and years, we compare three approaches for choosing variables to include in the PLS model: (1) no variables, (2) a limited number of variables selected from the full set by forward selection, and (3) all variables. We evaluate model performance using 10-fold cross-validation (CV) using conventional and spatially-clustered test data. Models using 3 to 30 variables selected from the full set generally have the best performance across all pollutants and years (median R2 conventional [clustered] CV: 0.66 [0.47]) compared to models with no (0.37 [0]) or all variables (0.64 [0.27]). Concentration estimates for all Census Blocks reveal generally decreasing concentrations over several decades with local heterogeneity. Our findings suggest that national prediction models can be built by empirically selecting only a small number of important variables to provide robust concentration estimates. 
Model estimates are freely available online

    National empirical models of air pollution using microscale measures of the urban environment

    National-scale empirical models of air pollution (e.g., Land Use Regression) rely on predictor variables (e.g., population density, land cover) at different geographic scales. These models typically lack microscale variables (e.g., street level), which may improve prediction of fine-scale spatial gradients. We developed microscale variables of the urban environment including Point of Interest (POI) data, Google Street View (GSV) imagery, and satellite-based measures of urban form. We developed United States national models for six criteria pollutants (NO2, PM2.5, O3, CO, PM10, SO2) using various modeling approaches: Stepwise Regression + kriging (SW-K), Partial Least Squares + kriging (PLS-K), and Machine Learning + kriging (ML-K). We compared predictor variables (e.g., traditional vs microscale) and emerging modeling approaches (ML-K) to well-established approaches (i.e., traditional variables in a PLS-K or SW-K framework). We found that combined predictor variables (traditional + microscale) in the ML-K models outperformed the well-established approaches (10-fold spatial cross-validation (CV) R2 increased 0.02-0.42 [average: 0.19] among six criteria pollutants). Comparing all model types using microscale variables to models with traditional variables, the performance is similar (average difference of 10-fold spatial CV R2 = 0.05), suggesting microscale variables are a suitable substitute for traditional variables. ML-K and microscale variables show promise for improving national empirical models