Physics-Informed Deep Learning to Reduce the Bias in Joint Prediction of Nitrogen Oxides
Atmospheric nitrogen oxides (NOx) primarily from fuel combustion have
recognized acute and chronic health and environmental effects. Machine learning
(ML) methods have significantly enhanced our capacity to predict NOx
concentrations at ground level with high spatiotemporal resolution but may
suffer from high estimation bias since they lack physical and chemical
knowledge about air pollution dynamics. Chemical transport models (CTMs)
leverage this knowledge; however, accurate predictions of ground-level
concentrations typically necessitate extensive post-calibration. Here, we
present a physics-informed deep learning framework that encodes
advection-diffusion mechanisms and fluid dynamics constraints to jointly
predict NO2 and NOx and reduce ML model bias by 21-42%. Our approach captures
fine-scale transport of NO2 and NOx, generates robust spatial extrapolation,
and provides explicit uncertainty estimation. The framework fuses
knowledge-driven physicochemical principles of CTMs with the predictive power
of ML for air quality exposure, health, and policy applications. Our approach
offers significant improvements over purely data-driven ML methods and achieves
unprecedented bias reduction in joint NO2 and NOx prediction.
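As a rough illustration of what encoding an advection-diffusion constraint can look like, the sketch below evaluates a finite-difference residual of the one-dimensional advection-diffusion equation, dc/dt + u dc/dx - D d2c/dx2 = 0, on a gridded concentration field. The 1-D form, the constant wind speed `u` and diffusivity `D`, and both function names are assumptions made for illustration; the abstract does not describe the framework's actual loss or discretization.

```python
def advection_diffusion_residual(c_prev, c_next, dt, dx, u, D):
    """Residual of dc/dt + u*dc/dx - D*d2c/dx2 at interior grid points.

    c_prev, c_next: concentrations at two consecutive time steps (lists).
    u, D: assumed constant wind speed and diffusivity (hypothetical inputs).
    Values near zero mean the field is consistent with the transport equation.
    """
    residuals = []
    for i in range(1, len(c_prev) - 1):
        # Forward difference in time.
        dc_dt = (c_next[i] - c_prev[i]) / dt
        # Central differences in space, evaluated at the earlier time step.
        dc_dx = (c_prev[i + 1] - c_prev[i - 1]) / (2 * dx)
        d2c_dx2 = (c_prev[i + 1] - 2 * c_prev[i] + c_prev[i - 1]) / dx ** 2
        residuals.append(dc_dt + u * dc_dx - D * d2c_dx2)
    return residuals


def physics_penalty(residuals):
    """Mean squared residual, usable as an extra term in a training loss."""
    return sum(r * r for r in residuals) / len(residuals)
```

A penalty of this kind would typically be added to the usual data-fitting loss, so that predictions violating the transport equation are discouraged even where no monitors exist.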
Exposure measurement error in air pollution studies: the impact of shared, multiplicative measurement error on epidemiological health risk estimates
Spatiotemporal air pollution models are increasingly being used to estimate health effects in epidemiological studies. Although such exposure prediction models typically result in improved spatial and temporal resolution of air pollution predictions, they remain subject to shared measurement error, a type of measurement error common in spatiotemporal exposure models which occurs when measurement error is not independent of exposures. A fundamental challenge of exposure measurement error in air pollution assessment is the strong correlation and sometimes identical (shared) error of exposure estimates across geographic space and time. When exposure estimates with shared measurement error are used to estimate health risk in epidemiological analyses, complex errors are potentially introduced, resulting in biased epidemiological conclusions. We demonstrate the influence of using a three-stage spatiotemporal exposure prediction model and introduce formal methods of shared, multiplicative measurement error (SMME) correction of epidemiological health risk estimates. Using our three-stage, ensemble learning based nitrogen oxides (NOx) exposure prediction model, we quantified SMME. We conducted an epidemiological analysis of wheeze risk in relation to NOx exposure among school-aged children. To demonstrate the incremental influence of exposure modeling stage, we iteratively estimated the health risk using assigned exposure predictions from each stage of the NOx model. We then determined the impact of SMME on the variance of the health risk estimates under various scenarios. Depending on the stage of the spatiotemporal exposure model used, we found that wheeze odds ratio ranged from 1.16 to 1.28 for an interquartile range increase in NOx. With each additional stage of exposure modeling, the health effect estimate moved further away from the null (OR=1). 
When corrected for observed SMME, the confidence intervals of the health effect estimates widened slightly, but our epidemiological conclusions were not altered. When the variance estimate was corrected for the potential "worst case scenario" of SMME, the standard error increased further, with a meaningful influence on epidemiological conclusions. Our framework can be expanded and used to understand the implications of using exposure predictions subject to shared measurement error in future health investigations.
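The arithmetic behind an odds ratio "for an interquartile range increase" and behind a widening confidence interval can be sketched as follows. The function name, the Wald-type interval, and every numeric input are hypothetical; the sketch only shows how a larger standard error widens the interval without moving the point estimate, and it does not reproduce the SMME correction itself.

```python
import math


def odds_ratio_per_iqr(beta, se, iqr, z=1.96):
    """Odds ratio and Wald 95% CI for an IQR increase in exposure.

    beta: log-odds change per unit exposure (hypothetical input).
    se:   standard error of beta.
    iqr:  interquartile range of the exposure, in the same units.
    """
    point = math.exp(beta * iqr)
    lower = math.exp((beta - z * se) * iqr)
    upper = math.exp((beta + z * se) * iqr)
    return point, lower, upper


# Hypothetical numbers: the same coefficient with an uncorrected SE and
# with an SE inflated by an assumed factor of 1.5.
naive = odds_ratio_per_iqr(beta=0.01, se=0.004, iqr=20.0)
inflated = odds_ratio_per_iqr(beta=0.01, se=0.004 * 1.5, iqr=20.0)
```

With these made-up inputs the point estimate is identical in both cases, but the inflated interval extends below an odds ratio of 1 while the naive one does not, which is the kind of change that can alter an epidemiological conclusion.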
Spatiotemporal imputation of MAIAC AOD using deep learning with downscaling
Aerosols have adverse health effects and play a significant role in the climate as well. The Multiangle Implementation of Atmospheric Correction (MAIAC) provides Aerosol Optical Depth (AOD) at high temporal (daily) and spatial (1 km) resolution, making it particularly useful to infer and characterize spatiotemporal variability of aerosols at a fine spatial scale for exposure assessment and health studies. However, clouds and conditions of high surface reflectance result in a significant proportion of missing MAIAC AOD. To fill these gaps, we present an imputation approach using deep learning with downscaling. Using a baseline autoencoder, we leverage residual connections in deep neural networks to boost learning and parameter sharing to reduce overfitting, and conduct bagging to reduce error variance in the imputations. Downscaled through a similar auto-encoder based deep residual network, Modern-Era Retrospective analysis for Research and Applications Version 2 (MERRA-2) GMI Replay Simulation (M2GMI) data were introduced to the network as an important gap-filling feature that varies in space to be used for missingness imputations. Imputing weekly MAIAC AOD from 2000 to 2016 over California, a state with considerable geographic heterogeneity, our full (non-full) residual network achieved mean R2 = 0.94 (0.86) [RMSE = 0.007 (0.01)] in an independent test, showing considerably better performance than a regular neural network or non-linear generalized additive model (mean R2 = 0.78-0.81; mean RMSE = 0.013-0.015). The adjusted imputed as well as combined imputed and observed MAIAC AOD showed strong correlation with Aerosol Robotic Network (AERONET) AOD (R = 0.83; R2 = 0.69, RMSE = 0.04). Our results show that we can generate reliable imputations of missing AOD through a deep learning approach, having important downstream air quality modeling applications
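A minimal sketch of bagging (bootstrap aggregating) follows: several base models are fit to bootstrap resamples and their predictions are averaged, which reduces the variance of the combined estimate. The base learner here is an ordinary least-squares line, swapped in only to keep the example self-contained; the abstract's base learner is an autoencoder-based residual network, and all function names and defaults below are assumptions.

```python
import random


def fit_line(xs, ys):
    """Least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # Degenerate resample (all x identical): fall back to the mean.
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def bagged_predict(xs, ys, x_new, n_models=50, seed=0):
    """Average the predictions of models fit to bootstrap resamples."""
    rng = random.Random(seed)
    n = len(xs)
    predictions = []
    for _ in range(n_models):
        # Sample n indices with replacement to form one bootstrap resample.
        idx = [rng.randrange(n) for _ in range(n)]
        slope, intercept = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        predictions.append(slope * x_new + intercept)
    return sum(predictions) / len(predictions)
```

The spread of the individual predictions across resamples can also serve as a rough indication of imputation uncertainty.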
Longitudinal associations of in utero and early life near-roadway air pollution with trajectories of childhood body mass index
Background: Evidence suggests that childhood near-roadway air pollution (NRAP) exposures contribute to increased body mass index (BMI); however, effects of NRAP exposure during vulnerable periods, including in utero and the first year of life, have yet to be established. In this study, we examined whether exposure to elevated concentrations of NRAP in utero and/or during the first year of life increases childhood BMI growth. Methods: Participants in the Children’s Health Study enrolled from 2002 to 2003 with annual visits over a four-year period and who changed residences before study entry were included (n = 2318). Annual height and weight were measured, and lifetime residential NRAP exposures, including in utero and first year of life periods, were estimated by nitrogen oxides (NOx) using the California line-source dispersion model. Linear mixed effects models assessed in utero or first year near-road freeway and non-freeway NOx exposures and BMI growth after adjusting for age, sex, race/ethnicity, parental education, Spanish questionnaire, and later childhood near-road NOx exposure. Results: A two-standard deviation difference in first year of life near-road freeway NOx exposure was associated with a 0.1 kg/m2 (95% confidence interval (CI): 0.03, 0.2) faster increase in BMI growth per year and a 0.5 kg/m2 (95% CI: 0.02, 0.9) higher attained BMI at age 10 years. Conclusions: Higher exposure to early life NRAP increased the rate of change of childhood BMI and resulted in a higher attained BMI at age 10 years, both independent of later childhood exposures. These findings suggest that elevated early life NRAP exposures contribute to increased obesity risk in children.
Constrained Mixed-Effect Models with Ensemble Learning for Prediction of Nitrogen Oxides Concentrations at High Spatiotemporal Resolution
Spatiotemporal models to estimate ambient exposures at high spatiotemporal resolutions are crucial in large-scale air pollution epidemiological studies that follow participants over extended periods. Previous models typically rely on central-site monitoring data and/or covered short periods, limiting their applications to long-term cohort studies. Here we developed a spatiotemporal model that can reliably predict nitrogen oxide concentrations with a high spatiotemporal resolution over a long time span (>20 years). Leveraging the spatially extensive highly clustered exposure data from short-term measurement campaigns across 1-2 years and long-term central site monitoring in 1992-2013, we developed an integrated mixed-effect model with uncertainty estimates. Our statistical model incorporated nonlinear and spatial effects to reduce bias. Identified important predictors included temporal basis predictors, traffic indicators, population density, and subcounty-level mean pollutant concentrations. Substantial spatial autocorrelation (11-13%) was observed between neighboring communities. Ensemble learning and constrained optimization were used to enhance reliability of estimation over a large metropolitan area and a long period. The ensemble predictions of biweekly concentrations resulted in an R2 of 0.85 (RMSE: 4.7 ppb) for NO2 and 0.86 (RMSE: 13.4 ppb) for NOx. Ensemble learning and constrained optimization generated stable time series, which notably improved the results compared with those from initial mixed-effects models
Exposure measurement error in air pollution studies: A framework for assessing shared, multiplicative measurement error in ensemble learning estimates of nitrogen oxides
Background: Increasingly, ensemble learning-based spatiotemporal models are being used to estimate residential air pollution exposures in epidemiological studies. While these machine learning models typically have improved performance, they suffer from exposure measurement error that is inherent in all models. Our objective is to develop a framework to formally assess shared, multiplicative measurement error (SMME) in our previously published three-stage, ensemble learning-based nitrogen oxides (NOx) model to identify its spatial and temporal patterns and predictors. Methods: By treating the ensembles as an external dosimetry system, we quantified shared and unshared, multiplicative and additive (SUMA) measurement error components in our exposure model. We used generalized additive models (GAMs) with a smooth term for location to identify geographic locations with significantly elevated SMME and explain their spatial and temporal determinants. Results: We found evidence of significant shared and unshared multiplicative error (p < 0.0001) in our ensemble learning-based spatiotemporal NOx model predictions. Unshared multiplicative error was 26 times larger than SMME. We observed significant geographic (p < 0.0001) and temporal variation in SMME, with the majority (43%) of predictions with elevated SMME occurring in the earliest time period (1992-2000). Densely populated urban prediction regions with complex air pollution sources generally exhibited the highest odds of elevated SMME. Conclusions: We developed a novel statistical framework to formally evaluate the magnitude and drivers of SMME in ensemble learning-based exposure models. Our framework can be used to inform the development of improved exposure models.
Cluster-based bagging of constrained mixed-effects models for high spatiotemporal resolution nitrogen oxides prediction over large regions
Background: Accurate estimation of nitrogen dioxide (NO2) and nitrogen oxide (NOx) concentrations at high spatiotemporal resolutions is crucial for improving evaluation of their health effects, particularly with respect to short-term exposures and acute health outcomes. For estimation over large regions like California, high spatial density field campaign measurements can be combined with more sparse routine monitoring network measurements to capture spatiotemporal variability of NO2 and NOx concentrations. However, monitors in spatially dense field sampling are often highly clustered and their uneven distribution creates a challenge for such combined use. Furthermore, heterogeneities due to seasonal patterns of meteorology and source mixtures between sub-regions (e.g. southern vs. northern California) need to be addressed. Objectives: In this study, we aim to develop highly accurate and adaptive machine learning models to predict high-resolution NO2 and NOx concentrations over large geographic regions using measurements from different sources that contain samples with heterogeneous spatiotemporal distributions and clustering patterns. Methods: We used a comprehensive Kruskal-K-means method to cluster the measurement samples from multiple heterogeneous sources. Spatiotemporal cluster-based bootstrap aggregating (bagging) of the base mixed-effects models was then applied, leveraging the clusters to obtain balanced and less correlated training samples for less bias and improvement in generalization. Further, we used the machine learning technique of grid search to find the optimal interaction of temporal basis functions and the scale of spatial effects, which, together with spatiotemporal covariates, adequately captured spatiotemporal variability in NO2 and NOx at the state and local levels. Results: We found an optimal combination of four temporal basis functions and 200 m scale spatial effects for the base mixed-effects models.
With the cluster-based bagging of the base models, we obtained robust predictions with an ensemble cross validation R2 of 0.88 for both NO2 and NOx [RMSE (RMSEIQR): 3.62 ppb (0.28) and 9.63 ppb (0.37) respectively]. In independent tests of random sampling, our models achieved similarly strong performance (R2 of 0.87-0.90; RMSE of 3.97-9.69 ppb; RMSEIQR of 0.21-0.27), illustrating minimal over-fitting. Conclusions: Our approach has important implications for fusing data from highly clustered and heterogeneous measurement samples from multiple data sources to produce highly accurate concentration estimates of air pollutants such as NO2 and NOx at high resolution over a large region.
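For readers unfamiliar with the reported metrics, the sketch below computes R2, RMSE, and an IQR-normalized RMSE from paired observations and predictions. It assumes that R2 is the coefficient of determination (1 - SSres/SStot) and that RMSEIQR denotes RMSE divided by the interquartile range of the observations; the abstract does not spell out either definition, and the function name and example values are hypothetical.

```python
import math
import statistics


def evaluation_metrics(observed, predicted):
    """Return (R2, RMSE, RMSE / IQR of observations)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    # Quartiles of the observations; 'inclusive' treats the data as the
    # full population rather than a sample from a larger one.
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    return r2, rmse, rmse / (q3 - q1)


# Made-up values purely to show the call.
r2, rmse, rmse_iqr = evaluation_metrics([1.0, 2.0, 3.0, 4.0, 5.0],
                                        [1.0, 2.0, 3.0, 4.0, 7.0])
```

Normalizing by the IQR makes errors comparable between pollutants with different concentration ranges, which is presumably why NO2 and NOx are reported on both scales.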
W-TSS: A Wavelet-Based Algorithm for Discovering Time Series Shapelets
Many approaches to time series classification rely on machine learning methods. However, there is growing interest in going beyond black box prediction models to understand discriminatory features of the time series and their associations with outcomes. One promising method is time-series shapelets (TSS), which identifies maximally discriminative subsequences of time series. For example, in environmental health applications TSS could be used to identify short-term patterns in exposure time series (shapelets) associated with adverse health outcomes. Identification of candidate shapelets in TSS is computationally intensive. The original TSS algorithm used exhaustive search. Subsequent algorithms introduced efficiencies by trimming/aggregating the set of candidates or training candidates from initialized values, but these approaches have limitations. In this paper, we introduce Wavelet-TSS (W-TSS), a novel intelligent method for identifying candidate shapelets in TSS using wavelet transformation discovery. We tested W-TSS on two datasets: (1) a synthetic example used in previous TSS studies and (2) a panel study relating exposures from residential air pollution sensors to symptoms in participants with asthma. Compared to previous TSS algorithms, W-TSS was more computationally efficient and more accurate, and was able to discover more discriminative shapelets. W-TSS does not require pre-specification of shapelet length.
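The building block shared by shapelet methods is the distance between a candidate shapelet and a time series, commonly taken as the minimum Euclidean distance over all windows of the shapelet's length. The sketch below shows that distance only; it does not implement the wavelet-based candidate discovery of W-TSS, and the function name is an assumption.

```python
import math


def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between a shapelet and any window
    of the same length in the series."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    # Slide the shapelet across the series and keep the closest match.
    for start in range(len(series) - m + 1):
        squared = sum((series[start + k] - shapelet[k]) ** 2 for k in range(m))
        best = min(best, math.sqrt(squared))
    return best
```

A series that contains the pattern yields a distance near zero, so thresholding this distance turns a shapelet into an interpretable feature: for example, whether a short pollution spike of a given shape occurred in a participant's exposure record.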