33 research outputs found

    Current approaches used in epidemiologic studies to examine short-term multipollutant air pollution exposures

    Air pollution epidemiology traditionally focuses on the relationship between individual air pollutants and health outcomes (e.g., mortality). To account for potential copollutant confounding, individual pollutant associations are often estimated by adjusting or controlling for other pollutants in the mixture. Recently, the need to characterize the relationship between health outcomes and the larger multipollutant mixture has been emphasized in an attempt to better protect public health and inform more sustainable air quality management decisions.

    Source-specific pollution exposure and associations with pulmonary response in the Atlanta Commuters Exposure Studies

    Concentrations of traffic-related air pollutants are frequently higher within commuting vehicles than in ambient air. Pollutants found within vehicles may include those generated by tailpipe exhaust, brake wear, and road dust sources, as well as pollutants from in-cabin sources. Source-specific pollution, compared to total pollution, may represent regulation targets that can better protect human health. We estimated source-specific pollution exposures and corresponding pulmonary response in a panel study of commuters. We used constrained positive matrix factorization to estimate source-specific pollution factors and, subsequently, mixed effects models to estimate associations between source-specific pollution and pulmonary response. We identified four pollution factors, which we named crustal, primary tailpipe traffic, non-tailpipe traffic, and secondary. Among asthmatic subjects (N=48), interquartile range increases in crustal and secondary pollution were associated with changes in lung function of −1.33% (95% confidence interval (CI): −2.45, −0.22) and −2.19% (95% CI: −3.46, −0.92) relative to baseline, respectively. Among non-asthmatic subjects (N=51), non-tailpipe pollution was associated with pulmonary response only at 2.5 hours post-commute. We found no significant associations between pulmonary response and primary tailpipe pollution. Health effects associated with traffic-related pollution may vary by source, and therefore some traffic pollution sources may require targeted interventions to protect health.
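
    The factorization step named in the abstract above can be illustrated with a minimal sketch. This is not the authors' constrained positive matrix factorization: PMF additionally weights each residual by its measurement uncertainty, and the study imposed constraints that the abstract does not describe. The code below uses plain non-negative matrix factorization with multiplicative updates as a simplified stand-in, and the data matrix `X`, the factor matrices `G_true` and `F_true`, and all function names are hypothetical.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frobenius_error(x, g, f):
    """Sum of squared residuals between x and the product g @ f."""
    approx = matmul(g, f)
    return sum((xi - ai) ** 2 for xr, ar in zip(x, approx) for xi, ai in zip(xr, ar))

def nmf(x, n_factors, n_iter=200, seed=0, eps=1e-12):
    """Factor x (samples x species) into non-negative g (samples x factors)
    and f (factors x species) using multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    g = [[rng.random() + 0.1 for _ in range(n_factors)] for _ in range(n)]
    f = [[rng.random() + 0.1 for _ in range(m)] for _ in range(n_factors)]
    for _ in range(n_iter):
        # Update factor profiles: F <- F * (G^T X) / (G^T G F)
        gt = transpose(g)
        num = matmul(gt, x)
        den = matmul(matmul(gt, g), f)
        f = [[f[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(n_factors)]
        # Update factor contributions: G <- G * (X F^T) / (G F F^T)
        ft = transpose(f)
        num = matmul(x, ft)
        den = matmul(g, matmul(f, ft))
        g = [[g[i][k] * num[i][k] / (den[i][k] + eps) for k in range(n_factors)]
             for i in range(n)]
    return g, f

# Hypothetical data: 6 samples, 4 measured species, generated from 2 sources.
G_true = [[1, 0], [2, 0], [0, 1], [0, 3], [1, 1], [2, 1]]
F_true = [[0.7, 0.3, 0.0, 0.0], [0.0, 0.1, 0.5, 0.4]]
X = matmul(G_true, F_true)

G_est, F_est = nmf(X, n_factors=2)
print("residual sum of squares:", frobenius_error(X, G_est, F_est))
```

    In a health analysis of the kind described, the rows of the estimated contribution matrix would then serve as the source-specific exposures entered into the second-stage regression model.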

    Statistical strategies for constructing health risk models with multiple pollutants and their interactions: possible choices and comparisons

    Background: As public awareness of the consequences of environmental exposures has grown, estimating the adverse health effects of simultaneous exposure to multiple pollutants has become an important topic. The challenges of evaluating the health impacts of environmental factors in a multipollutant model include, but are not limited to: identification of the most critical components of the pollutant mixture, examination of potential interaction effects, and attribution of health effects to individual pollutants in the presence of multicollinearity. Methods: In this paper, we reviewed five methods available in the statistical literature that are potentially helpful for constructing multipollutant models. We conducted a simulation study and presented two data examples to assess the performance of these methods on feature selection, effect estimation and interaction identification using both cross-sectional and time-series designs. We also proposed and evaluated a two-step strategy employing an initial screening by a tree-based method followed by further dimension reduction/variable selection by the aforementioned five approaches at the second step. Results: Among the five methods, least absolute shrinkage and selection operator regression performs well in general for identifying important exposures, but yields biased estimates and a slightly larger model dimension given many correlated candidate exposures and a modest sample size. Bayesian model averaging and supervised principal component analysis are also useful in variable selection when there is a moderately strong exposure-response association. Substantial improvements in reducing model dimension and identifying important variables were observed for all five statistical methods using the two-step modeling strategy when the number of candidate variables is large. Conclusions: No single method dominates across all simulation scenarios and all criteria. The performances differ according to the nature of the response variable, the sample size, the number of pollutants involved, and the strength of exposure-response association/interaction. However, the two-step modeling strategy proposed here is potentially applicable under a multipollutant framework with many covariates by taking advantage of both the screening feature of an initial tree-based method and the dimension reduction/variable selection property of the subsequent method. The choice of the method should also depend on the goal of the study: risk prediction, effect estimation or screening for important predictors and their interactions.
    http://deepblue.lib.umich.edu/bitstream/2027.42/112386/1/12940_2013_Article_691.pd
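
    The remark above that least absolute shrinkage and selection operator (LASSO) regression selects exposures well but yields biased estimates can be seen in a small worked example. The sketch below is a generic coordinate-descent LASSO for the objective (1/2n)·||y − Xβ||² + λ·||β||₁, not the implementation used in the paper, and it covers only the second step of the two-step strategy (the tree-based screening is omitted). The design matrix, outcome, and penalty value are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            zj = 0.0   # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                zj += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (zj / n)
    return beta

# Hypothetical orthogonal design with three candidate exposures.
X = [[1, 1, 1],
     [1, -1, -1],
     [-1, 1, -1],
     [-1, -1, 1]]
true_beta = [2.0, 0.1, 0.0]
y = [sum(xij * b for xij, b in zip(row, true_beta)) for row in X]

beta = lasso_cd(X, y, lam=0.5)
print(beta)
```

    With this orthogonal design the fitted coefficient for the first exposure is the true value 2.0 reduced by the penalty 0.5, while the weak second exposure and the null third exposure are set exactly to zero. That combination of clean selection and shrinkage toward zero is the trade-off the abstract describes.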

    HUMAN EXPOSURE TO AIR TOXICS IN URBAN ENVIRONMENTS: HEALTH RISKS, SOCIODEMOGRAPHIC DISPARITIES, AND MIXTURE PROFILES

    Exposure to air toxics in urban environments may be of significant health concern because populations and emission sources are concentrated in the same geographic area. The overall objective of this study is to characterize the sources, variations, and mixture profiles of ambient air toxics in urban environments, and examine the sociodemographic disparities in exposures to air toxics in a typical U.S. metropolitan area. A model-to-monitor comparison was performed to evaluate the validity of modeling air toxics data using national datasets. Modeled concentrations in the 2011 National-scale Air Toxics Assessment (NATA) moderately agreed with monitoring measurements, and a sizable portion showed underestimation. These results underscored the need for actual monitoring data in air toxics exposure assessment. Air toxics samples were collected in 106 census tracts in the Memphis area in 2014, and samples were analyzed for 71 volatile organic compounds (VOCs). Ambient VOC levels in Memphis were generally higher than the national averages in urban settings, but were mostly below the reference concentrations (RfCs). Factor analysis identified 5 major sources: manufacturing processes, vehicle exhaust, industrial solvents, refrigerants, and gasoline additives. The major non-cancer risks were from neurological, respiratory, and reproductive/developmental effects. The cumulative cancer risk was (5.9 ± 3.3) × 10⁻⁴, with naphthalene and benzyl chloride as risk drivers. Sociodemographic disparities in cancer risks were examined by regressing cancer risks against socioeconomic, racial, and spatial parameters at the census tract level. We conducted separate disparity analyses using modeling data from 2011 NATA and our air toxics monitoring data. The analysis using modeling data showed strong sociodemographic disparities, but the analysis using monitoring data did not. This discrepancy calls for caution in the use of modeled air pollution data in environmental disparity research. We further assessed exposure to VOC mixtures in five typical microenvironments (MEs): home, office, vehicle cabin, gas station, and community outdoors. Multivariate analysis of variance and pairwise analysis showed that VOC profiles were distinctly different among MEs. Profiles were classified using a random forest. We anticipate wide applications of exposure profiles in epidemiologic research on exposure to air toxic mixtures.
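
    The risk quantities mentioned in the abstract above (cumulative cancer risk and comparison with reference concentrations) are conventionally computed with simple additive screening formulas. The sketch below assumes that convention: cancer risk for each compound is its concentration multiplied by an inhalation unit risk, summed across compounds, and the non-cancer hazard quotient is the concentration divided by the RfC. The abstract does not spell out the study's exact procedure, and every compound name and number here is a hypothetical placeholder rather than a measured or regulatory value.

```python
def cumulative_cancer_risk(conc, unit_risk):
    """Sum of concentration (ug/m3) x inhalation unit risk (per ug/m3) over compounds."""
    return sum(conc[name] * unit_risk[name] for name in conc)

def hazard_quotients(conc, rfc):
    """Concentration divided by reference concentration, per compound."""
    return {name: conc[name] / rfc[name] for name in conc}

# Hypothetical ambient concentrations in ug/m3.
conc = {"compound_A": 2.0, "compound_B": 0.5}
# Hypothetical inhalation unit risks, per ug/m3.
unit_risk = {"compound_A": 1e-5, "compound_B": 4e-5}
# Hypothetical reference concentrations in ug/m3.
rfc = {"compound_A": 10.0, "compound_B": 1.0}

risk = cumulative_cancer_risk(conc, unit_risk)
hq = hazard_quotients(conc, rfc)
print(f"cumulative cancer risk: {risk:.1e}")
print("hazard quotients:", hq)
```

    In this illustration each compound contributes 2 × 10⁻⁵ to the total, so neither dominates. In the study, naphthalene and benzyl chloride played the role of risk drivers, meaning their terms accounted for most of the sum.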

    Enhancing models and measurements of traffic-related air pollutants for health studies using dispersion modeling and Bayesian data fusion

    Research Report 202 describes a study led by Dr. Stuart Batterman at the University of Michigan, Ann Arbor and colleagues. The investigators evaluated the ability to predict traffic-related air pollution using a variety of methods and models, including a line source air pollution dispersion model and sophisticated spatiotemporal Bayesian data fusion methods. Exposure assessment for traffic-related air pollution is challenging because the pollutants are a complex mixture and vary greatly over space and time. Because extensive direct monitoring is difficult and expensive, a number of modeling approaches have been developed, but each model has its own limitations and errors. Dr. Batterman and colleagues sought to improve model estimations by applying and systematically comparing the performance of different statistical models. The study made extensive use of data collected in the Near-road EXposures and effects of Urban air pollutants Study (NEXUS), a cohort study designed to examine the relationship between near-roadway pollutant exposures and respiratory outcomes in children with asthma who live close to major roadways in Detroit, Michigan.

    Statistical methods for linking the chemical composition of particulate matter to health outcomes

    Short-term exposure to particulate matter air pollution less than 2.5 micrometers in aerodynamic diameter (PM2.5) has been associated with mortality and morbidity in epidemiologic studies. PM2.5 is a complex mixture of many different chemicals that are emitted from both natural and anthropogenic sources. Recent studies have found that the toxicity of PM2.5 depends on its chemical composition; however, estimating ambient PM2.5 constituent concentrations is challenging for several reasons. First, the network of ambient monitors that measure PM2.5 chemical constituents is spatially sparse and many communities have only one monitor. Additionally, some constituents of PM2.5 are spatially heterogeneous and their concentrations observed at ambient monitors may not be representative of surrounding areas. Last, PM2.5 constituents that contribute minimally to PM2.5 total mass frequently have censored concentrations that fall below minimum detection limits. Together, these attributes of the available data make estimating ambient PM2.5 constituent concentrations difficult and complicate subsequent health effects analyses. We could also estimate health effects of PM2.5 sources, which emit combinations of chemical constituents. PM2.5 sources are not directly measured and are frequently inferred from PM2.5 constituent concentrations. The challenge of estimating PM2.5 sources is magnified by the measurement error and data limitations of observed PM2.5 constituent concentrations. This dissertation developed methods to address measurement issues in estimating ambient concentrations of PM2.5 constituents and PM2.5 sources. These methods were then used to estimate associations between mortality and short-term exposure to PM2.5 constituents and PM2.5 sources.

    Analysis of the spatiotemporal distribution of PM2.5 sources and their contributions using machine learning and receptor models

    Doctoral dissertation -- Seoul National University Graduate School: College of Engineering, Department of Civil and Environmental Engineering, 2023. 2. Kim Jae Young. Particulate matter less than 2.5 micrometers in diameter (PM2.5) has been a pollutant of interest globally for decades, owing to its adverse health effects. For developing effective PM2.5 management strategies, it is crucial to identify its sources and quantify how much they contribute to ambient PM2.5 concentrations in time and space. Source apportionment is the key to identifying the characteristics of PM2.5. Receptor modeling is widely used to identify PM2.5 sources as a statistical method of source apportionment. The chemical constituents of PM2.5 were used as input data for receptor modeling. This study aimed to investigate the characteristics of PM2.5 using source apportionment models and spatiotemporal analysis, in order to inform effective management strategies. Two types of modeling were performed for the source apportionment study. The first is positive matrix factorization (PMF) modeling, which identifies specific source types and their contributions to PM2.5 at one site. The second is Bayesian spatial multivariate receptor modeling (BSMRM), which derives major sources and their contributions to PM2.5 over a wide area from multiple monitoring sites. In addition, machine learning models were used to predict the concentrations of PM2.5 chemical constituents, which are the most important input data for receptor modeling. The applicability of machine learning models to PM2.5 chemical constituent data was assessed, with the aim of improving the integrity of those data. The sources of PM2.5 and their contributions in Siheung, South Korea, were identified using positive matrix factorization modeling.
These 10 sources were secondary nitrate (24.3%), secondary sulfate (18.8%), traffic (18.8%), combustion for heating (12.6%), biomass burning (11.8%), coal combustion (3.6%), heavy oil industry (1.8%), smelting industry (4.0%), sea salt (2.7%), and soil (1.7%). Based on the derived sources, the carcinogenic and non-carcinogenic health risks due to PM2.5 inhalation were estimated. The contributions of coal combustion, heavy oil industry, and traffic sources to PM2.5 mass concentration were low, yet their carcinogenic health risks exceeded the benchmark value (1E-06). Therefore, countermeasures for PM2.5 emission sources should be based on source-specific health risks as well as PM2.5 mass concentration. The capabilities of four machine learning models to predict the chemical constituents of PM2.5 were assessed by comparing prediction accuracy across input variables, target constituents for prediction, available data period, missing ratios of input data, and study sites. The concentrations of PM2.5 constituents were predicted at three sites (Seoul, Ulsan, and Baengnyeong) in South Korea between 2016 and 2018, using four machine learning models: generative adversarial imputation network (GAIN), fully connected deep neural network (FCDNN), random forest (RF), and k-nearest neighbor (kNN). Prediction accuracy, measured by the coefficient of determination (R2) between predictions and observations, was highest for GAIN, followed by FCDNN, RF, and kNN. As the missing ratio of the input data increased (20, 40, 60, and 80%), prediction accuracy decreased in all four models, and the decrease was more pronounced in GAIN and kNN, which are unsupervised models. As the input data period increased, the prediction accuracy of the two deep learning models, GAIN and FCDNN, improved more than that of RF and kNN. The study sites with more local emission sources exhibited lower prediction accuracy, with the highest R2 at Baengnyeong Island and the lowest in Ulsan.
Among the target constituent groups, ions and trace elements had the highest and lowest R2, respectively. This study demonstrated that machine learning models can be extended to further air pollution studies depending on model features, required performance, and experimental conditions, such as data availability and time constraints. The spatial distributions of five PM2.5 sources in South Korea were estimated using Bayesian spatial multivariate receptor modeling. Secondary nitrate, secondary sulfate, motor vehicle emissions, industry, and sea salts were determined to be significant contributors to ambient PM2.5 concentrations in South Korea. The spatial surface of the daily average contribution for each source in South Korea was derived from measurement data from the eight monitoring sites. The source contributions predicted by the BSMRM were also validated by holding out the data from each test site (Ansan, Daejeon, and Gwangju) in turn and comparing the results. These predicted source contributions can aid in developing effective PM2.5 control strategies in cities where no speciated PM2.5 monitoring stations are available. They can also be utilized as source-specific exposures in health effect studies, even in cities where no monitoring stations are available.

    Contents:
    CHAPTER 1. INTRODUCTION: 1.1. Background; 1.2. Objectives; 1.3. Dissertation structure; References
    CHAPTER 2. LITERATURE REVIEW: 2.1. Source apportionment and receptor modeling of PM2.5; 2.2. Toxicity and health risk assessment of PM2.5; 2.3. Machine learning approaches in prediction of PM2.5; 2.4. Bayesian approach in source apportionment; References
    CHAPTER 3. SOURCE APPORTIONMENT OF PM2.5 USING PMF MODEL AND HEALTH RISK ASSESSMENT BY INHALATION: 3.1. Introduction; 3.2. Materials and methods (3.2.1 Study site, sampling, and analysis; 3.2.2 Positive matrix factorization (PMF) modeling and combined analysis with meteorological data; 3.2.3 Health risk assessment); 3.3. Results and discussion (3.3.1 PM2.5 mass concentration and chemical speciation; 3.3.2 Source apportionment of PM2.5 by PMF modeling; 3.3.3 Carcinogenic and non-carcinogenic health risks; 3.3.4 Probable source areas or directions); 3.4. Summary; References
    CHAPTER 4. FEATURE EXTRACTION AND PREDICTION OF PM2.5 CHEMICAL CONSTITUENTS USING MACHINE LEARNING MODELS: 4.1. Introduction; 4.2. Materials and methods (4.2.1. Study Sites and Data Collection; 4.2.2. Machine Learning Models and Hyperparameter Optimization; 4.2.3. Prediction Scenarios; 4.2.4. Model Validation and Error Estimation); 4.3. Results and discussion (4.3.1. Hyperparameter Optimization; 4.3.2. Prediction Results for Scenario #1; 4.3.3. Prediction Results for Scenario #2; 4.3.4. Features and Performance of Four ML Models); 4.4. Summary; Data Availability; Code Availability; References
    CHAPTER 5. BAYESIAN SPATIAL MULTIVARIATE RECEPTOR MODELING FOR SPATIOTEMPORAL ANALYSIS OF PM2.5 SOURCES: 5.1. Introduction; 5.2. Materials and methods (5.2.1 Air pollution data; 5.2.2 Bayesian spatial multivariate receptor modeling (BSMRM); 5.2.3 Application of BSMRM to Korea PM2.5 speciation data); 5.3. Results and discussion (5.3.1 Bayesian spatial multivariate receptor modeling (BSMRM) results; 5.3.2 Model validation; 5.3.3 Spatial distribution of each source in South Korea); 5.4. Summary; References
    CHAPTER 6. CONCLUSIONS AND FUTURE WORK: 6.1. Conclusions; 6.2. Future work
    ABSTRACT IN KOREAN
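
    The accuracy comparison in the dissertation abstract above rests on two generic ingredients: a model that fills in missing constituent concentrations, and the coefficient of determination (R2) between predictions and observations. The sketch below illustrates both under stated assumptions. It uses a bare-bones k-nearest-neighbor imputer and the 1 − SS_res/SS_tot definition of R2; the dissertation may have used a different kNN variant or a different R2 definition, and the data and function names are hypothetical.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries using the mean of the k nearest complete rows.
    Distance is Euclidean over the columns observed in the incomplete row.
    Assumes at least one complete row and at least one observed column per row."""
    complete = [r for r in rows if None not in r]
    filled = []
    for r in rows:
        if None not in r:
            filled.append(list(r))
            continue
        observed = [j for j, v in enumerate(r) if v is not None]
        nearest = sorted(
            complete,
            key=lambda c: math.dist([r[j] for j in observed], [c[j] for j in observed]),
        )[:k]
        filled.append([
            v if v is not None else sum(c[j] for c in nearest) / len(nearest)
            for j, v in enumerate(r)
        ])
    return filled

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Hypothetical two-constituent data; the last row is missing its second value.
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [10.0, 20.0], [2.1, None]]
filled = knn_impute(rows, k=2)
print(filled[-1])
print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

    Masking known values at a chosen missing ratio, imputing them, and scoring the imputed values against the held-back observations with `r_squared` would reproduce the general shape of the evaluation the abstract describes.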

    A scalable two-stage Bayesian approach accounting for exposure measurement error in environmental epidemiology

    Accounting for exposure measurement errors has been recognized as a crucial problem in environmental epidemiology for over two decades. Bayesian hierarchical models offer a coherent probabilistic framework for evaluating associations between environmental exposures and health effects, which take into account exposure measurement errors introduced by uncertainty in the estimated exposure as well as spatial misalignment between the exposure and health outcome data. While two-stage Bayesian analyses are often regarded as a good alternative to fully Bayesian analyses when joint estimation is not feasible, there has been minimal research on how to properly propagate uncertainty from the first-stage exposure model to the second-stage health model, especially in the case of a large number of participant locations along with spatially correlated exposures. We propose a scalable two-stage Bayesian approach, called a sparse multivariate normal (sparse MVN) prior approach, based on the Vecchia approximation for assessing associations between exposure and health outcomes in environmental epidemiology. We compare its performance with existing approaches through simulation. Our sparse MVN prior approach shows comparable performance with the fully Bayesian approach, which is a gold standard but is impossible to implement in some cases. We investigate the association between source-specific exposures and pollutant (nitrogen dioxide (NO2))-specific exposures and birth outcomes for 2012 in Harris County, Texas, using several approaches, including the newly developed method.
    Comment: 34 pages, 8 figures