4,141 research outputs found

    Machine learning based soil maps for a wide range of soil properties for the forested area of Switzerland

    Spatial soil information in forests is crucial to assess ecosystem services such as carbon storage, water purification or biodiversity. However, spatially continuous information on soil properties at adequate resolution is rare in forested areas, especially in mountain regions. Therefore, we aimed to build high-resolution soil property maps for pH, soil organic carbon, clay, sand, gravel and soil density for six depth intervals as well as for soil thickness for the entire forested area of Switzerland. We used legacy data from 2071 soil profiles and evaluated six different modelling approaches of digital soil mapping, namely lasso, robust external-drift kriging, geoadditive modelling, quantile regression forest (QRF), cubist and support vector machines. Moreover, we combined the predictions of the individual models by applying a weighted model averaging approach. All models were built from a large set of potential covariates, which included, for example, multi-scale terrain attributes and remote sensing data characterizing vegetation cover. Model performances, evaluated against an independent dataset, were similar for all methods. However, QRF achieved the best prediction performance in most cases (18 out of 37 models), while model averaging outperformed the individual models in five cases. For the final soil property maps we therefore used the QRF predictions. Prediction performance showed large differences for the individual soil properties. While for fine earth density the R2 of QRF varied between 0.51 and 0.64 across all depth intervals, soil organic carbon content was more difficult to predict (R2 = 0.19–0.32). Since QRF was used for map prediction, we assessed the 90% prediction intervals, from which we derived uncertainty maps. The latter are valuable to better interpret the predictions and provide guidance for future mapping campaigns to improve the soil maps.
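The abstract above mentions a weighted model averaging approach but does not state how the weights were chosen. The following is a minimal sketch under an assumed scheme (weights inversely proportional to each model's mean squared error on calibration data); the function names `inverse_mse_weights` and `weighted_average` are illustrative and not taken from the study.

```python
import numpy as np

def inverse_mse_weights(preds, y_true):
    """Return one weight per model, proportional to 1 / MSE (assumed scheme).

    preds:  array of shape (n_models, n_samples) with each model's predictions
    y_true: array of shape (n_samples,) with observed values
    """
    preds = np.asarray(preds, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    mse = np.mean((preds - y_true) ** 2, axis=1)
    # Guard against division by zero for a model that fits perfectly.
    inv = 1.0 / np.maximum(mse, 1e-12)
    return inv / inv.sum()

def weighted_average(preds, weights):
    """Combine model predictions sample by sample using the given weights."""
    return np.average(np.asarray(preds, dtype=float), axis=0, weights=weights)
```

In practice the weights would be estimated on calibration or cross-validation predictions, not on the independent validation set used to report performance.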

    High-resolution estimation of O3, NO2, and CO concentrations in South Korea from 2002 to 2020 using machine learning models

    Thesis (Master's) -- Seoul National University Graduate School: Graduate School of Public Health, Department of Public Health, February 2023. Kim Ho (김호). Background: Long-term exposure to ozone (O3), nitrogen dioxide (NO2), and carbon monoxide (CO) is known to cause various diseases and increase mortality. For that reason, estimating ground-level O3, NO2, and CO concentrations with a high spatial resolution is crucial for assessing the health effects associated with these air pollutants. However, related studies are limited in South Korea. This study aimed to develop machine learning-based models to predict the monthly O3 (average of daily 8-hour maximums), NO2, and CO at a spatial resolution of 1 km × 1 km across South Korea from 2002 to 2020. Methods: Approximately 80% of the monitoring stations were used to train the three machine learning models (random forest, light gradient boosting, and neural network) with 10-fold cross-validation, and 20% of the monitoring stations were used to test the model performance. The author also applied ensemble models to integrate the variation in predictions among the models. The predictors included satellite-based remote sensing data, inverse-distance-weighted ground-level air pollutant concentrations, land use variables, reanalysis datasets for meteorological variables, and regional socioeconomic variables collected from various databases. Results: For O3, the overall R2 of the ensemble model was 0.841 during the entire study period. Urban areas showed a better model performance (R2 = 0.845) than rural areas (R2 = 0.762). For NO2, the highest overall R2 was 0.756, with the best fit in autumn (R2 = 0.768). For CO, the overall R2 value was 0.506. This study provides high spatial resolution monthly average O3 and NO2 estimates with high predictive performance (R2 > 0.75). 
Conclusion: The author's predictions can be used to analyze the spatial patterns of pollutants in relation to population characteristics and to study the health effects of long-term exposure to air pollution using geocode-based health information and local health data.
Contents: Chapter 1. Introduction. Chapter 2. Materials and Methods: 2.1. Study area; 2.2. Air pollution monitoring data; 2.3. Satellite-based remote sensing data (2.3.1. Meteorological data; 2.3.2. Land-use data; 2.3.3. Surface reflectance); 2.4. Regional socioeconomic predictors; 2.5. Modeling procedures (2.5.1. Data Preprocessing; 2.5.2. Machine learning-based model; 2.5.3. Ensemble Model; 2.5.4. Model Prediction). Chapter 3. Results. Chapter 4. Discussion. Chapter 5. Conclusion. Supplementary materials. Abstract in Korean.
Tables: Table 1. Model performance for O3, NO2, and CO overall and in three- and four-year periods. Table S1. Detailed information about data sources. Table S2. Variables sorted by % missing values. Table S3. Results of parameter grid search using 10-fold cross-validation for O3, NO2 and CO. Table S4. Yearly ensemble (GAM) performance for O3, NO2, and CO. Table S5. Model performances for O3, NO2, and CO by season and urbanity. Table S6. Number of monitoring stations by year for O3, NO2 and CO in urban and rural areas.
Figures: Fig. 1. Flowchart of the modeling process. GEE: Google Earth Engine, SEDAC: Socioeconomic Data and Applications Center, RSD: Regional Socioeconomic Database from Korean Disease Control and Prevention Agency. Fig. 2. Density scatter plot for monthly averages of the monitored and predicted concentrations of O3, NO2, and CO. Fig. 3. Maps of monitored and predicted O3, NO2 and CO during 2002~2020. Fig. 4. Percentage decrease in R2 when excluding grouped variables from each machine learning model of O3, NO2, and CO. The closer the color is to red, the greater the effect of the variables on the model performance. Fig. S1. Urban/Rural and Metropolitan (Metro) area for entire contiguous regions of South Korea. Fig. S2. Distribution maps of predicted O3 (ppb) by year and season for contiguous South Korea. Fig. S3. Distribution maps of predicted NO2 (ppb) by year and season for contiguous South Korea. Fig. S4. Distribution maps of predicted CO (ppm) by year and season for contiguous South Korea. Fig. S5. Monthly fluctuations in the number of monitoring stations for O3, NO2, and CO between 2002 and 2020. Fig. S6. Density scatter plot for monthly averages of the monitored and predicted concentrations of O3, NO2, and CO with seasonal discrimination.
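The thesis abstract describes holding out roughly 20% of monitoring stations for testing. A minimal sketch of such a station-level split follows, so that all monthly records from one station fall on the same side of the split. The function names, the random seed, and the R2 definition (one minus residual over total sum of squares) are assumptions for illustration, not details confirmed by the thesis record.

```python
import numpy as np

def split_by_station(station_ids, test_fraction=0.2, seed=0):
    """Return boolean (train_mask, test_mask) over records, split by station.

    station_ids: one station identifier per record (e.g. per station-month).
    """
    station_ids = np.asarray(station_ids)
    rng = np.random.default_rng(seed)
    stations = np.unique(station_ids)
    rng.shuffle(stations)
    n_test = max(1, int(round(test_fraction * len(stations))))
    test_mask = np.isin(station_ids, stations[:n_test])
    return ~test_mask, test_mask

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

Splitting by station instead of by record avoids evaluating a model on months from a station it has already seen during training, which would tend to inflate the reported R2.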

    Individual tree-based vs pixel-based approaches to mapping forest functional traits and diversity by remote sensing

    Plant ecology and biodiversity research have increasingly incorporated trait-based approaches and remote sensing. Compared with traditional field survey (which typically samples individual trees), remote sensing enables quantifying functional traits over large contiguous areas, but assigning trait values to biological units such as species and individuals is difficult with pixel-based approaches. We used a subtropical forest landscape in China to compare an approach based on airborne LiDAR-delineated individual tree crowns (ITCs) with a pixel-based approach for assessing functional traits from remote sensing data. We compared trait distributions, trait–trait relationships and functional diversity metrics obtained by the ITC- and pixel-based approaches at changing pixel size and extent. We found that morphological traits derived from airborne laser scanning showed more differences between ITC- and pixel-based approaches than physiological traits estimated by airborne Pushbroom Hyperspectral Imager-3 (PHI-3) hyperspectral data. Pixel sizes approximating average tree crowns yielded similar results as ITCs, but 95th quantile height and foliage height diversity tended to be overestimated and leaf area index underestimated relative to ITC-based values. With increasing pixel size, the differences to ITC-based trait values became larger and less trait variance was captured, indicating information loss. The consistency of ITC- and pixel-based functional richness also decreased with increasing pixel size, and changed with the observed extent for functional diversity monitoring. We conclude that whereas ITC-based approaches in principle allow partitioning of variation between individuals, genotypes and species, high-resolution pixel-based approaches come close to this and can be suitable for assessing ecosystem-scale trait variation by weighting individuals and species according to coverage

    Soil Spatial Scaling: Modelling variability of soil properties across scales using legacy data

    Understanding how soil variability changes with spatial scale is critical to our ability to understand and model soil processes at scales relevant to decision makers. This thesis uses legacy data to address the ongoing challenge of understanding soil spatial variability in a number of complementary ways. We use a range of information: precision agriculture studies; compiled point datasets; and remotely observed raster datasets. We use classical geostatistics, but introduce a new framework for comparing variability of spatial properties across scales. My thesis considers soil spatial variability from a number of geostatistical angles. We find the following:
    • Field scale variograms show differing variance across several magnitudes. Further work is required to ensure consistency between survey design, experimental methodology and statistical methodology if these results are to become useful for comparison.
    • Declustering is a useful tool to deal with the patchy design of legacy data. It is not a replacement for an evenly distributed dataset, but it does allow the use of legacy data which would otherwise have limited utility.
    • A framework which allows ‘roughness’ to be expressed as a continuous variable appears to fit the data better than the mono-fractal or multi-fractal framework generally associated with multi-scale modelling of soil spatial variability.
    • Soil appears to have a similar degree of stochasticity to short range topographic variability, and a higher degree of stochasticity at short ranges (less than 10km and 100km) than vegetation and radiometrics respectively.
    • At longer ranges of variability (i.e. around 100km) only rainfall and height above sea level show distinctly different stochasticity.
    • Global variograms show strong isotropy, unlike the variograms for the Australian continent.
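The variograms referred to in this abstract can be illustrated with the classical (Matheron) empirical semivariance estimator, which averages half the squared difference between all point pairs whose separation falls in a distance bin. This is a generic sketch; the function name and the binning choices are assumptions and do not reproduce the thesis's own framework.

```python
import numpy as np

def empirical_variogram(coords, values, bin_edges):
    """Classical empirical semivariogram.

    coords:    array of shape (n_points, n_dims) with point locations
    values:    array of shape (n_points,) with the measured soil property
    bin_edges: increasing distances delimiting the lag bins
    Returns (gamma, counts): semivariance and number of pairs per bin.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)        # every unordered pair once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sqdiff = (values[i] - values[j]) ** 2
    which = np.digitize(dist, bin_edges) - 1        # bin index per pair
    n_bins = len(bin_edges) - 1
    gamma = np.full(n_bins, np.nan)                 # NaN marks empty bins
    counts = np.zeros(n_bins, dtype=int)
    for b in range(n_bins):
        in_bin = which == b
        counts[b] = in_bin.sum()
        if counts[b] > 0:
            gamma[b] = 0.5 * sqdiff[in_bin].mean()
    return gamma, counts
```

The all-pairs computation is quadratic in the number of points, so for large legacy datasets a subsample or a spatial index would be needed.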

    ALOS-2 L-band SAR backscatter data improves the estimation and temporal transferability of wildfire effects on soil properties under different post-fire vegetation responses

    Remote sensing techniques are of particular interest for monitoring wildfire effects on soil properties, which may be highly context-dependent in large and heterogeneous burned landscapes. Although synthetic aperture radar (SAR) backscatter data have a sound physical basis for characterizing soil spatial variability in burned areas, this approach remains completely unexplored. This study aimed to evaluate the performance of SAR backscatter data in C-band (Sentinel-1) and L-band (ALOS-2) for monitoring short-term fire effects on soil organic carbon and nutrients (total nitrogen and available phosphorus) in a heterogeneous Mediterranean landscape mosaic of shrublands and forests that was affected by a large wildfire. The ability of SAR backscatter coefficients and several band transformations of both sensors to retrieve soil properties measured in the field immediately after the fire (one month after fire) was tested through a model averaging approach. The temporal transferability of SAR-based models from one month to one year after wildfire was also evaluated, which allowed us to assess short-term changes in soil properties at large scale as a function of pre-fire plant community type. The retrieval of soil properties in immediate post-fire conditions featured a higher overall fit and predictive capacity from ALOS-2 L-band SAR backscatter data than from Sentinel-1 C-band SAR data, with no noticeable under- or overestimation effects. The transferability of the ALOS-2 based model to one year after wildfire exhibited similar performance to that of the model calibration scenario (immediate post-fire conditions). Soil organic carbon and available phosphorus content were significantly higher one year after wildfire than immediately after the fire disturbance. Conversely, the short-term change in soil total nitrogen was ecosystem-dependent. Our results support the applicability of L-band SAR backscatter data for monitoring short-term variability of fire effects on soil properties, reducing data gathering costs within large and heterogeneous burned landscapes.

    Impact of spatial soil and climate input data aggregation on regional yield simulations

    We show the error in water-limited yields simulated by crop models which is associated with spatially aggregated soil and climate input data. Crop simulations at large scales (regional, national, continental) frequently use input data of low resolution. Therefore, climate and soil data are often generated via averaging and sampling by area majority. This may bias simulated yields at large scales, varying largely across models. Thus, we evaluated the error associated with spatially aggregated soil and climate data for 14 crop models. Yields of winter wheat and silage maize were simulated under water-limited production conditions. We calculated this error from crop yields simulated at spatial resolutions from 1 to 100 km for the state of North Rhine-Westphalia, Germany. Most models showed yields biased by <15% when aggregating only soil data. The relative mean absolute error (rMAE) of most models using aggregated soil data was in the range of, or larger than, the inter-annual or inter-model variability in yields. This error increased further when both climate and soil data were aggregated. Distinct error patterns indicate that the rMAE may be estimated from few soil variables. Illustrating the range of these aggregation effects across models, this study is a first step towards an ex-ante assessment of aggregation errors in large-scale simulations.
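To make the aggregation idea concrete, here is a small sketch of block-averaging a gridded input to a coarser resolution and of a relative mean absolute error between yields simulated with aggregated and with high-resolution inputs. The abstract does not give the exact rMAE formula, so the definition used here (mean absolute difference divided by the mean reference yield, in percent) is an assumption, as are the function names.

```python
import numpy as np

def block_mean(grid, factor):
    """Aggregate a 2-D grid by averaging non-overlapping factor x factor blocks."""
    grid = np.asarray(grid, dtype=float)
    rows, cols = grid.shape
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    blocks = grid.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))

def relative_mae(yield_aggregated, yield_reference):
    """Assumed rMAE: mean |aggregated - reference| / mean(reference), in percent."""
    agg = np.asarray(yield_aggregated, dtype=float)
    ref = np.asarray(yield_reference, dtype=float)
    return 100.0 * np.mean(np.abs(agg - ref)) / np.mean(ref)
```

Averaging suits continuous inputs such as temperature or precipitation; categorical inputs such as soil type would instead need the area-majority sampling the abstract mentions, which this sketch does not cover.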

    MSWEP : 3-hourly 0.25° global gridded precipitation (1979-2015) by merging gauge, satellite, and reanalysis data

    Current global precipitation (P) datasets do not take full advantage of the complementary nature of satellite and reanalysis data. Here, we present Multi-Source Weighted-Ensemble Precipitation (MSWEP) version 1.1, a global P dataset for the period 1979-2015 with a 3-hourly temporal and 0.25° spatial resolution, specifically designed for hydrological modeling. The design philosophy of MSWEP was to optimally merge the highest quality P data sources available as a function of timescale and location. The long-term mean of MSWEP was based on the CHPclim dataset but replaced with more accurate regional datasets where available. A correction for gauge under-catch and orographic effects was introduced by inferring catchment-average P from streamflow (Q) observations at 13 762 stations across the globe. The temporal variability of MSWEP was determined by weighted averaging of P anomalies from seven datasets; two based solely on interpolation of gauge observations (CPC Unified and GPCC), three on satellite remote sensing (CMORPH, GSMaP-MVK, and TMPA 3B42RT), and two on atmospheric model reanalysis (ERA-Interim and JRA-55). For each grid cell, the weight assigned to the gauge-based estimates was calculated from the gauge network density, while the weights assigned to the satellite- and reanalysis-based estimates were calculated from their comparative performance at the surrounding gauges. The quality of MSWEP was compared against four state-of-the-art gauge-adjusted P datasets (WFDEI-CRU, GPCP-1DD, TMPA 3B42, and CPC Unified) using independent P data from 125 FLUXNET tower stations around the globe. MSWEP obtained the highest daily correlation coefficient (R) among the five P datasets for 60.0% of the stations and a median R of 0.67 vs. 0.44-0.59 for the other datasets. We further evaluated the performance of MSWEP using hydrological modeling for 9011 catchments (< 50 000 km²) across the globe. Specifically, we calibrated the simple conceptual hydrological model HBV (Hydrologiska Byråns Vattenbalansavdelning) against daily Q observations with P from each of the different datasets. For the 1058 sparsely gauged catchments, representative of 83.9% of the global land surface (excluding Antarctica), MSWEP obtained a median calibration NSE of 0.52 vs. 0.29-0.39 for the other P datasets. MSWEP is available via http://www.gloh2o.org
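Two quantities in this abstract lend themselves to a short illustration: the weighted averaging of precipitation anomalies from several datasets, and the Nash-Sutcliffe efficiency (NSE) used to score calibrated streamflow. The sketch below uses the standard NSE definition; the merging function is a plain weighted mean for a single grid cell, and its name and the example weights are assumptions, not the MSWEP weighting procedure itself.

```python
import numpy as np

def merge_anomalies(anomalies, weights):
    """Weighted mean of precipitation anomalies for one grid cell.

    anomalies: array of shape (n_datasets, n_timesteps)
    weights:   array of shape (n_datasets,), non-negative, not all zero
    """
    return np.average(np.asarray(anomalies, dtype=float), axis=0, weights=weights)

def nash_sutcliffe(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2).

    1 is a perfect fit; 0 means no better than predicting the observed mean.
    Undefined when the observations are constant.
    """
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    residual = np.sum((observed - simulated) ** 2)
    spread = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - residual / spread
```

For example, `merge_anomalies([[1.0, 2.0], [3.0, 4.0]], [1.0, 3.0])` weights the second dataset three times as heavily as the first.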

    Impact of Spatial Soil and Climate Input Data Aggregation on Regional Yield Simulations

    This work was financially supported by the German Federal Ministry of Food and Agriculture (BMEL) through the Federal Office for Agriculture and Food (BLE), (2851ERA01J). FT and RPR were supported by FACCE MACSUR (3200009600) through the Finnish Ministry of Agriculture and Forestry (MMM). EC, HE and EL were supported by The Swedish Research Council for Environment, Agricultural Sciences and Spatial Planning (220-2007-1218) and by the strategic funding ‘Soil-Water-Landscape’ from the faculty of Natural Resources and Agricultural Sciences (Swedish University of Agricultural Sciences) and thank Professor P-E Jansson (Royal Institute of Technology, Stockholm) for support. JC, HR and DW thank the INRA ACCAF metaprogramme for funding and Eric Casellas from UR MIAT INRA for support. CB was funded by the Helmholtz project “REKLIM—Regional Climate Change”. CK was funded by the HGF Alliance “Remote Sensing and Earth System Dynamics” (EDA). FH was funded by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG) under the Grant FOR1695. FE and SS acknowledge support by the German Science Foundation (project EW 119/5-1). HH, GZ, SS, TG and FE thank Andreas Enders and Gunther Krauss (INRES, University of Bonn) for support. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Peer reviewed.