
    Evaluating the extrapolation potential of random forest digital soil mapping

    Spatial soil information is essential for informed decision-making in a wide range of fields. Digital soil mapping (DSM) using machine learning algorithms has become a popular approach for generating soil maps. DSM capitalises on the relation between environmental variables (i.e., features) and a soil property of interest. It typically needs a training dataset that covers the feature space well. Mapping in areas where there are no training data is challenging, because extrapolation in geographic space often induces extrapolation in feature space and can seriously deteriorate prediction accuracy. The objective of this study was to analyse the extrapolation effects of random forest DSM models by predicting topsoil properties (OC, clay, and pH) in four African countries using soil data from the ISRIC Africa Soil Profiles database. The study was conducted in eight experiments whereby soil data from one or three countries were used to predict in the other countries. We calculated similarities between donor and recipient areas using four measures, including soil type similarity, homosoil, dissimilarity index by area of applicability (AOA), and quantile regression forest (QRF) prediction interval width. The aim was to determine the level of agreement between these four measures and identify the method that had the strongest agreement with common validation metrics. The results indicated a positive correlation between soil type similarity, homosoil and dissimilarity index by AOA. Surprisingly, we observed a negative correlation between dissimilarity index by AOA and QRF prediction interval width. Although the cross-validation results for the trained models were acceptable, the extrapolation results were unsatisfactory, highlighting the risk of extrapolation. Using soil data from three countries instead of one increased the similarities for all measures, but it had a limited effect on improving extrapolation. 
    Also, none of the measures had a strong correlation with the validation metrics. This was particularly disappointing for AOA and QRF, which we had expected to be strong indicators of extrapolation prediction performance. Results showed that homosoil and soil type methods had the strongest correlation with validation metrics. The results for this case study revealed limitations of using AOA and QRF as measures of extrapolation effects, highlighting the importance of not relying on these methods blindly. Further research and more case studies are needed to address the effects of extrapolation of DSM models.
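The abstract does not say how the AOA dissimilarity index was computed, so the following is only a minimal sketch of the general idea: for each recipient location, take the distance in standardised feature space to the nearest donor (training) sample and divide it by the mean pairwise distance among the donor samples. The function names and the toy covariate values are hypothetical. As far as we are aware, the published AOA method also weights features by variable importance and derives a threshold from the training data; both steps are omitted here.

```python
import math


def standardize(train, points):
    """Scale both sets using the per-feature mean and standard deviation of the training set."""
    n_features = len(train[0])
    means = [sum(row[j] for row in train) / len(train) for j in range(n_features)]
    sds = []
    for j in range(n_features):
        var = sum((row[j] - means[j]) ** 2 for row in train) / len(train)
        sds.append(math.sqrt(var) or 1.0)  # fall back to 1.0 for constant features

    def scale(rows):
        return [[(r[j] - means[j]) / sds[j] for j in range(n_features)] for r in rows]

    return scale(train), scale(points)


def dissimilarity_index(train, points):
    """Unweighted dissimilarity: nearest-training distance / mean pairwise training distance."""
    train_s, points_s = standardize(train, points)
    pairwise = [math.dist(a, b) for i, a in enumerate(train_s) for b in train_s[i + 1:]]
    mean_pairwise = sum(pairwise) / len(pairwise)
    return [min(math.dist(p, t) for t in train_s) / mean_pairwise for p in points_s]


# Hypothetical covariate vectors (assumed values, not from the study).
donor = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
recipient = [[0.5, 0.5], [5.0, 5.0]]  # one point inside, one far outside the donor feature space
di = dissimilarity_index(donor, recipient)
```

A larger value indicates a recipient location that lies further from anything the model saw during training, which is the situation the abstract describes as extrapolation in feature space.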

    High-Resolution Mapping and Assessment of Salt-Affectedness on Arable Lands by the Combination of Ensemble Learning and Multivariate Geostatistics

    Soil salinization is one of the main threats to soils worldwide, which has serious impacts on soil functions. Our objective was to map and assess salt-affectedness on arable land (0.85 km²) in Hungary, with high spatial resolution, using a combination of ensemble machine learning and multivariate geostatistics on three salt-affected soil indicators (i.e., alkalinity, electrical conductivity, and sodium adsorption ratio; n = 85 soil samples). Ensemble modelling with five base learners (i.e., random forest, extreme gradient boosting, support vector machine, neural network, and generalized linear model) was carried out and the results showed that ensemble modelling outperformed the base learners for alkalinity and sodium adsorption ratio with R² values of 0.43 and 0.96, respectively, while only the random forest prediction was acceptable for electrical conductivity. Multivariate geostatistics was conducted on the stochastic residuals derived from machine learning modelling, as we could reasonably assume that there is spatial interdependence between the selected salt-affected soil indicators. We used 10-fold cross-validation to check the performance of the spatial predictions and uncertainty quantifications, which provided acceptable results for each selected salt-affected soil indicator (for pH value, electrical conductivity, and sodium adsorption ratio, the root mean square error values were 0.11, 0.86, and 0.22, respectively). Our results showed that the methodology applied in this study is efficient in mapping and assessing salt-affectedness on arable lands with high spatial resolution. A probability map for sodium adsorption ratio represents sodic soils exceeding a threshold value of 13, where they are more likely to have soil structure deterioration and water infiltration problems. This map can help the land user to select the appropriate agrotechnical operation for improving soil quality and yield.
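The abstract does not state how the probability map was derived. One common way to obtain such a map, shown here purely as an illustrative sketch, is to assume that the prediction at each grid cell is normally distributed with a predicted mean and standard deviation, and to compute the probability of exceeding the threshold of 13. The Gaussian assumption, the function name, and the example values are hypothetical and not taken from the study.

```python
import math

SAR_THRESHOLD = 13.0  # sodicity threshold quoted in the abstract


def exceedance_probability(mean, sd, threshold=SAR_THRESHOLD):
    """P(X > threshold) for X ~ Normal(mean, sd); the Gaussian form is an assumption."""
    if sd <= 0:
        # Degenerate case: no uncertainty, so the answer is a hard yes or no.
        return 1.0 if mean > threshold else 0.0
    z = (threshold - mean) / sd
    # 1 - Phi(z), written with the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical (mean, sd) predictions for three grid cells.
cells = [(8.0, 2.0), (13.0, 2.0), (20.0, 1.5)]
probabilities = [exceedance_probability(m, s) for m, s in cells]
```

Cells whose predicted mean sits exactly at the threshold get a probability of 0.5, while the value moves towards 0 or 1 as the mean moves away from the threshold relative to the prediction uncertainty.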
