Greek National Spatial Data Infrastructure. Attempts towards design and implementation.
Building a Spatial Data Infrastructure (SDI) is a long-term, evolving process whose results cannot be known in advance. Different countries have tried to develop a National SDI (NSDI), not always successfully. Although successes are documented thoroughly (e.g. SDI best practices), it is equally important to highlight unsuccessful efforts, in order to examine the different aspects of SDI development comprehensively and to gain a more holistic and integrated perspective on the subject. The first Greek NSDI effort presented in this paper is an example of an unfruitful first attempt. Examining and assessing this effort leads to interesting and, hopefully, constructive conclusions towards a broader understanding of SDI development. To assess this first effort, three time periods in the evolution of the Greek spatial data domain are identified. These periods appear to have influenced each other: people, concepts, and inadequacies of one period reappeared in the next, forming a kind of pattern. The discussion of the aspects that influenced and characterized this effort reveals the many difficulties and problems that the Greek NSDI development had to face.
Comparing Machine Learning Models and Hybrid Geostatistical Methods Using Environmental and Soil Covariates for Soil pH Prediction
In the current paper we assess different machine learning (ML) models and hybrid geostatistical methods for predicting soil pH using digital elevation model derivatives (environmental covariates) and co-located soil parameters (soil covariates). The study area was Grevena, Greece, where 266 disturbed soil samples were collected from randomly selected locations and analyzed in the laboratory of the Soil and Water Resources Institute. The models assessed were random forests (RF), random forests kriging (RFK), gradient boosting (GB), gradient boosting kriging (GBK), neural networks (NN), and neural networks kriging (NNK); multiple linear regression (MLR), ordinary kriging (OK), and regression kriging (RK), although not ML models, were also included for comparison. GB and RF gave the best results in the study, with NN a close second. Applying OK to the ML models' residuals did not have a major impact. Classical geostatistical or hybrid geostatistical methods without ML (OK, MLR, and RK) exhibited worse prediction accuracy than the models that included ML. Furthermore, different implementations (methods and packages) of the same ML models were also assessed. For RF and GB, the implementations applied (ranger-ranger, randomForest-rf, xgboost-xgbTree, xgboost-xgbDART) led to similar results, whereas for NN the differences between the implementations used (nnet-nnet and nnet-avNNet) were more distinct. Finally, ML models tuned with a random search optimization method were compared with the same models using their default values. Optimization improved the predictions only where the ML algorithm had a large number of hyperparameters to tune and the optimized values differed significantly from the defaults, as in the case of GB and NN, but not RF.
In general, the study concluded that although RF and GB achieved approximately the same prediction accuracy, RF gave more consistent results regardless of the package, the hyperparameter selection method, or the inclusion of OK on the ML models' residuals.
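The hybrid methods above share one structure: a trend model predicts the target from covariates, and ordinary kriging interpolates that model's residuals. The sketch below shows this for regression kriging (MLR trend plus OK of residuals) using only numpy. It is an illustration under stated assumptions, not the paper's implementation: the exponential variogram and its `nugget`, `psill`, and `corr_range` values are hypothetical placeholders that in practice would be fitted to the residual semivariogram, and replacing the MLR trend with an RF, GB, or NN prediction would give the general shape of the RFK, GBK, and NNK hybrids.

```python
import numpy as np

def exponential_variogram(h, nugget, psill, corr_range):
    """Exponential semivariogram model (assumed); gamma(0) is 0 by convention."""
    gamma = nugget + psill * (1.0 - np.exp(-h / corr_range))
    return np.where(h > 0, gamma, 0.0)

def ordinary_kriging(xy, z, xy0, nugget, psill, corr_range):
    """Ordinary kriging estimate of z at a single location xy0."""
    n = len(z)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    # Kriging matrix with an extra row/column for the Lagrange multiplier
    # that forces the weights to sum to 1 (unbiasedness constraint).
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = exponential_variogram(d, nugget, psill, corr_range)
    a[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = exponential_variogram(np.linalg.norm(xy - xy0, axis=1), nugget, psill, corr_range)
    weights = np.linalg.solve(a, b)[:n]
    return float(weights @ z)

def regression_kriging(xy, covariates, z, xy0, covariates0,
                       nugget=0.0, psill=1.0, corr_range=500.0):
    """MLR trend at xy0 plus the ordinary-kriged MLR residual at xy0."""
    design = np.column_stack([np.ones(len(z)), covariates])
    beta, *_ = np.linalg.lstsq(design, z, rcond=None)
    residuals = z - design @ beta
    trend0 = float(np.concatenate(([1.0], covariates0)) @ beta)
    return trend0 + ordinary_kriging(xy, residuals, xy0, nugget, psill, corr_range)
```

Because the kriging weights sum to 1 and the residuals are honoured exactly at sampled locations, the hybrid reproduces the observed value at each sample point and relies on the trend model elsewhere.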
Spatial Modelling and Prediction Assessment of Soil Iron Using Kriging Interpolation with pH as Auxiliary Information
In this study, different interpolation techniques are presented, assessed, and compared for estimating soil iron (Fe) content at locations where observations were not available. A total of 400 soil samples were randomly collected from 2013 to 2015 in the Kozani area, near Polifitou Lake in northern Greece, and analysed in the laboratory to determine soil Fe concentrations and pH. Soil Fe concentrations were examined for spatial autocorrelation, and semivariograms were used to determine whether pH and Fe exhibited spatial cross-correlation. Three interpolation methods, Ordinary Kriging, Universal Kriging, and Co-Kriging, were applied, and their results were compared using two different cross-validation methods. There was evidence of spatial cross-correlation between soil Fe and pH in each year, which was subsequently used to improve the interpolation results at unsampled locations. In nearly all cases, Co-Kriging, which exploits the covariance between the two regionalized variables (Fe and pH), outperformed the other interpolation techniques in each year.
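Checking for spatial autocorrelation and cross-correlation starts from the empirical (cross-)semivariogram, which averages half the product of paired increments within distance bins. The numpy sketch below implements the classical estimator; passing the same variable twice gives the ordinary semivariogram (e.g. of Fe), and passing Fe and pH gives their cross-semivariogram. The distance bins are hypothetical inputs, not the settings used in the study.

```python
import numpy as np

def empirical_cross_semivariogram(xy, z1, z2, bin_edges):
    """Classical (cross-)semivariogram estimator per distance bin.

    gamma_12(h) = 1 / (2 N(h)) * sum over pairs in bin of (z1_i - z1_j)(z2_i - z2_j).
    With z2 = z1 this is the ordinary semivariogram.
    """
    i, j = np.triu_indices(len(z1), k=1)          # every pair counted once
    h = np.linalg.norm(xy[i] - xy[j], axis=1)
    prod = (z1[i] - z1[j]) * (z2[i] - z2[j])
    lags, gammas, counts = [], [], []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (h > lo) & (h <= hi)
        if in_bin.any():                           # skip empty bins
            lags.append(h[in_bin].mean())
            gammas.append(0.5 * prod[in_bin].mean())
            counts.append(int(in_bin.sum()))
    return np.array(lags), np.array(gammas), np.array(counts)
```

A model (e.g. spherical or exponential) would then be fitted to these points; a cross-semivariogram that rises with lag distance, rather than staying flat near zero, is the kind of evidence that makes Co-Kriging worth trying.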
Configuration of the basic design parameters and implementation of Greek National Spatial Data Infrastructure: application in the agricultural sector
The aim of the current thesis is to study Spatial Data Infrastructure (SDI) specifically for the case of Greece and to propose a configuration of its key design and implementation parameters by constructing and applying a "Tension Synthesis Model". Additionally, it documents the existing approach of the soil science community to Greek agricultural soil spatial data (definition, processes, approach) and compares it with the requirements of Law 3882/2010 for the Greek NSDI and of the European SDI (Infrastructure for Spatial Information in the European Community, INSPIRE). Finally, drawing on the theoretical conclusions of the thesis, it describes two different approaches (technocentric and sociotechnical) to integrating and using Greek soil spatial data in a future Greek NSDI, favoring the sociotechnical approach as more likely to be viable and successful, and as more compatible with both current theory and the results of the thesis.
Prediction and Uncertainty Capabilities of Quantile Regression Forests in Estimating Spatial Distribution of Soil Organic Matter
One of the core tasks in digital soil mapping (DSM) studies is estimating the spatial distribution of soil variables. Assessing the uncertainty of these estimates is equally important, yet many current DSM studies lack it. Machine learning (ML) methods are increasingly used in this field, but most of them have no intrinsic uncertainty estimation capabilities. One solution is to use ML methods that combine strong prediction capabilities with innate uncertainty estimation, such as Quantile Regression Forests (QRF). In the current paper, the prediction and uncertainty capabilities of QRF, Random Forests (RF), and geostatistical methods were assessed. The results confirmed that QRF performed outstandingly in predicting soil organic matter (OM) in the study area. In particular, its R² was much higher than that of the geostatistical methods, meaning that the model explains more of the variation. Moreover, its uncertainty capabilities, as presented in the uncertainty maps, show that it can also provide a good estimate of uncertainty, with a distinct representation of local variation in specific parts of the area, which is a significant advantage, especially for decision support purposes.
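QRF's built-in uncertainty comes from keeping every training response in the leaves rather than only the leaf mean: each tree gives equal weight to the training samples sharing the query point's leaf, and the averaged weights define a conditional distribution whose quantiles can be read off. The numpy sketch below implements that weighting and quantile step. It assumes the leaf indices are already available (for example from scikit-learn's `RandomForestRegressor.apply`, which returns one leaf index per sample and tree); fitting the forest itself is outside the sketch.

```python
import numpy as np

def qrf_weights(train_leaves, query_leaves):
    """Quantile regression forest weights for one query point.

    train_leaves: (n_train, n_trees) leaf index of every training sample in each tree
    query_leaves: (n_trees,) leaf index reached by the query point in each tree
    """
    n_train, n_trees = train_leaves.shape
    weights = np.zeros(n_train)
    for t in range(n_trees):
        same_leaf = train_leaves[:, t] == query_leaves[t]
        if same_leaf.any():
            # Each tree spreads a total weight of 1 evenly over its leaf's samples
            weights[same_leaf] += 1.0 / same_leaf.sum()
    return weights / weights.sum()

def qrf_quantile(y_train, weights, q):
    """q-quantile of the weighted empirical CDF: smallest y with F(y) >= q."""
    order = np.argsort(y_train)
    cdf = np.cumsum(weights[order])
    k = np.searchsorted(cdf, q - 1e-12)  # small tolerance for floating-point sums
    return float(y_train[order][min(k, len(cdf) - 1)])
```

Mapping, at every grid cell, the width between two quantiles (for instance the 0.05 and 0.95 quantiles) is one common way to build uncertainty maps of the kind described above.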
Spatial or Random Cross-Validation? The Effect of Resampling Methods in Predicting Groundwater Salinity with Machine Learning in Mediterranean Region
Machine learning (ML) algorithms are used extensively and often achieve outstanding prediction accuracy. However, in some cases their tendency to overfit, along with inadvertent biases, may produce overly optimistic results. Spatial data are a special kind of data that can introduce biases into ML because of their intrinsic spatial autocorrelation. To address this issue, a dedicated resampling method called spatial cross-validation (SCV) has emerged. The purpose of this study was to evaluate the performance of SCV compared with the conventional random cross-validation (CCV) used in most ML studies. Multiple ML models were built with CCV and SCV to predict groundwater electrical conductivity (EC) using data (A) from Rhodope, Greece, in the summer of 2020; (B) from the same area at a different time (summer 2019); and (C) from a new area (the Salento peninsula, Italy). The results showed that SCV yields ML models with superior generalization capabilities and, hence, better predictions on new, unseen data. SCV appears to capture the spatial patterns in the data while also reducing the over-optimism bias often associated with CCV. Based on these results, SCV could be applied with ML in studies that use spatial data.
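The practical difference between the two resampling schemes lies in how fold labels are assigned. Random CV shuffles individual samples, so spatially autocorrelated neighbours end up split between training and test folds; spatial CV keeps nearby samples together. The abstract does not say how the SCV partitions were built, so the numpy sketch below uses simple square-grid blocking as one common option, with `block_size` as a hypothetical parameter.

```python
import numpy as np

def spatial_block_folds(xy, block_size, n_folds, seed=0):
    """Fold label per point; all points in the same square block share a fold."""
    # Grid cell (column, row) of each point, relative to the data's lower-left corner
    cells = np.floor((xy - xy.min(axis=0)) / block_size).astype(int)
    _, block_id = np.unique(cells, axis=0, return_inverse=True)
    block_id = block_id.ravel()                 # flat block index per point
    rng = np.random.default_rng(seed)
    n_blocks = block_id.max() + 1
    block_fold = rng.permutation(n_blocks) % n_folds   # spread blocks over folds
    return block_fold[block_id]

def random_folds(n, n_folds, seed=0):
    """Conventional random CV: fold labels shuffled over individual samples."""
    rng = np.random.default_rng(seed)
    return rng.permutation(n) % n_folds
```

Either label vector can drive the same training loop: for each fold k, fit on points whose label differs from k and score on points labelled k. Comparing the scores from the two schemes shows how much of the random-CV accuracy comes from spatial leakage.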
An Integrated Approach to Assessing the Soil Quality and Nutritional Status of Large and Long-Term Cultivated Rice Agro-Ecosystems
The aim of this study is to develop an integrated approach to soil quality and fertility assessment in high-yielding rice agro-ecosystems threatened by the overexploitation of soil resources through intensive agriculture. The proposed approach uses representative pilot fields distributed throughout the study area, based on the assumption that soils with similar general properties have a similar nutritional status because of common long-term management practices. The analysis includes (a) object-based image analysis for land zonation, (b) hot-spot analysis for sampling scheme evaluation, (c) setting critical thresholds in soil parameters for detecting nutrient deficiencies and soil quality problems, and (d) Redundancy Analysis, TITAN analysis, and multiple regression for identifying individual or combined effects of general soil properties (e.g., organic matter, soil texture, pH, salinity) or non-soil parameters (e.g., topographic parameters) on soil nutrients. The approach was applied to the large rice agro-ecosystem of the Thessaloniki plain in Greece as a case study, taking into account some site-specific characteristics (e.g., high rice yields, calcareous soils) when setting the critical thresholds in soil parameters. The results showed that (a) 62.5% of the pilot fields' coverage had a simultaneous deficiency in Zn, Mn, and B; (b) organic matter (OM) was the most significant descriptor of nutrient variance, and its cold spots (clustered regions of low OM values) overlapped considerably with the cold spots of K, Mg, Zn, Mn, Cu, and B; (c) a higher rate of increase in the availability of P, K, Mg, Mn, Zn, Fe, Cu, and B was observed when OM ranged between 2 and 3%; and (d) the multiple regression models that estimate K and P concentrations from general soil properties performed adequately, allowing their use for a general assessment of K and P concentrations in fields across the whole agro-ecosystem.
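The abstract refers to hot-spot analysis and to cold spots (clustered low OM values) without naming the statistic. The Getis-Ord Gi* z-score is a widely used choice for this, so the numpy sketch below assumes it, with binary distance-band weights that include each location itself. The `distance_band` value is a hypothetical parameter, not one reported in the study.

```python
import numpy as np

def getis_ord_gi_star(xy, x, distance_band):
    """Getis-Ord Gi* z-scores with binary distance-band weights (w_ii = 1).

    Large positive values indicate hot spots, large negative values cold spots.
    Undefined where a band covers every observation (the denominator becomes 0).
    """
    n = len(x)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    w = (d <= distance_band).astype(float)      # includes each point itself (d = 0)
    x_bar = x.mean()
    s = np.sqrt((x ** 2).mean() - x_bar ** 2)
    sum_w = w.sum(axis=1)
    sum_w2 = (w ** 2).sum(axis=1)
    numerator = w @ x - x_bar * sum_w
    denominator = s * np.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
    return numerator / denominator
```

Fields whose z-score falls below roughly -1.96 would be candidate cold spots at about the 5% level; overlaying the cold spots computed separately for OM and for each nutrient is one way to examine the kind of overlap reported above.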