7 research outputs found
A spatiotemporal ensemble machine learning framework for generating land use/land cover time-series maps for Europe (2000–2019) based on LUCAS, CORINE and GLAD Landsat
A spatiotemporal machine learning framework for automated prediction and analysis of long-term Land Use/Land Cover dynamics is presented. The framework includes: (1) harmonization and preprocessing of spatial and spatiotemporal input datasets (GLAD Landsat, NPP/VIIRS) including five million harmonized LUCAS and CORINE Land Cover-derived training samples, (2) model building based on spatial k-fold cross-validation and hyper-parameter optimization, (3) prediction of the most probable class, class probabilities and model variance of predicted probabilities per pixel, (4) LULC change analysis on time-series of produced maps. The spatiotemporal ensemble model consists of a random forest, gradient boosted tree classifier, and an artificial neural network, with a logistic regressor as meta-learner. The results show that the most important variables for mapping LULC in Europe are: seasonal aggregates of Landsat green and near-infrared bands, multiple Landsat-derived spectral indices, long-term surface water probability, and elevation. Spatial cross-validation of the model indicates consistent performance across multiple years with overall accuracy (a weighted F1-score) of 0.49, 0.63, and 0.83 when predicting 43 (level-3), 14 (level-2), and five classes (level-1). Additional experiments show that spatiotemporal models generalize better to unknown years, outperforming single-year models on known-year classification by 2.7% and unknown-year classification by 3.5%. Results of the accuracy assessment using 48,365 independent test samples show an 87% match with the validation points. Results of time-series analysis (time-series of LULC probabilities and NDVI images) suggest forest loss in large parts of Sweden, the Alps, and Scotland. Positive and negative trends in NDVI in general match the land degradation and land restoration classes, with “urbanization” showing the most negative NDVI trend.
An advantage of using spatiotemporal ML is that the fitted model can be used to predict LULC in years that were not included in its training dataset, allowing generalization to past and future periods, e.g. to predict LULC for years prior to 2000 and beyond 2020. The generated LULC time-series data stack (ODSE-LULC), including the training points, is publicly available via the ODSE Viewer. Functions used to prepare data and run modeling are available via the eumap library for Python.
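The spatial k-fold cross-validation mentioned in step (2) can be illustrated with a minimal sketch. It assumes that samples are blocked by square tiles so that nearby points never end up on both sides of a split; the function name `spatial_kfold`, the 30 km tile size, and the round-robin fold assignment are illustrative assumptions and are not taken from the abstract or from the eumap library.

```python
from collections import defaultdict


def spatial_kfold(coords, k=5, tile_size=30_000.0):
    """Yield (train_idx, test_idx) pairs in which whole tiles are held out together.

    coords    : list of (x, y) positions in a projected CRS (metres assumed)
    k         : number of folds
    tile_size : edge length of the blocking tiles (hypothetical value)
    """
    # Group sample indices by the tile they fall into.
    tiles = defaultdict(list)
    for i, (x, y) in enumerate(coords):
        tiles[(int(x // tile_size), int(y // tile_size))].append(i)

    # Deterministic round-robin assignment of tiles to folds.
    fold_of_tile = {tile: n % k for n, tile in enumerate(sorted(tiles))}

    for fold in range(k):
        test = [i for t, idx in tiles.items() if fold_of_tile[t] == fold for i in idx]
        train = [i for t, idx in tiles.items() if fold_of_tile[t] != fold for i in idx]
        yield sorted(train), sorted(test)
```

Because every tile belongs to exactly one fold, a model is always evaluated on locations it has not seen during training, which is the point of spatial (as opposed to random) cross-validation.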
PM2.5 Estimation in the Czech Republic using Extremely Randomized Trees: A Comprehensive Data Analysis
The accuracy of artificial intelligence techniques in estimating air quality is contingent upon a multitude of influencing factors. Unlike our previous study, which examined PM2.5 over the whole of Europe using unbalanced spatial-temporal data, this study focused on estimating PM2.5 specifically over the Czech Republic using a more balanced dataset to train and evaluate the model. Moreover, the spatial autocorrelation between the ground-based stations was taken into consideration while building the model. The feature importance obtained while developing the Extra Trees model revealed that spatial autocorrelation had greater significance than commonly used inputs such as elevation and NDVI. We found that the R² of the 10-fold cross-validation for the new model was 16% higher than that of the previous one. R² reached 0.85 when predicting unseen data in new locations. The developed spatiotemporal model was employed to generate comprehensive daily maps covering the entire study area throughout 2018–2020. The temporal analysis showed that the levels of PM2.5 exceeded the recommended limit of 20 µg/m³ during 2018 in many regions. The eastern part of the country suffered from the highest concentrations, especially over the Zlín and Moravian-Silesian Regions, where in the 2018 winter the values reached risky average concentrations of 30 µg/m³ and 35 µg/m³, respectively. Air quality improved during the next two years in all regions, reaching promising levels in 2020, when almost all regions had average concentrations below 20 µg/m³. The generated dataset will be available for future air quality studies.
Machine Learning-Based Approach Using Open Data to Estimate PM2.5 over Europe
Air pollution is currently considered one of the most serious problems facing humans. Fine particulate matter with a diameter smaller than 2.5 micrometres (PM2.5) is a very harmful air pollutant that is linked with many diseases. In this study, we created a machine learning-based scheme to estimate PM2.5 using various open data such as satellite remote sensing, meteorological data, and land variables to increase the limited spatial coverage provided by ground monitors. A space-time extremely randomised trees model was used to estimate PM2.5 concentrations over Europe. This model achieved good results, with an out-of-sample cross-validated R² of 0.69, RMSE of 5 μg/m³, and MAE of 3.3 μg/m³. The outcome of this study is a daily full-coverage PM2.5 dataset with 1 km spatial resolution for the three-year period of 2018–2020. We found that air quality improved throughout the study period over all countries in Europe. In addition, we compared PM2.5 levels during the COVID-19 lockdown (March–June) with the average of the previous 4 months and the following 4 months. We found that this lockdown had a positive effect on air quality in most parts of the study area except for the United Kingdom, Ireland, the north of France, and the south of Italy. This is the first study that depends only on open data and covers the whole of Europe with high spatial and temporal resolutions. The reconstructed dataset will be published under a free and open license and can be used in future air quality studies.
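For readers unfamiliar with the three scores reported above, the following sketch shows how R², RMSE and MAE are usually computed from paired observed and predicted values. It assumes the coefficient-of-determination definition of R² (one minus the ratio of residual to total sum of squares); the abstract does not state which definition the authors used, and the function name and sample values are made up for illustration.

```python
import math


def regression_scores(observed, predicted):
    """Return R², RMSE and MAE for paired observed/predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [o - p for o, p in zip(observed, predicted)]

    ss_res = sum(r ** 2 for r in residuals)              # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }


# Made-up PM2.5 values in µg/m³, for illustration only.
scores = regression_scores([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
```

RMSE penalises large errors more strongly than MAE, which is why the two are often reported side by side.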
The Effect of Suspended Particulate Matter on the Supraglacial Lake Depth Retrieval from Optical Data
Supraglacial lakes (SGL) are a specific phenomenon of glaciers. They are important for ice dynamics, surface mass balance, and surface hydrology, especially under ongoing climate change. The important characteristics of lakes are their water storage and drainage. Satellite-based remote sensing is commonly used not only to monitor the area but also to estimate the depth and volume of lakes, which is the basis for long-term spatiotemporal analysis of these phenomena. Lake depth retrieval from optical data using a physical model requires several basic assumptions, for instance that the water has little or no dissolved or suspended matter. Several authors using these assumptions state that they are also potential weaknesses, which remain unquantified in the literature. The objective of this study is to quantify the effect of non-zero suspended particulate matter (SPM) on the maximum detectable lake depth. We collected concurrent in-situ hyperspectral and lake depth measurements to a depth of 8 m. Additionally, we collected water samples to measure the concentration of SPM. The results of empirical and physically based models showed that a good relationship still exists between the water spectra of SGL and the lake depth in the presence of 48 mg/L of SPM. The root mean squared error for the models ranged from 0.163 m (Partial Least Squares Regression, PLSR, model) to 0.243 m (physically based model), which is consistent with the published literature. However, the SPM limited the maximum detectable depth to approximately 3 m. This maximum detectable depth was also confirmed by the theoretical concept of Philpot (1989). The maximum detectable depth decreases exponentially with an increase in the water attenuation coefficient g, which directly depends on the water properties.
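The physically based retrieval referred to above is commonly written in a single-band form following Philpot (1989), in which the water-leaving reflectance decays exponentially from the lake-bottom albedo towards the reflectance of optically deep water. The sketch below assumes that common form; the abstract does not give the exact formulation the authors used, and the function names and numeric values are hypothetical.

```python
import math


def reflectance_at_depth(z, a_d, r_inf, g):
    """Forward model: R_w = R_inf + (A_d - R_inf) * exp(-g * z)."""
    return r_inf + (a_d - r_inf) * math.exp(-g * z)


def lake_depth(r_w, a_d, r_inf, g):
    """Invert the forward model: z = [ln(A_d - R_inf) - ln(R_w - R_inf)] / g.

    r_w   : observed water reflectance
    a_d   : lake-bottom albedo
    r_inf : reflectance of optically deep water
    g     : attenuation coefficient in 1/m
    """
    if a_d <= r_inf or r_w <= r_inf:
        raise ValueError("reflectance must exceed the deep-water value")
    return (math.log(a_d - r_inf) - math.log(r_w - r_inf)) / g


# Round trip with made-up values: simulate a 2 m deep lake, then recover the depth.
r_w = reflectance_at_depth(2.0, a_d=0.6, r_inf=0.05, g=0.8)
depth = lake_depth(r_w, a_d=0.6, r_inf=0.05, g=0.8)
```

As the depth grows, the observed reflectance approaches the deep-water value and the logarithm becomes increasingly sensitive to noise, which is why a larger g (for example, from suspended matter) reduces the depth that can be retrieved reliably.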
Open Geospatial System for LUCAS In Situ Data Harmonization and Distribution
The use of in situ references in Earth observation monitoring is a fundamental need. LUCAS (Land Use and Coverage Area frame Survey) is an activity that has performed repeated in situ surveys over Europe every three years since 2006. The dataset is unique in many aspects; however, it is currently not available through a standardized machine-to-machine interface. Moreover, the evolution of the surveys limits the performance of change analysis using the dataset. Our objective was to develop an open-source system to fill these gaps. This paper presents a system for LUCAS in situ data harmonization and distribution. We have designed a multi-layer client-server system that may be integrated into end-to-end workflows. It provides data through an OGC (Open Geospatial Consortium) compliant interface. Moreover, a geospatial user may integrate the data through a Python API (Application Programming Interface) to ease the use in workflows with spatial, temporal, attribute, and thematic filters. Furthermore, we have implemented a QGIS plugin to retrieve the spatial and temporal subsets of the data interactively. In addition, the Python API includes methods for managing thematic information. The system provides enhanced functionality, which is demonstrated in two use cases.
Phenology and classification of abandoned agricultural land based on ALOS-1 and 2 PALSAR multi-temporal measurements
Agricultural crop abandonment negatively impacts the local economy and environment since land, as a resource for agriculture, is not optimally utilized. To take the necessary actions to rehabilitate abandoned agricultural lands, the spatial distribution of these lands must first be identified. While optical images have previously shown potential for identifying agricultural land abandonment, tropical areas often suffer from cloud cover that limits the availability of imagery. Therefore, this study was conducted to investigate the potential of ALOS-1 and 2 (Advanced Land Observing Satellite-1 and 2) PALSAR (Phased Array L-band Synthetic Aperture Radar) images for the identification and classification of abandoned agricultural crop areas, namely paddy, rubber and oil palm fields. Distinct crop phenology for paddy and rubber was identified from ALOS-1 PALSAR; nonetheless, oil palm did not demonstrate any useful phenology for discriminating between the abandoned classes. The accuracy obtained for these abandoned lands of paddy, rubber and oil palm was 93.33% ± 0.06%, 78% ± 2.32% and 63.33% ± 1.88%, respectively. This study confirmed that understanding crop phenology in relation to image date selection is essential to obtain high accuracy when classifying abandoned and non-abandoned agricultural crops. The findings also indicate that PALSAR offers a substantial advantage for vegetation applications in tropical areas.