Estimating soil organic carbon using time series Band 11 (SWIR) of multispectral Sentinel-2 satellite images and machine learning algorithms
Abstract
Soil Organic Carbon (SOC) is a critical soil property that affects both food security and climate change. Traditional methods for SOC estimation are time-consuming, expensive, and unsuitable for large-scale application. Consequently, over the past two decades researchers have increasingly turned to Remote Sensing (RS) images for SOC estimation. However, achieving high SOC estimation accuracy (more than 80 %) remains challenging. This limitation often stems from a mismatch between the complexity of SOC and the information captured by traditional RS observations (e.g., reflectance bands or spectral indices): conventional feature extraction from RS images may not be detailed enough to capture the many factors that influence SOC concentration. One promising way to enhance feature extraction is to use time series observations, analyzing multiple images acquired over time instead of relying on a single-date image. This study proposes a novel approach that combines time series of the Sentinel-2 B11 band (centered around 1610 nm, a region sensitive to SOC absorption features) with Principal Component Analysis (PCA) and Independent Component Analysis (ICA) transformations to extract more meaningful temporal features. Specifically, ten new features describing temporal variations were derived by applying PCA and ICA to the B11 time series images. These temporal features were then combined with features derived from the median of all Sentinel-2 images acquired during the summer of 2019, the period in which the soil data were collected. Four machine learning algorithms (RF, GBRT, XGBoost, and LightGBM) were applied in four scenarios to evaluate the novel feature extraction method and a feature selection algorithm. Scenario one (S#1) and Scenario two (S#2) did not use the time series features, whereas Scenario three (S#3) and Scenario four (S#4) did.
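As a rough illustration of the temporal feature extraction described above, the sketch below treats each pixel's B11 time series as one sample (the acquisition dates act as the variables) and projects it with scikit-learn's `PCA` and `FastICA`. The split of the ten features into five principal and five independent components, the per-date standardization, and the helper names `temporal_features` and `median_composite` are assumptions for illustration only, not details confirmed by the study.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA


def temporal_features(b11_stack, n_pca=5, n_ica=5, seed=0):
    """Project each pixel's B11 time series onto PCA and ICA components.

    b11_stack: array of shape (T, H, W), one B11 reflectance image per date.
    Returns an array of shape (H, W, n_pca + n_ica).
    Assumption: T >= max(n_pca, n_ica); the 5 + 5 split is hypothetical.
    """
    t, h, w = b11_stack.shape
    # One sample per pixel; the T acquisition dates act as the variables.
    samples = b11_stack.reshape(t, h * w).T.astype(float)
    # Standardize each date so that no single acquisition dominates.
    samples = (samples - samples.mean(axis=0)) / (samples.std(axis=0) + 1e-12)
    pcs = PCA(n_components=n_pca).fit_transform(samples)
    ics = FastICA(n_components=n_ica, random_state=seed).fit_transform(samples)
    # Concatenate PCA scores and ICA sources into one feature vector per pixel.
    return np.hstack([pcs, ics]).reshape(h, w, n_pca + n_ica)


def median_composite(stack):
    """Per-pixel median over the date axis (axis 0) of a (T, ...) image stack."""
    return np.median(stack, axis=0)


# Usage on a synthetic 12-date, 8 x 8 pixel stack (random values, not real data).
rng = np.random.default_rng(0)
b11 = rng.random((12, 8, 8))
features = temporal_features(b11)   # shape (8, 8, 10)
b11_median = median_composite(b11)  # shape (8, 8)
```

In this hypothetical setup, the per-pixel temporal features would then be stacked with the median-composite bands to form the predictor table for the regression models.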
A binary Genetic Algorithm (GA) for feature selection was applied in S#2 and S#4, distinguishing them from S#1 and S#3, respectively. XGBoost performed best, achieving an R² of 0.891 in S#4 (time series features with GA). Incorporating the time series features improved R² by 0.11, and GA-based feature selection added a further 0.05. These findings highlight the effectiveness of the proposed feature extraction approach, which applies PCA and ICA transformations to the Sentinel-2 B11 time series, for substantially improving SOC estimation.
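The binary GA feature selection step can be sketched as follows, assuming each chromosome is a boolean mask over the candidate feature columns and fitness is the mean 5-fold cross-validated R². Tournament selection, uniform crossover, bit-flip mutation, elitism, the population size, the mutation rate, and the default `LinearRegression` estimator (chosen only to keep the example fast; any scikit-learn-compatible regressor, such as an XGBoost or LightGBM regressor, could be passed instead) are all assumptions, not the settings reported in the study. The names `fitness` and `ga_select` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def fitness(mask, X, y, estimator, cv=5):
    """Mean cross-validated R² using only the columns where mask is True."""
    if not mask.any():
        return -np.inf  # an empty feature subset is invalid
    scores = cross_val_score(estimator, X[:, mask], y, cv=cv, scoring="r2")
    return scores.mean()


def ga_select(X, y, estimator=None, pop_size=20, generations=15,
              p_mut=0.05, seed=0):
    """Binary GA: each individual is a boolean mask over the feature columns."""
    rng = np.random.default_rng(seed)
    estimator = estimator if estimator is not None else LinearRegression()
    n_feat = X.shape[1]
    pop = rng.random((pop_size, n_feat)) < 0.5
    scores = np.array([fitness(ind, X, y, estimator) for ind in pop])
    for _ in range(generations):
        # Elitism: carry the best mask into the next generation unchanged.
        children = [pop[np.argmax(scores)].copy()]
        while len(children) < pop_size:
            # Tournament selection: the better of two random individuals.
            a, b = rng.integers(pop_size, size=2)
            p1 = pop[a] if scores[a] >= scores[b] else pop[b]
            a, b = rng.integers(pop_size, size=2)
            p2 = pop[a] if scores[a] >= scores[b] else pop[b]
            # Uniform crossover, then independent bit-flip mutation.
            child = np.where(rng.random(n_feat) < 0.5, p1, p2)
            child ^= rng.random(n_feat) < p_mut
            children.append(child)
        pop = np.array(children)
        scores = np.array([fitness(ind, X, y, estimator) for ind in pop])
    best = np.argmax(scores)
    return pop[best], scores[best]


# Usage on synthetic data: only the first three of ten columns drive y.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = 2 * X[:, 0] - 3 * X[:, 1] + X[:, 2] + 0.1 * rng.normal(size=200)
best_mask, best_r2 = ga_select(X, y)
print(np.flatnonzero(best_mask), round(best_r2, 3))
```

Under these assumptions, running such a search on the feature set without temporal features would correspond to S#2, and on the set that includes them to S#4; elitism keeps the best cross-validated R² from decreasing between generations.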
Sustainable Development Goals:
- SDG 2 - Zero Hunger
- SDG 13 - Climate Action
- SDG 15 - Life on Land