Automated machine learning driven stacked ensemble modelling for forest aboveground biomass prediction using multitemporal Sentinel-2 data

Abstract

Modelling and large-scale mapping of forest aboveground biomass (AGB) is a complicated, challenging and expensive task. Considerable variations in forest characteristics create functional disparities among different models and call for comprehensive evaluation. Moreover, the human bias involved in modelling and evaluation affects how well models generalize at larger scales. In this paper, we present an automated machine learning (AutoML) framework for modelling, evaluation and stacking of multiple base models for AGB prediction. We incorporate a hyperparameter optimization procedure for automatic extraction of targeted features from multitemporal Sentinel-2 data, which minimizes human bias in the proposed modelling pipeline. We integrate two independent frameworks: one for automatic feature extraction and one for automatic model ensembling and evaluation. The results suggest that the extracted target-oriented features draw substantially on the red-edge and short-wave infrared regions of the spectrum. The feature importance scores indicate a dominant role of summer-based features compared to other seasons. The automated ensembling and evaluation framework produced a stacked ensemble of base models that outperformed the individual base models in predicting forest AGB. The stacked ensemble model delivered the best scores, with R²cv = 0.71 and RMSE = 74.44 Mg ha⁻¹. The base models delivered R²cv and RMSE values in the ranges 0.38–0.66 and 81.27–109.44 Mg ha⁻¹, respectively. The model evaluation metrics indicated that the stacked ensemble model was more resistant to outliers and generalized better. Thus, the study demonstrates an effective automated modelling pipeline for predicting AGB that minimizes human bias and is deployable over large and diverse forest areas.
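To make the idea of stacking and the two reported scores concrete, the following is a minimal sketch in plain Python, not the AutoML framework used in the paper. Everything in it is an illustrative assumption: the data are synthetic, the two base models are simple one-feature linear fits, and the meta-learner is a single blend weight fitted on out-of-fold predictions (a deliberately simplified form of stacking). The helper names `rmse`, `r2`, `fit_line` and `out_of_fold` are hypothetical.

```python
import math
import random

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def fit_line(x, y):
    """Ordinary least squares for y ~ a + b*x; returns a predictor function."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return lambda v: a + b * v

def out_of_fold(x, y, k=5):
    """Predict every sample with a model fitted on the other k-1 folds."""
    n = len(x)
    preds = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        model = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in range(fold, n, k):  # indices belonging to this fold
            preds[i] = model(x[i])
    return preds

# Synthetic data (assumption): the target depends on two features plus noise.
rng = random.Random(0)
n_train, n_test = 200, 100
x1 = [rng.gauss(0, 1) for _ in range(n_train + n_test)]
x2 = [rng.gauss(0, 1) for _ in range(n_train + n_test)]
y = [2 * a + 3 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
x1_tr, x1_te = x1[:n_train], x1[n_train:]
x2_tr, x2_te = x2[:n_train], x2[n_train:]
y_tr, y_te = y[:n_train], y[n_train:]

# Level 0: out-of-fold predictions of each base model on the training set.
oof1 = out_of_fold(x1_tr, y_tr)
oof2 = out_of_fold(x2_tr, y_tr)

# Level 1: least-squares blend weight w for w*p1 + (1 - w)*p2.
w = (sum((p1 - p2) * (t - p2) for p1, p2, t in zip(oof1, oof2, y_tr))
     / sum((p1 - p2) ** 2 for p1, p2 in zip(oof1, oof2)))

# Refit the base models on all training data and score on the held-out set.
m1, m2 = fit_line(x1_tr, y_tr), fit_line(x2_tr, y_tr)
pred1 = [m1(v) for v in x1_te]
pred2 = [m2(v) for v in x2_te]
pred_stack = [w * a + (1 - w) * b for a, b in zip(pred1, pred2)]

for name, pred in [("base 1", pred1), ("base 2", pred2), ("stack", pred_stack)]:
    print(f"{name}: R2 = {r2(y_te, pred):.2f}, RMSE = {rmse(y_te, pred):.2f}")
```

Fitting the blend weight on out-of-fold predictions, rather than on in-sample predictions, keeps the meta-learner from rewarding base models that merely overfit the training data. A practical pipeline would use richer base learners and a more flexible meta-learner.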
