Estimating soil organic carbon (SOC) content is essential for understanding
the chemical, physical, and biological functions of the soil.
This study proposes six machine learning algorithms, namely support vector
machines, artificial neural networks, regression tree, random forest, extreme
gradient boosting, and a conventional deep neural network (DNN), for advancing
prediction models of SOC. The models are trained on 1879 composite surface soil
samples, with 105 auxiliary variables as predictors. A genetic algorithm is
used for feature selection to identify the effective variables. The results
indicate that
precipitation is the most important predictor, driving 15 percent of SOC
spatial variability, followed by the normalized difference vegetation index,
the day temperature index of the Moderate Resolution Imaging Spectroradiometer
(MODIS), multiresolution valley bottom flatness, and land use. Based on 10-fold
cross-validation, the DNN model was the superior algorithm, with the lowest
prediction error and uncertainty. In terms of accuracy, the DNN yielded a
mean absolute error of 59 percent, a root mean squared error of 75 percent, a
coefficient of determination of 0.65, and Lin's concordance correlation
coefficient of 0.83. SOC content was highest in the udic soil moisture regime
class, with a mean value of 4 percent, followed by the aquic and xeric classes.
Soils in dense forestlands had the highest SOC contents,
whereas soils of younger geological age and alluvial fans had lower SOC. The
proposed DNN is a promising algorithm for handling large numbers of auxiliary
variables at the province scale. Owing to its flexible structure and its
ability to extract more information from the auxiliary data surrounding the
sampled observations, it predicted the SOC baseline map with high accuracy and
minimal uncertainty.

Comment: 30 pages, 9 figures
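
The four accuracy measures reported above for the DNN (mean absolute error,
root mean squared error, coefficient of determination, and Lin's concordance
correlation coefficient) can be computed from paired observed and predicted SOC
values. The following is a minimal sketch, not the study's own code. The
function name `accuracy_metrics` and the sample values are hypothetical, the
coefficient of determination is assumed to be defined as 1 - SSE/SST (the
abstract does not say which definition was used), and variances use the 1/n
form of Lin's definition.

```python
import math


def accuracy_metrics(observed, predicted):
    """Return MAE, RMSE, R^2 and Lin's CCC for paired observed/predicted values."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    errors = [p - o for o, p in zip(observed, predicted)]

    # Mean absolute error and root mean squared error.
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)

    # Population (1/n) variances and covariance.
    var_obs = sum((o - mean_obs) ** 2 for o in observed) / n
    var_pred = sum((p - mean_pred) ** 2 for p in predicted) / n
    cov = sum(
        (o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted)
    ) / n
    if var_obs == 0:
        raise ValueError("observed values are constant; R^2 is undefined")

    # Coefficient of determination, assumed here to be 1 - SSE/SST.
    r2 = 1.0 - mse / var_obs

    # Lin's CCC penalises both scatter and systematic bias between the means.
    ccc = 2.0 * cov / (var_obs + var_pred + (mean_obs - mean_pred) ** 2)

    return {"mae": mae, "rmse": rmse, "r2": r2, "ccc": ccc}


# Hypothetical SOC values in percent, for illustration only.
observed_soc = [1.0, 2.0, 3.0, 4.0]
predicted_soc = [1.5, 2.0, 2.5, 4.0]
print(accuracy_metrics(observed_soc, predicted_soc))
```

Unlike the coefficient of determination, the concordance correlation
coefficient falls when predictions are systematically shifted away from the
1:1 line, so the two measures can diverge for a biased model.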