Tree-based ensembles vs neuron-based methods for tabular data - A case study in crop disease forecasting

Abstract

Machine learning and especially deep learning techniques have led to significant success in the last decade and have been predominantly applied to visual data, natural language, speech, and audio-related tasks but haven’t found major prominence in the context of tabular data yet. In agriculture, too, deep learning models are mostly limited to use cases with image data, while tree-based algorithms continue to be the de facto standard for predictive modeling on tabular data. Therefore, the objective of this study is to present a thorough investigation on these two streams of predictive modeling techniques on tabular data against a speed-accuracy-complexity tradeoff, namely neuron-based methods (Feed Forward fully connected network, LSTM, TabNet, NODE) and tree-based methods (Random Forest, XGBoost, CatBoost, LightGBM). As a case study, in crop disease modeling, prediction models of Septoria and Yellow Rust disease severity are presented. The results of the study show that tree-based ensemble methods are slightly better in terms of performance metrics. Still, we argue in favor of neuron-based methods since they offer significant advantages such as automated feature engineering, multi-modal learning, and transfer learning. We demonstrate how this provides a launchpad for the adoption of artificial learning into everyday business. In a broader context, this work demonstrates that the effective use of machine learning can play a major role in helping farmers make informed decisions against threats to food production and thus ensure food security for mankind.
