Quantifying soybean phenotypes using UAV imagery and machine learning, deep learning methods

Abstract

Crop breeding programs aim to introduce new cultivars with improved traits to help address the food crisis. Food production needs to grow at twice its current rate to feed the growing population by 2050. Soybean is one of the world's major grain crops, and the US alone contributes around 35 percent of world soybean production. To increase soybean production, breeders still rely on conventional breeding strategies, which are mainly a 'trial and error' process. These constraints limit the expected progress of crop breeding programs. Plant lodging and pubescence color are two of the most important phenotypes for soybean breeding programs. Both are conventionally evaluated visually by breeders, which is time-consuming and subject to human error. The goal of this study was to quantify these two phenotypes by investigating the potential of unmanned aerial vehicle (UAV)-based imagery combined with machine learning for assessing lodging and with deep learning for assessing pubescence color of soybean breeding lines. A UAV imaging system equipped with an RGB (red-green-blue) camera was used to collect imagery of 1,266 four-row plots in a soybean breeding field at the reproductive stage. Lodging and pubescence color scores were visually assessed by experienced breeders. Lodging scores were grouped into four classes, i.e., non-lodging, moderate lodging, high lodging, and severe lodging. Pubescence color scores were grouped into three classes, i.e., gray, tawny, and segregation. UAV images were stitched to build orthomosaics, and soybean plots were segmented using a grid method. Twelve image features were extracted from the collected images to assess the lodging scores of each breeding line.
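
The abstract does not describe the grid method in detail. As a minimal sketch, assuming the field occupies an axis-aligned rectangle in orthomosaic pixel coordinates and that plots are laid out in evenly sized rows and columns, the plot bounding boxes could be generated as follows. The function name `grid_plot_boxes` and its parameters are hypothetical and are not taken from the study.

```python
from typing import List, Tuple

# (left, top, right, bottom) in orthomosaic pixel coordinates
Box = Tuple[int, int, int, int]


def grid_plot_boxes(left: int, top: int, right: int, bottom: int,
                    n_rows: int, n_cols: int) -> List[Box]:
    """Split a rectangular field region into an n_rows x n_cols grid.

    Assumption (not from the study): the field is axis-aligned and all
    plots have the same size, so an even grid approximates plot borders.
    """
    if n_rows <= 0 or n_cols <= 0:
        raise ValueError("n_rows and n_cols must be positive")
    if right <= left or bottom <= top:
        raise ValueError("field rectangle must have positive area")

    width = right - left
    height = bottom - top
    boxes: List[Box] = []
    for r in range(n_rows):
        # Integer arithmetic keeps adjacent cells flush, with no gaps.
        y0 = top + height * r // n_rows
        y1 = top + height * (r + 1) // n_rows
        for c in range(n_cols):
            x0 = left + width * c // n_cols
            x1 = left + width * (c + 1) // n_cols
            boxes.append((x0, y0, x1, y1))
    return boxes


if __name__ == "__main__":
    # Hypothetical 100 x 60 pixel field split into 3 rows of 4 plots.
    for box in grid_plot_boxes(0, 0, 100, 60, n_rows=3, n_cols=4):
        print(box)
```

Each box could then be used to crop one plot from the orthomosaic before feature extraction. A real field would likely also need rotation correction and alley offsets, which this sketch leaves out.
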
Four models, i.e., extreme gradient boosting (XGBoost), random forest (RF), K-nearest neighbor (KNN), and artificial neural network (ANN), were evaluated for classifying soybean lodging. Five data pre-processing methods were used to treat the imbalanced dataset and improve classification accuracy. Results indicate that the pre-processing method SMOTE-ENN (synthetic minority over-sampling technique combined with edited nearest neighbor) performed consistently well for all four classifiers, achieving the highest overall accuracy (OA), the lowest misclassification, and higher F1-scores and Kappa coefficients. This suggests that SMOTE-ENN may be an excellent pre-processing method for classification tasks on imbalanced datasets. An overall accuracy of 96 percent was obtained using the SMOTE-ENN dataset and the ANN classifier. To classify soybean pubescence color, seven pre-trained deep learning models, i.e., DenseNet121, DenseNet169, DenseNet201, ResNet50, InceptionResNet-V2, Inception-V3, and EfficientNet, were used, with images of each plot fed into the models. The data were augmented using two rotation and two scaling factors to enlarge the dataset. Among the seven pre-trained models, ResNet50 and DenseNet121 performed best, with an overall accuracy of 88 percent along with higher precision, recall, and F1-score for all three pubescence color classes. In conclusion, the developed UAV-based high-throughput phenotyping system can gather image features to estimate and classify crucial soybean phenotypes, which will help breeders assess phenotypic variation in breeding trials. RGB imagery-based classification could also be a cost-effective choice for breeders and associated researchers in identifying superior genotypes in plant breeding programs.

Includes bibliographical references
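
The evaluation metrics named in the abstract (overall accuracy, F1-score, and Kappa coefficient) can all be derived from a confusion matrix. The following is a minimal sketch using their standard definitions, assuming rows hold true classes and columns hold predicted classes. The function name `classification_metrics` and the example matrix are hypothetical and are not results from the study.

```python
from typing import Dict, List, Union


def classification_metrics(
    confusion: List[List[int]],
) -> Dict[str, Union[float, List[float]]]:
    """Compute overall accuracy, Cohen's kappa, and per-class F1.

    Assumption: confusion[i][j] counts samples whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    if n == 0 or any(len(row) != n for row in confusion):
        raise ValueError("confusion matrix must be square and non-empty")

    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix must contain at least one sample")

    true_totals = [sum(row) for row in confusion]
    pred_totals = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    correct = sum(confusion[i][i] for i in range(n))

    # Overall accuracy: fraction of samples on the diagonal.
    overall_accuracy = correct / total

    # Cohen's kappa: agreement corrected for chance agreement.
    expected = sum(t * p for t, p in zip(true_totals, pred_totals)) / total ** 2
    kappa = (overall_accuracy - expected) / (1 - expected) if expected != 1 else 0.0

    # Per-class F1 = 2*TP / (2*TP + FP + FN) = 2*TP / (true_i + pred_i).
    f1_scores = []
    for i in range(n):
        denom = true_totals[i] + pred_totals[i]
        f1_scores.append(2 * confusion[i][i] / denom if denom else 0.0)

    return {"overall_accuracy": overall_accuracy, "kappa": kappa, "f1": f1_scores}


if __name__ == "__main__":
    # Hypothetical two-class confusion matrix, for illustration only.
    print(classification_metrics([[8, 2], [1, 9]]))
```

Reporting Kappa and per-class F1 alongside overall accuracy matters for imbalanced data such as the lodging classes, because overall accuracy alone can look high when a classifier mostly predicts the majority class.
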
