Advanced data analysis methods to optimize crop management decisions

Abstract

The lack of knowledge of limiting factors and optimal management practices at the field level is one of the main reasons for the inefficient use of inputs and the low productivity, profitability, and sustainability of agricultural systems. Agricultural research aims to update and improve crop management recommendations to match the spatiotemporal variability and dynamism of production systems. Advances in remote sensing and precision agriculture, the adoption of information and communication technologies by farmers, and the ability to collect and process large amounts of data create an opportunity to reimagine agricultural research and extension. Advanced data analysis methods are needed to take full advantage of these new data sources and other technological innovations. Therefore, the objectives of this Ph.D. research were i) to develop an image-based high-throughput phenotyping system for evaluating soybean maturity in breeding programs, ii) to investigate the spatial variability of optimal input rates in on-farm precision experimentation and the potential economic benefit of site-specific input management, and iii) to develop a data-driven decision support system for maize in Mexico.

The first chapter addresses the need for scalable and accurate methods for imagery-based high-throughput phenotyping in breeding programs. Images were acquired with unmanned aerial vehicles twice a week, from when the earliest lines began to mature until the latest ones reached maturity. Two complementary convolutional neural networks were developed to predict the maturity date: the first uses a single image date, and the second uses the five best image dates identified by the first model. The proposed neural network architectures were validated using more than 15,000 ground-truth observations from five trials, spanning three growing seasons and two countries.
The trained model showed good generalization capability, with a root mean squared error below two days in four of the five trials. Four methods of estimating prediction uncertainty showed potential for identifying different sources of error in the maturity date predictions. The developed architecture addresses the limitations of previous research and can be used at scale in commercial breeding programs.

The second chapter demonstrates how on-farm precision experimentation can be a valuable tool for estimating within-field variation of optimal input rates and improving agronomic decisions. Within-field variability of crop yield levels has been investigated extensively, but the spatial variability of crop yield responses to agronomic treatments is less understood. Mixed geographically weighted regression models were used to estimate local yield response functions. The methodology was applied to investigate the spatial variability in corn response to nitrogen and seed rates in four cornfields in Illinois, USA. The results showed that the spatial heterogeneity of model parameters was significant in all four fields evaluated. On average, the root mean squared error of the fitted yield decreased from 1.2 Mg ha-1 in the non-spatial global model to 0.7 Mg ha-1 in the geographically weighted regression model, and the r-squared increased from 10% to 68%. The average potential gain of using optimized uniform rates of seed and nitrogen was US$ 65.00 ha-1, while the added potential gain of site-specific application was US$ 58.00 ha-1. These results encourage more research on response-based input management recommendations instead of the still widespread focus on yield-based algorithms.

The third chapter integrates domain knowledge and explainable machine learning methods to optimize management decisions using observational data.
The data come from the Sustainable Modernization of Traditional Agriculture (MasAgro) project in the southern state of Chiapas, Mexico. The dataset was assembled from field observations, including yield, cultivars, and management, together with environmental variables from soil mapping and gridded weather datasets. Random forest models trained on the dataset explained up to 75% of the variation. However, the models' ability to predict crop performance under future weather scenarios was limited. Overall, nitrogen was the management decision with the greatest influence on yields, with yield responses that differed by year and variety. This research exemplifies the use of explainable machine learning to offer farmers the opportunity to benchmark their management decisions against peers in similar growing conditions and to visualize what would have happened had they made different decisions.
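To illustrate the second chapter's idea of local yield response functions, the following is a minimal sketch, not the thesis method. It fits a plain (fully local, not mixed) geographically weighted regression of yield on nitrogen rate at every location of a synthetic field, then derives a local economically optimal rate. Assumptions: a quadratic response in nitrogen only (no seed rate), a Gaussian kernel with a fixed bandwidth, hypothetical grain and nitrogen prices, and made-up data. For a quadratic response y = b0 + b1*N + b2*N^2 with b2 < 0, the partial profit p_y*y - p_N*N peaks where b1 + 2*b2*N = p_N/p_y. All function names, prices, and the field layout are hypothetical.

```python
import math

# Hypothetical prices and tested range, for illustration only (not from the thesis).
GRAIN_PRICE = 160.0         # US$ per Mg of grain
N_PRICE = 1.0               # US$ per kg of nitrogen
N_MIN, N_MAX = 50.0, 250.0  # tested nitrogen range, kg/ha


def solve3(a, b):
    """Solve the 3x3 system a x = b by Gaussian elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def local_fit(points, cx, cy, bandwidth):
    """Kernel-weighted least squares of yield on (1, N, N^2) around location (cx, cy).

    Returns (b0, b1, b2) for yield = b0 + b1*N + b2*N**2 with N in kg/ha.
    """
    xtwx = [[0.0] * 3 for _ in range(3)]
    xtwy = [0.0] * 3
    for x, y, n_rate, yld in points:
        # Gaussian kernel: nearby observations get more weight in the local fit.
        w = math.exp(-0.5 * (math.hypot(x - cx, y - cy) / bandwidth) ** 2)
        z = n_rate / 100.0  # rescale N so the normal equations stay well conditioned
        row = (1.0, z, z * z)
        for i in range(3):
            xtwy[i] += w * row[i] * yld
            for j in range(3):
                xtwx[i][j] += w * row[i] * row[j]
    c0, c1, c2 = solve3(xtwx, xtwy)
    return c0, c1 / 100.0, c2 / 10000.0  # convert back to kg/ha units


def profit(b, n_rate):
    """Partial return (US$/ha): grain revenue minus nitrogen cost."""
    b0, b1, b2 = b
    return GRAIN_PRICE * (b0 + b1 * n_rate + b2 * n_rate ** 2) - N_PRICE * n_rate


def economic_optimum(b):
    """N rate where marginal revenue equals the N price, clipped to the tested range."""
    _, b1, b2 = b
    if b2 >= 0:  # no diminishing returns: the best rate is at one end of the range
        return max((N_MIN, N_MAX), key=lambda n: profit(b, n))
    n_star = (N_PRICE / GRAIN_PRICE - b1) / (2.0 * b2)
    return min(max(n_star, N_MIN), N_MAX)


# Synthetic 10 x 10 field (made-up data): N rates are applied in strips by row,
# and the linear response coefficient increases from west (x=0) to east (x=9).
rates = [50.0, 100.0, 150.0, 200.0, 250.0]
field = []
for x in range(10):
    for y in range(10):
        n_rate = rates[y % 5]
        yld = 6.0 + (0.03 + 0.002 * x) * n_rate - 0.0001 * n_rate ** 2  # Mg/ha
        field.append((x, y, n_rate, yld))

# One local response function and one local optimum per location.
models = {(x, y): local_fit(field, x, y, bandwidth=1.5) for x, y, _, _ in field}
site_specific = {loc: economic_optimum(b) for loc, b in models.items()}

# Best single (uniform) rate for the whole field, by grid search over the tested range.
uniform = max(range(int(N_MIN), int(N_MAX) + 1),
              key=lambda n: sum(profit(b, n) for b in models.values()))

ss_profit = sum(profit(models[loc], n) for loc, n in site_specific.items()) / len(models)
un_profit = sum(profit(b, uniform) for b in models.values()) / len(models)
print(f"uniform optimum: {uniform} kg/ha")
print(f"site-specific range: {min(site_specific.values()):.0f}-{max(site_specific.values()):.0f} kg/ha")
print(f"added gain of site-specific N: US$ {ss_profit - un_profit:.2f}/ha")
```

Because each location is evaluated with its own fitted response, the site-specific profit cannot fall below the profit of the best uniform rate under the same models. In real trials, the size of that gap depends on how strongly the local responses differ across the field.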
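The third chapter's "what would have happened" view amounts to querying a trained model with one management variable changed while the field's environment and other decisions stay at their observed values, similar to an individual conditional expectation (ICE) curve. The sketch below assumes only a generic predict function. toy_predict is a hypothetical stand-in, not the random forest trained in the thesis, and its feature names and coefficients are invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]


def counterfactual_yields(predict: Callable[[Record], float], record: Record,
                          variable: str, values: List[float]) -> List[Tuple[float, float]]:
    """Predicted yield for `record` with one management variable set to each value.

    All other features keep the record's observed values (an ICE-style sweep).
    """
    curve = []
    for v in values:
        scenario = dict(record)  # copy so the observed record is not modified
        scenario[variable] = v
        curve.append((v, predict(scenario)))
    return curve


def toy_predict(r: Record) -> float:
    """Hypothetical stand-in for a trained model; returns yield in Mg/ha."""
    n = r["nitrogen_kg_ha"]
    return 2.0 + 0.02 * n - 0.00005 * n ** 2 + 0.5 * r["rain_index"]


farmer = {"nitrogen_kg_ha": 60.0, "rain_index": 1.0}  # hypothetical field record
for rate, yld in counterfactual_yields(toy_predict, farmer, "nitrogen_kg_ha", [60, 120, 180]):
    print(f"N = {rate:>5.0f} kg/ha -> predicted yield {yld:.2f} Mg/ha")
```

Selecting the peers to compare against (for example, fields with similar soil and weather) is a separate step that this sketch does not show.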
