2,275 research outputs found

    A systematic review of data quality issues in knowledge discovery tasks

    The volume of data is growing rapidly because organizations continuously capture data to support better decision-making. The most fundamental challenge is to explore these large volumes of data and extract useful knowledge for future actions through knowledge discovery tasks; however, much of the data is of poor quality. We present a systematic review of data quality issues in knowledge discovery tasks and a case study applied to the agricultural disease known as coffee rust.

    Learning-based crop management optimization using multi-stream convolutional neural networks

    Improving crop management is an essential step towards solving the food security challenge. Despite the advances in precision agriculture, new methods are needed to create decision-support systems that help farmers increase productivity while accounting for environmental impacts and financial risks. This dissertation presents a class of learning-based optimization algorithms for spatial allocation of crop inputs, and a new framework for online coverage path planning with potential use in tasks such as planting and harvesting. The proposed algorithms use Multi-stream Convolutional Neural Networks (MSCNN) to learn relevant spatial features from the environment and use them to optimize the available control inputs. In the crop inputs optimization problem, an MSCNN combines five input variables, as in a regression problem, to better predict yield. The predictive model is then used as the basis of a gradient-ascent algorithm that maximizes a custom objective function. To broaden the applicability of this algorithm, a risk-aware version of the method is also proposed: the predictive uncertainty is measured and used as a constraint to comply with different levels of risk aversion. Experiments with real crop fields demonstrate that this method significantly reduces yield prediction errors compared with state-of-the-art algorithms. Results from the optimization algorithm show an increase in the expected net revenue of up to 6.8% compared with the status quo management, while providing safety bounds. In the coverage path planning framework, an MSCNN agent uses imitation learning to learn a control policy from demonstrations of paths obtained offline through heuristic algorithms. The resulting control policy is further improved through policy-gradient reinforcement learning. Simulations show that the improved control policy outperforms the offline algorithms used during the imitation learning phase, and that the proposed framework can be easily adapted to different cost functions.
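The idea of running gradient ascent on top of a yield predictor can be sketched in a few lines. This is only an illustration under stated assumptions: `predicted_yield` is a hypothetical quadratic stand-in for a trained model (it is not the MSCNN from the dissertation), the prices are invented, and the gradient is taken by finite differences rather than backpropagation.

```python
def predicted_yield(rate):
    """Hypothetical stand-in for a trained yield model (NOT the MSCNN)."""
    return 100.0 + 0.8 * rate - 0.002 * rate ** 2


def net_revenue(rate, crop_price=4.0, input_cost=0.6):
    """Assumed objective: crop revenue minus the cost of the applied input."""
    return crop_price * predicted_yield(rate) - input_cost * rate


def numerical_gradient(f, x, eps=1e-4):
    """Central finite-difference estimate of df/dx."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


def gradient_ascent(f, x0, step=5.0, iters=500, lower=0.0, upper=300.0):
    """Climb f from x0, keeping the input rate inside [lower, upper]."""
    x = x0
    for _ in range(iters):
        x += step * numerical_gradient(f, x)
        x = min(max(x, lower), upper)  # project back onto the feasible range
    return x


best_rate = gradient_ascent(net_revenue, x0=50.0)
print(round(best_rate, 2))
```

With these assumed coefficients the objective is 400 + 2.6r - 0.008r², so the ascent settles near r = 162.5. A risk-aware variant would add a constraint on predictive uncertainty, which this sketch does not model.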

    Crop Prediction and Recommendation Using Ensemble of DL Models

    Agriculture remains the primary source of income in India and is characterised by a variety of crops, soil types and climatic conditions. This study proposes an ensemble model that gives fast and effective crop predictions and recommendations. Data on nearly 8 distinct features were collected from various databases, and 2201 instances were finalised. The data covered climatic conditions such as temperature and rainfall, crop type, and soil features, particularly the ratio of nitrogen, potassium and levels of phosphorus. Research indicates that algorithms such as Neural Networks and XGBoost are highly effective and accurate for developing crop yield prediction models. Extensive experiments show that an ensemble of the XGBoost and MLP Classifier algorithms achieves an accuracy of 99.39%. By predicting crop yield from historical data, the study aims to give sound recommendations on the crops to be cultivated under various weather and soil conditions.

    Simulating soil salinity dynamics, cotton yield and evapotranspiration under drip irrigation by ensemble machine learning

    We thank the China Scholarship Council (CSC) for providing a scholarship (202206710073) to Zewei Jiang. This work was supported by the Fundamental Research Funds for the Central Universities (B220203009), the Postgraduate Research & Practice Program of Jiangsu Province (KYCX22_0669), the Water Conservancy Science and Technology Project of Jiangxi Province (201921ZDKT06, 202124ZDKT09), the National Natural Science Foundation of China (51879076), the Fundamental Research Funds for the Central Universities (B210204016), Science & Technology Specific Projects in Agricultural High-tech Industrial Demonstration Area of the Yellow River Delta, Grant No: 2022SZX01. Peer reviewed.

    A Cost-effective Multispectral Sensor System for Leaf-Level Physiological Traits

    With the global population expected to reach 9 billion by 2050, ensuring global food security is a prime challenge for the research community. One potential way to tackle this challenge is sustainable intensification, and making plant phenotyping high-throughput may go a long way in this respect. Among plant phenotyping schemes, leaf-level phenotyping needs to be implemented on a large scale using existing technologies. Leaf-level chemical traits, especially macronutrients and water content, are important indicators of crop health. Leaf nitrogen (N) is one of the critical macronutrients and carries valuable information for classifying a plant’s health. Hence, non-invasive measurement of leaf N is an innovative technique for monitoring plant health. Several techniques have tried to establish a correlation between a leaf’s chlorophyll content and its N level. However, a recent study showed that this correlation is profoundly affected by environmental factors. The same study also notes that chlorophyll becomes saturated when N fertilization is high, which makes high levels of N in plants difficult to determine. In addition, plants need an optimum level of phosphorus (P) for healthy growth, but the existing leaf-level P status monitoring methods are expensive, which limits their deployment for farmers in low-resource countries. The aim of this thesis is to develop a low-cost, portable, lightweight, multifunctional, and quick-read multispectral sensor system to sense N, P, and water in leaves non-invasively. The proposed system is based on two reflectance-based multispectral sensors, one visible and one near-infrared (NIR), and captures reflectance data at 12 different wavelengths (six for each sensor). By deploying state-of-the-art machine learning algorithms, the spectroscopic information is modeled and validated to predict nutrient status. A total of five experiments were conducted, four in a controlled greenhouse environment and one in the field. Of these five, three experiments were dedicated to N sensing, one to water estimation, and one to P status determination. In the first experiment, spectral data were collected from 87 leaves of canola plants subjected to varying levels of N fertilization. The second experiment, the field experiment, was performed on 1008 leaves from 42 canola cultivars subjected to low and high N levels. The K-Nearest Neighbors (KNN) algorithm was employed to model the reflectance data. The trained model shows an average accuracy of 88.4% on the test set for the first experiment and 79.2% for the second experiment. In the third and fourth experiments, spectral data were collected from 121 leaves for N and 186 leaves for water, respectively, and a Rational Quadratic Gaussian Process Regression (GPR) algorithm was applied to correlate the reflectance data with actual N and water content. With 5-fold cross-validation, the N estimation shows a coefficient of determination (R^2) of 63.91% for canola, 80.05% for corn, 82.29% for soybean, and 63.21% for wheat. For water content estimation, canola shows an R^2 of 18.02%, corn 68.41%, soybean 46.38%, and wheat 64.58%. Finally, the fifth experiment was conducted on 267 leaf samples subjected to four levels of P treatment, and KNN exhibits the best test-set accuracy, about 71.2%, 73.5%, and 67.7% for corn, soybean, and wheat, respectively. Overall, the results indicate that the proposed cost-effective sensing system can be viable for determining leaf N and P status/content. However, further investigation is needed to improve the water estimation results obtained with the proposed device. Moreover, the utility of the device for estimating other nutrients and for other crops has great potential for future research.
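To make the classification step concrete, here is a minimal K-Nearest Neighbors sketch in plain Python. It is an illustration only: the two-value "reflectance" vectors, the `high_N`/`low_N` labels, and the function names are invented for the example and are not the thesis data (the device described above records 12 wavelengths per reading).

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Label a query by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Fraction of test samples whose predicted label matches the true label."""
    hits = sum(
        knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y)
    )
    return hits / len(test_y)


# Toy, made-up reflectance pairs (two bands only), NOT measurements from the thesis.
train_X = [(0.10, 0.60), (0.12, 0.58), (0.11, 0.63),
           (0.30, 0.40), (0.32, 0.38), (0.29, 0.42)]
train_y = ["high_N"] * 3 + ["low_N"] * 3
test_X = [(0.13, 0.61), (0.31, 0.41)]
test_y = ["high_N", "low_N"]

test_accuracy = accuracy(train_X, train_y, test_X, test_y, k=3)
print(test_accuracy)
```

An odd `k` avoids tied votes in a two-class problem; with real spectra, scaling each band before computing distances would usually matter.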

    Causal Forest Approach For Site-Specific Input Management via On-Farm Precision Experimentation

    Estimating site-specific crop yield response to changes to input (e.g., seed, fertilizer) management is a critical step in making economically optimal site-specific input management recommendations. Past studies have attempted to estimate yield response functions using various Machine Learning (ML) methods, including the Random Forest (RF), Boosted Random Forest (BRF), and Convolutional Neural Network (CNN) methods. This study proposes use of the Causal Forest (CF) model, which is one of the emerging ML methods that comprise “Causal Machine Learning.” Unlike previous yield-prediction-oriented ML methods, CF focuses strictly on estimating heterogeneous treatment effects (changes in yields that result from changes in input application rates) of inputs. We report results of using Monte Carlo simulations assuming various production scenarios to test the effectiveness of CF in estimating site-specific economically optimal nitrogen rates (EONRs), comparing CF with the yield-prediction-oriented ML methods RF, BRF, and CNN. CF's estimations of site-specific EONRs were superior under all scenarios considered. We also show that the model’s yield prediction accuracy need not imply EONR prediction accuracy. Advisor: Taro Mien
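A small worked example may help show why the EONR depends on the yield response (the treatment effect) rather than on the yield level. The sketch below assumes a quadratic response, yield = a + b·N + c·N² with c < 0, which is a common textbook form and not necessarily the one used in the study; the function name, coefficients, and prices are all hypothetical.

```python
def eonr(b, c, n_price, crop_price, n_max=300.0):
    """Economically optimal N rate for an ASSUMED quadratic yield response.

    Profit = crop_price * (a + b*N + c*N**2) - n_price * N, with c < 0.
    Setting the derivative to zero gives b + 2*c*N = n_price / crop_price.
    The intercept a drops out: only the marginal response matters.
    """
    rate = (n_price / crop_price - b) / (2 * c)
    return min(max(rate, 0.0), n_max)  # keep the rate in a feasible range


# Two hypothetical sites that differ only in their marginal response to N.
site_a = eonr(b=30.0, c=-0.08, n_price=1.0, crop_price=0.2)
site_b = eonr(b=22.0, c=-0.08, n_price=1.0, crop_price=0.2)
print(site_a, site_b)
```

Because the intercept never enters the formula, a model can predict yield levels well and still misplace the EONR if it misestimates the slope terms, which is in line with the abstract's closing remark.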

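    Mehitamata õhusõiduki rakendamine põllukultuuride saagikuse ja maa harimisviiside tuvastamisel (Application of an unmanned aerial vehicle for identifying crop yields and land cultivation methods)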

    A thesis submitted for the degree of Doctor of Philosophy in Environmental Protection. This thesis examines how machine learning (ML) technologies have aided significant advances in image analysis for precision agriculture (PA). These multimodal computing technologies extend the use of machine learning to a broader spectrum of data collection and selection for the advancement of agricultural practices (Nawar et al., 2017). These techniques will help complex cropping systems reach better-informed decisions with less human intervention, and provide a scalable framework for incorporating expert knowledge of the PA system (Chlingaryan et al., 2018). Complexity, on the other hand, can be a disadvantage in crop trials: machine learning models require training and testing databases, trial areas are limited and sample sizes are small, results are specific to time and place, and environmental factors intervene, all of which complicate parameter selection and make a single empirical model for an entire region impractical. In the early stages of this thesis, we used relatively traditional machine learning methods, namely random forest regression (RFR), support vector regression (SVR), and artificial neural networks (ANN), to address the regression problem of crop yield and biomass prediction by predicting dry matter (DM) yields of red clover. The results were favourable; however, hyperparameter selection, the lengthy algorithm selection process, data cleaning, and redundant collinearity issues significantly limited the machine learning application. We further discuss the recent trend of automated machine learning (AutoML), which has driven significant innovation in applied artificial intelligence through automated algorithm selection and hyperparameter optimization of a deployable pipeline model for solving substantive problems. However, a knowledge gap exists in integrating ML technology with unmanned aerial systems (UAS) and hyperspectral imaging data for classification and regression applications. In this thesis, we explored a state-of-the-art (SOTA), entirely open-source AutoML framework, Auto-sklearn, which is built on one of the most frequently used machine learning systems, Scikit-learn. It was integrated with two unique AutoML visualization tools to examine the identification and application of multispectral vegetation index (VI) data collected from UAS and of hyperspectral narrow-band VIs across a varied spectrum of agricultural management practices (AMP). These practices comprise soil tillage method (STM), cultivation method (CM), and manure application (MA), applied across fields with four crop combinations (i.e., red clover-grass mixture, spring wheat, pea-oat mixture, and spring barley). They have not been thoroughly evaluated, nor do they include features used in agricultural remote sensing applications. This thesis further explores existing gaps in the knowledge base for several critical crop categories and cultivation management methods with respect to biomass and yield analysis, and seeks a better understanding of the potential of remotely sensed solutions on field-based and multifunctional platforms to meet precision agriculture demands. To overcome these knowledge gaps, this research introduces a rapid, non-destructive, and low-cost framework for field-based biomass and grain yield modelling, as well as for the identification of agricultural management practices. The results may aid agronomists and farmers in establishing more accurate agricultural methods and in monitoring environmental conditions more effectively. Publication of this thesis is supported by the Estonian University of Life Sciences and by the Doctoral School of Earth Sciences and Ecology created under the auspices of the European Social Fund.