6 research outputs found

    Application of spatio-temporal data in site-specific maize yield prediction with machine learning methods

    To meet sustainability requirements and to identify yield drivers and limiting factors, traditional yield modelling is increasingly being carried out with artificial intelligence (AI). The aim of this study was to predict maize yields using AI trained on spatio-temporal data. The paper advances a new method of maize yield prediction based on spatio-temporal data mining. To find the best solution, various models were used: counter-propagation artificial neural networks (CP-ANNs), XY-fused networks (XY-Fs), supervised Kohonen networks (SKNs), neural networks with rectified linear unit (ReLU) activations, extreme gradient boosting (XGBoost), and support-vector machines (SVMs), each with different subsets of the independent variables in five vegetation periods. Input variables for modelling included soil parameters (pH, P2O5, K2O, Zn, clay content, ECa, draught force, cone index), micro-relief averages, and meteorological parameters for the 63 treatment units in a 15.3 ha research field. The best performing method (XGBoost) reached 92.1% and 95.3% accuracy on the training and the test sets, respectively. Additionally, a novel method was introduced to treat individual units in a lattice system. The lattice-based smoothing further increased the area under the curve (AUC) to 97.5%, above that of the individual predictions of the XGBoost model. The models were developed using 48 different subsets of variables to determine which variables consistently contributed to prediction accuracy. Comparing the resulting models showed that the best regression model was Extreme Gradient Boosting Trees, with 92.1% accuracy (on the training set). In addition, the method calculates the influence of the spatial distribution of site-specific soil fertility on maize grain yields. This paper provides a new method of spatio-temporal data analysis that takes the most important factors influencing maize yields into account.
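
The abstract does not say how the lattice-based smoothing is computed. Purely as an illustration of the general idea, the sketch below blends each unit's predicted score with the mean of its four direct neighbours on a rectangular grid. The function name `smooth_on_lattice`, the `weight` parameter, and the 4-neighbour rule are all assumptions and should not be read as the authors' method.

```python
def smooth_on_lattice(scores, weight=0.5):
    """Blend each cell's score with the mean of its 4-connected neighbours.

    `scores` is a rectangular list of lists of per-unit predictions.
    `weight` (hypothetical parameter) is the share given to the neighbours.
    """
    rows, cols = len(scores), len(scores[0])
    smoothed = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Collect the neighbours that actually exist (edges have fewer).
            neighbours = [
                scores[rr][cc]
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < rows and 0 <= cc < cols
            ]
            # A 1x1 grid has no neighbours; fall back to the cell itself.
            if neighbours:
                neighbour_mean = sum(neighbours) / len(neighbours)
            else:
                neighbour_mean = scores[r][c]
            smoothed[r][c] = (1.0 - weight) * scores[r][c] + weight * neighbour_mean
    return smoothed


# Made-up scores for a 2 x 3 block of units, for illustration only.
example = [[0.9, 0.2, 0.8],
           [0.7, 0.6, 0.1]]
for row in smooth_on_lattice(example):
    print([round(v, 3) for v in row])
```

Setting `weight=0` returns the raw per-unit predictions, so the same function can be used to compare smoothed and unsmoothed scores.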

    Gaussian Perturbations in ReLU Networks and the Arrangement of Activation Regions

    Recent articles indicate that deep neural networks are efficient models for various learning problems. However, they are often highly sensitive to various changes that cannot be detected by an independent observer. As our understanding of deep neural networks with traditional generalisation bounds still remains incomplete, there are several measures which capture the behaviour of the model in case of small changes at a specific state. In this paper we consider Gaussian perturbations in the tangent space and suggest tangent sensitivity in order to characterise the stability of gradient updates. We focus on a particular kind of stability with respect to changes in parameters that are induced by individual examples without known labels. We derive several easily computable bounds and empirical measures for feed-forward fully connected ReLU (Rectified Linear Unit) networks and connect tangent sensitivity to the distribution of the activation regions in the input space realised by the network.
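
To make the notion of activation regions concrete, the following sketch counts how many distinct on/off patterns a single ReLU layer takes when an input is perturbed with Gaussian noise. It is not the tangent sensitivity measure of the paper, which works in the tangent space. It only illustrates that nearby inputs can fall into different linear regions. The helper names, the layer sizes, and the noise scale are assumptions.

```python
import random


def activation_pattern(W, b, x):
    """Return the on/off pattern (tuple of bools) of one ReLU layer at input x."""
    return tuple(
        sum(w * xi for w, xi in zip(row, x)) + bias > 0.0
        for row, bias in zip(W, b)
    )


def count_regions_hit(W, b, x, sigma, n_samples, rng):
    """Count distinct activation patterns among Gaussian perturbations of x."""
    patterns = {activation_pattern(W, b, x)}
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        patterns.add(activation_pattern(W, b, noisy))
    return len(patterns)


# Small random layer (2 inputs, 6 hidden units); all values are illustrative.
rng = random.Random(0)
W = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(6)]
b = [rng.gauss(0.0, 1.0) for _ in range(6)]
x = [0.3, -0.1]
print(count_regions_hit(W, b, x, sigma=0.5, n_samples=1000, rng=rng))
```

Each distinct pattern corresponds to one region of the input space on which the layer acts as a fixed affine map, so a larger count means the perturbation crosses more region boundaries.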

    Gradient representations in ReLU networks as similarity functions

    Feed-forward networks can be interpreted as mappings with linear decision surfaces at the level of the last layer. We investigate how the tangent space of the network can be exploited to refine the decision in case of ReLU (Rectified Linear Unit) activations. We show that a simple Riemannian metric parametrized on the parameters of the network forms a similarity function at least as good as the original network and we suggest a sparse metric to increase the similarity gap.
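
As a rough illustration of a gradient-based similarity, the sketch below computes the gradient of a one-hidden-layer ReLU network's output with respect to all of its parameters and compares two inputs by the inner product of those gradients. Using the plain inner product (an identity metric) is a simplifying assumption. The abstract describes a Riemannian metric and a sparse variant, neither of which is reproduced here. The network shape and all names are hypothetical.

```python
import random


def param_gradient(W1, w2, x):
    """Gradient of f(x) = w2 . relu(W1 x) with respect to all parameters, flattened."""
    pre = [sum(w * xi for w, xi in zip(row, x)) for row in W1]
    grad = []
    for j, z in enumerate(pre):
        active = 1.0 if z > 0.0 else 0.0
        # d f / d W1[j][i] = w2[j] * 1[z_j > 0] * x[i]
        grad.extend(w2[j] * active * xi for xi in x)
    # d f / d w2[j] = relu(z_j)
    grad.extend(z if z > 0.0 else 0.0 for z in pre)
    return grad


def gradient_similarity(W1, w2, x_a, x_b):
    """Inner product of the parameter gradients at two inputs (identity metric)."""
    g_a = param_gradient(W1, w2, x_a)
    g_b = param_gradient(W1, w2, x_b)
    return sum(a * b for a, b in zip(g_a, g_b))


# Random weights and two nearby inputs; the values are illustrative only.
rng = random.Random(0)
W1 = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(5)]
w2 = [rng.gauss(0.0, 1.0) for _ in range(5)]
x_a = [1.0, 0.5, -0.2]
x_b = [0.9, 0.6, -0.1]
print(gradient_similarity(W1, w2, x_a, x_b))
```

Because the gradient depends on which hidden units are active, two inputs that share an activation pattern tend to have more closely aligned gradients than inputs from different regions.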

    Spatial Variability of Soil Properties and Its Effect on Maize Yields within Field—A Case Study in Hungary

    Understanding how soil properties vary over time and within a field is essential for optimizing cultivation and site-specific technologies in crop production. This article aimed to map the within-field soil chemical and physical properties, vegetation index, and maize yield in 2002, 2006, 2010, 2013, and 2017. The objectives of this five-year field study were: (i) to assess the spatial and temporal variability of attributes related to the maize yield; and (ii) to analyse the temporal stability of management zones. The experiment was carried out in a 15.3 ha research field in Hungary. The soil measurements included sand, silt, and clay content (%), pH, phosphorus (P2O5), potassium (K2O), and zinc (Zn) in the topsoil (30 cm). The apparent soil electrical conductivity (mS/m) was measured in two layers (0–30 cm and 30–90 cm) in 2010, 2013, and 2017. The soil properties and maize yields were evaluated in 62 management zones covering the whole research area, and their spatial and temporal variability was characterized. Classical statistics and geostatistics were used to analyze the results. The maize yields were significantly positively correlated (r = 0.62–0.73) with the apparent electrical conductivity (Veris_N3, Veris_N4) in 2013 and 2017, and with clay content (r = 0.56–0.81) in 2002, 2013, and 2017.
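
The correlations reported above are Pearson coefficients between per-zone values. A minimal sketch of that computation follows. The zone values are invented for illustration and are not data from the study, and the function name is an assumption.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0.0 or spread_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Hypothetical per-zone values: apparent electrical conductivity (mS/m)
# and maize yield (t/ha). These numbers are made up for the example.
eca = [18.2, 21.5, 25.1, 19.8, 27.4, 23.0]
yield_t_ha = [7.1, 8.0, 9.2, 7.9, 9.0, 8.1]
print(round(pearson_r(eca, yield_t_ha), 2))
```

In the study this calculation would be run once per year and per soil property across the management zones, giving the ranges of r quoted in the abstract.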

    Assessment of Associations Between Serum Lipoprotein (a) Levels and Atherosclerotic Vascular Diseases in Hungarian Patients With Familial Hypercholesterolemia Using Data Mining and Machine Learning

    Background and aims: Premature mortality due to atherosclerotic vascular disease is very high in Hungary in comparison with international prevalence rates, though the estimated prevalence of familial hypercholesterolemia (FH) is in line with the data of other European countries. Previous studies have shown that high lipoprotein(a) [Lp(a)] levels are associated with an increased risk of atherosclerotic vascular diseases in patients with FH. We aimed to assess the associations between serum Lp(a) levels and such vascular diseases in FH using data mining methods and machine learning techniques in the Northern Great Plain region of Hungary. Methods: Medical records of 590,500 patients were included in our study. Based on the data from previously diagnosed FH patients using the Dutch Lipid Clinic Network scores (≥7 was evaluated as probable or definite FH), we trained machine learning models to identify FH patients. Results: We identified 459 patients with FH and 221 of them had data available on Lp(a). Patients with FH had significantly higher Lp(a) levels compared to non-FH subjects [236 (92.5; 698.5) vs. 167 (80.2; 431.5) mg/L, p 500 mg/L. Atherosclerotic complications were significantly more frequent in FH patients compared to patients without FH (46.6 vs. 13.9%). However, contrary to several previous studies, we could not find significant associations between serum Lp(a) levels and atherosclerotic vascular diseases in the studied Hungarian FH patient group. Conclusion: The extremely high burden of vascular disease is mainly explained by the unhealthy lifestyle of our patients (i.e., high prevalence of smoking, unhealthy diet and physical inactivity resulting in obesity and hypertension). The lack of associations between serum Lp(a) levels and atherosclerotic vascular diseases in Hungarian FH patients may be due to the high prevalence of these risk factors, which mask the deleterious effect of Lp(a).

    Identifying Patients with Familial Chylomicronemia Syndrome Using FCS Score-Based Data Mining Methods

    Background: There are no exact data about the prevalence of familial chylomicronemia syndrome (FCS) in Central Europe. We aimed to identify FCS patients using either the FCS score proposed by Moulin et al. or data mining, and assessed the diagnostic applicability of the FCS score. Methods: Analyzing medical records of 1,342,124 patients, the FCS score of each patient was calculated. Based on the data of previously diagnosed FCS patients, we trained machine learning models to identify other features that may improve FCS score calculation. Results: We identified 26 patients with an FCS score of ≥10. Of the trained models, boosting tree models and support vector machines performed best for patient recognition, with overall AUC above 0.95, while artificial neural networks achieved AUC above 0.8, indicating less efficacy. We identified laboratory features that can be considered as additions to the FCS score calculation. Conclusions: The estimated prevalence of FCS was 19.4 per million in our region, which exceeds the prevalence data of other European countries. Analysis of larger regional and country-wide data might increase the number of FCS cases. Although the FCS score is an excellent tool for identifying potential FCS patients, consideration of some other features may improve its accuracy.
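
The prevalence figure follows directly from the two counts quoted in the abstract (26 patients with an FCS score of ≥10 among 1,342,124 records). The short check below reproduces the arithmetic. The function name is chosen here for illustration.

```python
def prevalence_per_million(cases: int, population: int) -> float:
    """Return the number of cases per one million screened records."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * 1_000_000


# Counts quoted in the abstract: 26 patients with FCS score >= 10
# among 1,342,124 medical records.
estimate = prevalence_per_million(26, 1_342_124)
print(f"{estimate:.1f} per million")  # prints "19.4 per million"
```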