484 research outputs found

    Computational intelligence & modeling of crop disease data in Africa

    The thesis presents the application of machine learning techniques to a real-world challenge: pest and disease control in the agricultural sector. The research is divided into three areas. (i) We developed algorithms to automatically diagnose diseases in crops using an image dataset captured with a mobile phone camera. The study measured disease incidence and severity from cassava leaf images. We applied computer vision techniques to extract visual features of color and shape, combined with classification techniques. (ii) We investigated the diagnosis of disease in crops before they become symptomatic by use of spectrograms. In the experiments, cassava plants were grown in a screen house and inoculated with disease viruses; we monitored the plants over time, collecting both spectral data and plant tissue for wet chemistry analysis at each time step until the plants showed disease. Our models, in this case GMLVQ, were able to detect cassava diseases one week after virus infection can be confirmed by wet lab chemistry, but several weeks before symptoms manifest on the plants. (iii) We investigated the development of a low-cost 3-D printed smartphone add-on spectrometer that can be used to diagnose crop diseases in the field. Moving from a commercial spectrometer (1000 USD), the study presented a tool that should be cheap (less than 5 USD) and usable by smallholder farmers, thus improving their livelihoods through increased crop yields and food security

    Adaptive imputation of missing values for incomplete pattern classification

    In the classification of incomplete patterns, the missing values can either play a crucial role in the class determination, or have little influence (or even none) on the classification results, depending on the context. We propose a credal classification method for incomplete patterns with adaptive imputation of missing values based on belief function theory. At first, we try to classify the object (incomplete pattern) based only on the available attribute values. As the underlying principle, we assume that the missing information is not crucial for the classification if a specific class for the object can be found using only the available information. In this case, the object is committed to this particular class. However, if the object cannot be classified without ambiguity, it means that the missing values play a major role in achieving an accurate classification. In this case, the missing values are imputed based on the K-nearest neighbor (K-NN) and self-organizing map (SOM) techniques, and the edited pattern with the imputation is then classified. The (original or edited) pattern is classified according to each training class in turn, and the classification results, represented by basic belief assignments, are fused with proper combination rules to make the credal classification. The object is allowed to belong with different masses of belief to the specific classes and meta-classes (which are particular disjunctions of several single classes). The credal classification captures the uncertainty and imprecision of classification well, and effectively reduces the rate of misclassifications thanks to the introduction of meta-classes. The effectiveness of the proposed method with respect to other classical methods is demonstrated in several experiments using artificial and real data sets
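The two-stage idea in this abstract (classify on the available attributes first, impute only when the result is ambiguous) can be illustrated with a much simplified sketch. It is not the authors' method: belief functions, meta-classes and the SOM step are left out, class means stand in for the per-class classifiers, and the function name, the `ambiguity` ratio test and the parameter `k` are assumptions made for illustration. At least two classes are assumed.

```python
import numpy as np

def classify_with_adaptive_imputation(x, X_train, y_train, k=3, ambiguity=0.8):
    """Classify x (NaN = missing); impute only if the partial result is ambiguous."""
    observed = ~np.isnan(x)
    classes = np.unique(y_train)

    # Step 1: distances to class means using only the observed attributes.
    means = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(means[:, observed] - x[observed], axis=1)
    order = np.argsort(d)
    # Hypothetical ambiguity test: the best class must be clearly closer
    # than the runner-up, otherwise the missing values are treated as crucial.
    if d[order[0]] < ambiguity * d[order[1]]:
        return classes[order[0]], x  # committed without imputation

    # Step 2: K-NN imputation from the k nearest training samples,
    # where nearness is measured on the observed attributes only.
    dist = np.linalg.norm(X_train[:, observed] - x[observed], axis=1)
    nn = np.argsort(dist)[:k]
    x_filled = x.copy()
    x_filled[~observed] = X_train[nn][:, ~observed].mean(axis=0)

    # Step 3: classify the edited pattern using all attributes.
    d_full = np.linalg.norm(means - x_filled, axis=1)
    return classes[np.argmin(d_full)], x_filled
```

A pattern whose observed attributes already separate the classes is returned unchanged, while an ambiguous one comes back with its missing entries filled in.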

    Early detection of plant diseases using spectral data

    Early detection of crop disease is an essential step in food security. Usually, detection becomes possible at a stage when disease symptoms are already visible on the aerial part of the plant. However, once the disease has manifested in different parts of the plant, little can be done to salvage the situation. Here, we suggest that the use of visible and near infrared spectral information facilitates disease detection in cassava crops before symptoms can be seen by the human eye. To test this hypothesis, we grow cassava plants in a screen house where they are inoculated with disease viruses. We monitor the plants over time, collecting both spectra and plant tissue for wet chemistry analysis. Our results demonstrate that suitably trained classifiers are indeed able to detect cassava diseases. Specifically, we consider Generalized Matrix Relevance Learning Vector Quantization (GMLVQ) applied to original spectra and, alternatively, in combination with dimension reduction by Principal Component Analysis (PCA). We show that successful detection is possible shortly after the infection can be confirmed by wet lab chemistry, several weeks before symptoms manifest on the plants
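For readers unfamiliar with GMLVQ, the sketch below shows its adaptive distance, d(x, w) = (x - w)^T Ω^T Ω (x - w), together with nearest-prototype prediction and an optional PCA projection. Training of the prototypes and of Ω is omitted, and the function names and any prototype labels are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def gmlvq_distance(x, w, omega):
    """Adaptive squared distance (x - w)^T Omega^T Omega (x - w)."""
    diff = omega @ (x - w)
    return float(diff @ diff)

def predict(x, prototypes, labels, omega):
    """Assign x the label of the closest prototype under the adaptive distance."""
    d = [gmlvq_distance(x, w, omega) for w in prototypes]
    return labels[int(np.argmin(d))]

def pca_project(X, n_components):
    """Project centered data onto its leading principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T
```

With Ω set to the identity the distance reduces to the squared Euclidean distance; the diagonal of Ω^T Ω is what is usually read as the relevance of each input dimension (here, each spectral band or principal component).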

    Techniques for data pattern selection and abstraction

    This thesis concerns the problem of prototype reduction in instance-based learning. In order to deal with problems such as storage requirements, sensitivity to noise and computational complexity, various algorithms have been presented that condense the number of stored prototypes while maintaining competent classification accuracy. Instance selection, which recovers a smaller subset of the original training set, is the most widely used technique for instance reduction. However, prototype abstraction, which generates new prototypes to replace the initial ones, has also gained a lot of interest recently. The major contribution of this work is the proposal of four novel frameworks for performing prototype reduction: the Class Boundary Preserving algorithm (CBP), a hybrid method that uses both selection and generation of prototypes; Instance Seriation for Prototype Abstraction (ISPA), which is an abstraction algorithm; and two selective techniques, Spectral Instance Reduction (SIR) and Direct Weight Optimization (DWO). CBP is a multi-stage method based on a simple heuristic that is very effective in identifying samples close to class borders. Using a noise filter, harmful instances are removed, while the heuristic determines the geometrical distribution of patterns around every instance. Together with the concepts of nearest enemy pairs and mean shift clustering, this algorithm decides on the final set of retained prototypes. DWO is a selection model whose output set of prototypes is decided by a set of binary weights. These weights are computed according to an objective function composed of the ratio between the nearest friend and nearest enemy of every sample. In order to obtain good quality results, DWO is optimized using a genetic algorithm. ISPA is an abstraction technique that employs the concept of data seriation to organize instances in an arrangement that favours merging between them. As a result, a new set of prototypes is created. Results show that CBP, SIR and DWO, the three major algorithms presented in this thesis, are competent and efficient in terms of at least one of the two basic objectives, classification accuracy and condensation ratio. The comparison against other successful condensation algorithms illustrates the competitiveness of the proposed models. The SIR algorithm presents a set of border discriminating features (BDFs) that depicts the local distribution of friends and enemies of all samples. These are then used along with spectral graph theory to partition the training set into border and internal instances
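The nearest friend / nearest enemy ratio mentioned for DWO can be sketched as follows. This only computes the per-sample ratio; the thesis's actual objective function, the binary weights and the genetic algorithm are not reproduced, and the function name is an assumption.

```python
import numpy as np

def friend_enemy_ratio(X, y):
    """Per-sample ratio: distance to nearest same-class sample over
    distance to nearest other-class sample."""
    # Pairwise Euclidean distances between all samples.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)  # a sample is not its own friend
    same = y[:, None] == y[None, :]
    nearest_friend = np.where(same, D, np.inf).min(axis=1)
    nearest_enemy = np.where(~same, D, np.inf).min(axis=1)
    return nearest_friend / nearest_enemy
```

Small ratios indicate samples deep inside their own class, while ratios near or above 1 indicate samples close to a class border (or possible noise).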

    Rejection and online learning with prototype-based classifiers in adaptive metrical spaces

    Fischer L. Rejection and online learning with prototype-based classifiers in adaptive metrical spaces. Bielefeld: Universität Bielefeld; 2016.
    The rising amount of digital data, which is available in almost every domain, creates a need for intelligent, automated data processing. Classification models are particularly popular techniques from the machine learning domain, with applications ranging from fraud detection to advanced image classification tasks. Within this thesis, we focus on so-called prototype-based classifiers as one prominent family of classifiers, since they offer a simple classification scheme, interpretability of the model in terms of prototypes, and good generalisation performance. We address a few crucial questions which arise whenever such classifiers are used in real-life scenarios which require robustness and reliability of classification and the ability to deal with complex and possibly streaming data sets. In particular, we address the following problems:
    - Deterministic prototype-based classifiers deliver a class label, but no confidence of the classification. The latter is particularly relevant whenever the costs of an error are higher than the costs of rejecting an example, e.g. in a safety-critical system. We investigate ways to enhance prototype-based classifiers with a certainty measure which can be computed efficiently from the given classifier alone and which can be used to reject an unclear classification.
    - For an efficient rejection, the choice of a suitable threshold is crucial. We investigate in which situations the performance of local rejection can surpass that of a single global threshold, and we propose efficient schemes for optimally computing local thresholds on a given training set.
    - For complex data and lifelong learning, the required classifier complexity can be unknown a priori. We propose an efficient, incremental scheme which adjusts the model complexity of a prototype-based classifier based on the certainty of the classification. Here, we put particular emphasis on the question of how to adjust prototype locations and metric parameters, and how to insert and/or delete prototypes in an efficient way.
    - As an alternative to the previous solution, we investigate a hybrid architecture which combines an offline classifier with an online classifier based on their certainty values, thus directly addressing the stability/plasticity dilemma. While this is straightforward for classical prototype-based schemes, it poses some challenges as soon as metric learning is integrated into the scheme, due to the different inherent data representations.
    - Finally, we investigate the performance of the proposed hybrid prototype-based classifier within a realistic visual road-terrain-detection scenario
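The rejection idea in the first two items can be sketched with one certainty measure often used with prototype-based classifiers, the relative similarity (d⁻ - d⁺) / (d⁻ + d⁺), where d⁺ is the distance to the closest prototype and d⁻ the distance to the closest prototype of a different class. Whether this matches the thesis's measures in detail is an assumption, as are the function name and the convention of returning `None` for a rejected example. At least two classes are assumed, and x is assumed not to coincide with prototypes of two different classes.

```python
import numpy as np

def classify_with_reject(x, prototypes, labels, thresholds):
    """Return (label or None, certainty). `thresholds` is either a single
    global value or one local value per prototype."""
    d = np.sum((prototypes - x) ** 2, axis=1)       # squared Euclidean distances
    winner = int(np.argmin(d))
    d_plus = d[winner]                              # closest prototype overall
    d_minus = d[labels != labels[winner]].min()     # closest prototype of another class
    certainty = (d_minus - d_plus) / (d_minus + d_plus)
    # A scalar threshold is broadcast so global and local cases share one path.
    theta = np.broadcast_to(thresholds, d.shape)[winner]
    label = labels[winner] if certainty >= theta else None
    return label, certainty
```

Passing a scalar gives global rejection; passing an array gives each prototype its own threshold, so regions where the classifier is reliable at low certainty can accept more examples than a single global value would allow.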

    Interpretable Models Capable of Handling Systematic Missingness in Imbalanced Classes and Heterogeneous Datasets

    Application of interpretable machine learning techniques to medical datasets facilitates early and fast diagnoses, along with deeper insight into the data. Furthermore, the transparency of these models increases trust among application domain experts. Medical datasets face common issues such as heterogeneous measurements, imbalanced classes with limited sample size, and missing data, which hinder the straightforward application of machine learning techniques. In this paper we present a family of prototype-based (PB) interpretable models which are capable of handling these issues. The models introduced in this contribution show comparable or superior performance to alternative techniques applicable in such situations. However, unlike ensemble-based models, which have to compromise on easy interpretation, the PB models here do not. Moreover, we propose a strategy for harnessing the power of ensembles while maintaining the intrinsic interpretability of the PB models, by averaging the model parameter manifolds. All the models were evaluated on a synthetic (publicly available) dataset in addition to detailed analyses of two real-world medical datasets (one publicly available). Results indicated that the models and strategies we introduced addressed the challenges of real-world medical data, while remaining computationally inexpensive and transparent, as well as similar or superior in performance compared to their alternatives