5,375 research outputs found

    A systematic review of data quality issues in knowledge discovery tasks

    Get PDF
    Hay un gran crecimiento en el volumen de datos porque las organizaciones capturan permanentemente la cantidad colectiva de datos para lograr un mejor proceso de toma de decisiones. El desafío mas fundamental es la exploración de los grandes volúmenes de datos y la extracción de conocimiento útil para futuras acciones por medio de tareas para el descubrimiento del conocimiento; sin embargo, muchos datos presentan mala calidad. Presentamos una revisión sistemática de los asuntos de calidad de datos en las áreas del descubrimiento de conocimiento y un estudio de caso aplicado a la enfermedad agrícola conocida como la roya del café.Large volume of data is growing because the organizations are continuously capturing the collective amount of data for better decision-making process. The most fundamental challenge is to explore the large volumes of data and extract useful knowledge for future actions through knowledge discovery tasks, nevertheless many data has poor quality. We presented a systematic review of the data quality issues in knowledge discovery tasks and a case study applied to agricultural disease named coffee rust

    Temporal - spatial recognizer for multi-label data

    Get PDF
    Pattern recognition is an important artificial intelligence task with practical applications in many fields such as medical and species distribution. Such application involves overlapping data points which are demonstrated in the multi- label dataset. Hence, there is a need for a recognition algorithm that can separate the overlapping data points in order to recognize the correct pattern. Existing recognition methods suffer from sensitivity to noise and overlapping points as they could not recognize a pattern when there is a shift in the position of the data points. Furthermore, the methods do not implicate temporal information in the process of recognition, which leads to low quality of data clustering. In this study, an improved pattern recognition method based on Hierarchical Temporal Memory (HTM) is proposed to solve the overlapping in data points of multi- label dataset. The imHTM (Improved HTM) method includes improvement in two of its components; feature extraction and data clustering. The first improvement is realized as TS-Layer Neocognitron algorithm which solves the shift in position problem in feature extraction phase. On the other hand, the data clustering step, has two improvements, TFCM and cFCM (TFCM with limit- Chebyshev distance metric) that allows the overlapped data points which occur in patterns to be separated correctly into the relevant clusters by temporal clustering. Experiments on five datasets were conducted to compare the proposed method (imHTM) against statistical, template and structural pattern recognition methods. The results showed that the percentage of success in recognition accuracy is 99% as compared with the template matching method (Featured-Based Approach, Area-Based Approach), statistical method (Principal Component Analysis, Linear Discriminant Analysis, Support Vector Machines and Neural Network) and structural method (original HTM). The findings indicate that the improved HTM can give an optimum pattern recognition accuracy, especially the ones in multi- label dataset

    AI-enhanced diagnosis of challenging lesions in breast MRI: a methodology and application primer

    Get PDF
    Computer-aided diagnosis (CAD) systems have become an important tool in the assessment of breast tumors with magnetic resonance imaging (MRI). CAD systems can be used for the detection and diagnosis of breast tumors as a “second opinion” review complementing the radiologist’s review. CAD systems have many common parts such as image pre-processing, tumor feature extraction and data classification that are mostly based on machine learning (ML) techniques. In this review paper, we describe the application of ML-based CAD systems in MRI of the breast covering the detection of diagnostically challenging lesions such as non-mass enhancing (NME) lesions, multiparametric MRI, neo-adjuvant chemotherapy (NAC) and radiomics all applied to NME. Since ML has been widely used in the medical imaging community, we provide an overview about the state-ofthe-art and novel techniques applied as classifiers to CAD systems. The differences in the CAD systems in MRI of the breast for several standard and novel applications for NME are explained in detail to provide important examples illustrating: (i) CAD for the detection and diagnosis, (ii) CAD in multi-parametric imaging (iii) CAD in NAC and (iv) breast cancer radiomics. We aim to provide a comparison between these CAD applications and to illustrate a global view on intelligent CAD systems based on ANN in MRI of the breast

    Data mining using intelligent systems : an optimized weighted fuzzy decision tree approach

    Get PDF
    Data mining can be said to have the aim to analyze the observational datasets to find relationships and to present the data in ways that are both understandable and useful. In this thesis, some existing intelligent systems techniques such as Self-Organizing Map, Fuzzy C-means and decision tree are used to analyze several datasets. The techniques are used to provide flexible information processing capability for handling real-life situations. This thesis is concerned with the design, implementation, testing and application of these techniques to those datasets. The thesis also introduces a hybrid intelligent systems technique: Optimized Weighted Fuzzy Decision Tree (OWFDT) with the aim of improving Fuzzy Decision Trees (FDT) and solving practical problems. This thesis first proposes an optimized weighted fuzzy decision tree, incorporating the introduction of Fuzzy C-Means to fuzzify the input instances but keeping the expected labels crisp. This leads to a different output layer activation function and weight connection in the neural network (NN) structure obtained by mapping the FDT to the NN. A momentum term was also introduced into the learning process to train the weight connections to avoid oscillation or divergence. A new reasoning mechanism has been also proposed to combine the constructed tree with those weights which had been optimized in the learning process. This thesis also makes a comparison between the OWFDT and two benchmark algorithms, Fuzzy ID3 and weighted FDT. SIx datasets ranging from material science to medical and civil engineering were introduced as case study applications. These datasets involve classification of composite material failure mechanism, classification of electrocorticography (ECoG)/Electroencephalogram (EEG) signals, eye bacteria prediction and wave overtopping prediction. Different intelligent systems techniques were used to cluster the patterns and predict the classes although OWFDT was used to design classifiers for all the datasets. In the material dataset, Self-Organizing Map and Fuzzy C-Means were used to cluster the acoustic event signals and classify those events to different failure mechanism, after the classification, OWFDT was introduced to design a classifier in an attempt to classify acoustic event signals. For the eye bacteria dataset, we use the bagging technique to improve the classification accuracy of Multilayer Perceptrons and Decision Trees. Bootstrap aggregating (bagging) to Decision Tree also helped to select those most important sensors (features) so that the dimension of the data could be reduced. Those features which were most important were used to grow the OWFDT and the curse of dimensionality problem could be solved using this approach. The last dataset, which is concerned with wave overtopping, was used to benchmark OWFDT with some other Intelligent Systems techniques, such as Adaptive Neuro-Fuzzy Inference System (ANFIS), Evolving Fuzzy Neural Network (EFuNN), Genetic Neural Mathematical Method (GNMM) and Fuzzy ARTMAP. Through analyzing these datasets using these Intelligent Systems Techniques, it has been shown that patterns and classes can be found or can be classified through combining those techniques together. OWFDT has also demonstrated its efficiency and effectiveness as compared with a conventional fuzzy Decision Tree and weighted fuzzy Decision Tree

    A Fuzzy Rule Based Approach to Geographic Classification of Virgin Olive Oil Using T-Operators

    Get PDF
    Olive oil is an important agricultural food product. Especially, protected designation of origin (PDO) and protected geographic indications (PGI) are useful to protect the intellectual property rights of the consumers and producers. For this reason, the importance of the geographic classification increases to trace geographical indications. This chapter suggests a geographical classification system for the virgin olive oils. This system is formed on chemical parameters. These parameters include fuzziness. Novel proposed system constructs the rules by using fuzzy decision tree algorithm. It produces rules over fuzzy ID3 algorithm. It uses fuzzy entropy on the fuzzified data. The reasoning procedure depends on weighted rule-based system and is adapted into the fuzzy reasoning handled with different T-operators. Fuzzification is performed with fuzzy c-means algorithm for the olive oil data set. The cluster numbers of each variable are selected based on partition coefficient validity criteria. The model is examined by using different decision tree approaches (C4.5 and standard version fuzzy ID3 algorithm) and FID3 reasoning method with eight different T-operators. Also, the conclusions are supported by statistical analysis. Experimental results support that the weights have important manner on fuzzy reasoning method for the geographic classification system
    corecore