1,099 research outputs found

    A systematic review of data quality issues in knowledge discovery tasks

    The volume of data is growing rapidly because organizations continuously capture data to support better decision-making. The fundamental challenge is to explore these large volumes of data and extract useful knowledge for future actions through knowledge discovery tasks; however, much of the data is of poor quality. We present a systematic review of data quality issues in knowledge discovery tasks and a case study applied to the agricultural disease known as coffee rust.

    HYEI: A New Hybrid Evolutionary Imperialist Competitive Algorithm for Fuzzy Knowledge Discovery

    In recent years, the imperialist competitive algorithm (ICA), the genetic algorithm (GA), and hybrid fuzzy classification systems have been employed successfully for classification tasks in data mining. To address the ineffectiveness of current algorithms in analysing high-dimension independent datasets, this paper presents a new hybrid approach, named HYEI, for discovering generic rule-based systems. The proposed approach consists of three stages and combines an evolutionary-based fuzzy system with two ICA procedures to generate high-quality fuzzy-classification rules. First, the best feature subset is selected using the embedded ICA feature selection. Second, these features are used to generate basic fuzzy-classification rules. Finally, all rules are optimized using an ICA algorithm to reduce their length or to eliminate some of them. The performance of HYEI has been evaluated on several benchmark datasets from the UCI machine learning repository. Compared with the best results previously published, the proposed algorithm attains the highest classification accuracy in 6 out of the 7 dataset problems and is comparable in classification accuracy on the 5 other test problems.

    An Intelligent Genetic Algorithm for Mining Classification Rules in Large Datasets

    The genetic algorithm is a popular classification algorithm that creates a random population of candidate solutions and evolves them into a suitably accurate solution for a given problem by processing them iteratively over several generations. During each generation, the genetic algorithm accesses the training data set only to calculate the fitness of the population members; no other knowledge about the problem domain is extracted from the training data set. Even the domain knowledge stored in the chromosome code of the population may be lost in later generations because of genetic operations. Genetic operations such as crossover and mutation are probability based and do not depend on domain knowledge. As a result, the genetic algorithm converges slowly. This paper proposes a genetic algorithm that tries to gain as much knowledge as possible between generations and stores it in the form of knowledge chromosomes. The gained knowledge is used to make predictions about the search space and to guide the search toward areas with potential solutions in subsequent generations. This makes the genetic algorithm converge quickly, which in turn reduces the learning cost. The experiments show that the run time is reduced considerably compared with the state-of-the-art evolutionary algorithm.
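
    The baseline loop that this abstract describes (random initial population, fitness evaluation, probability-based crossover and mutation over several generations) can be sketched roughly as follows. This is only an illustration of a plain genetic algorithm, not the paper's knowledge-chromosome method. The toy fitness function (counting 1 bits), the tournament selection, the elitism step, and all names and parameter values are assumptions made for the example.

```python
import random


def fitness(bits):
    # Toy objective (an assumption): the number of 1s in the bit string.
    return sum(bits)


def tournament(population, rng, k=3):
    # Pick k random members and return the fittest one.
    return max(rng.sample(population, k), key=fitness)


def crossover(a, b, rng):
    # Single-point crossover: prefix of one parent, suffix of the other.
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(bits, rng, rate):
    # Flip each bit independently with probability `rate`.
    return [1 - b if rng.random() < rate else b for b in bits]


def run_ga(n_bits=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    # Random initial population of candidate solutions.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        # Elitism (an assumption): carry the current best forward unchanged.
        children = [best]
        while len(children) < pop_size:
            child = crossover(tournament(population, rng),
                              tournament(population, rng), rng)
            children.append(mutate(child, rng, mutation_rate))
        population = children
        best = max(population, key=fitness)
        history.append(fitness(best))
    return best, history


if __name__ == "__main__":
    best, history = run_ga()
    print("best fitness per generation:", history)
```

    Note that the fitness function is the only point where this loop consults the problem; crossover and mutation are purely random, which is the limitation the abstract attributes to the standard algorithm.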

    Target detection with morphological shared-weight neural network: different update approaches

    Neural networks are widely used for image processing. Of these, the convolutional neural network (CNN) is one of the most popular. However, the CNN needs a large amount of training data to improve its accuracy. If training data is limited, a morphological shared-weight neural network (MSNN) can be a better choice. In this thesis, two different update approaches based on an evolutionary algorithm are proposed and compared with each other for target detection based on the MSNN. A third training method, based on backpropagation, is used for comparison; it was proposed by Yongwan Won and applied by my colleagues and fellow graduate students, Shuxian Shen and Anes Ouadou. Both single-layer and multiple-layer MSNNs are presented with the different approaches. The author created part of the dataset for this thesis and used another dataset created by Shen to make comparisons with her network. Results of the MSNN are compared with CNN results to show the performance. Experiments show that for a single-layer MSNN, an evolutionary algorithm with partial backpropagation performs best. For a multiple-layer MSNN, backpropagation performs better, although the MSNN still performs better than the CNN. Includes bibliographical references.

    Computational intelligence techniques for data analysis

    Ph.D., Doctor of Philosophy