10,047 research outputs found

    A systematic review of data quality issues in knowledge discovery tasks

    Get PDF
    Hay un gran crecimiento en el volumen de datos porque las organizaciones capturan permanentemente la cantidad colectiva de datos para lograr un mejor proceso de toma de decisiones. El desafío mas fundamental es la exploración de los grandes volúmenes de datos y la extracción de conocimiento útil para futuras acciones por medio de tareas para el descubrimiento del conocimiento; sin embargo, muchos datos presentan mala calidad. Presentamos una revisión sistemática de los asuntos de calidad de datos en las áreas del descubrimiento de conocimiento y un estudio de caso aplicado a la enfermedad agrícola conocida como la roya del café.Large volume of data is growing because the organizations are continuously capturing the collective amount of data for better decision-making process. The most fundamental challenge is to explore the large volumes of data and extract useful knowledge for future actions through knowledge discovery tasks, nevertheless many data has poor quality. We presented a systematic review of the data quality issues in knowledge discovery tasks and a case study applied to agricultural disease named coffee rust

    Irrelevant feature and rule removal for structural associative classification

    Get PDF
    In the classification task, the presence of irrelevant features can significantly degrade the performance of classification algorithms,in terms of additional processing time, more complex models and the likelihood that the models have poor generalization power due to the over fitting problem.Practical applications of association rule mining often suffer from overwhelming number of rules that are generated, many of which are not interesting or not useful for the application in question.Removing rules comprised of irrelevant features can significantly improve the overall performance.In this paper, we explore and compare the use of a feature selection measure to filter out unnecessary and irrelevant features/attributes prior to association rules generation.The experiments are performed using a number of real-world datasets that represent diverse characteristics of data items.Empirical results confirm that by utilizing feature subset selection prior to association rule generation, a large number of rules with irrelevant features can be eliminated.More importantly, the results reveal that removing rules that hold irrelevant features improve the accuracy rate and capability to retain the rule coverage rate of structural associative association
    corecore