
47 research outputs found

A systematic review of data quality issues in knowledge discovery tasks

Author: Corrales David Camilo
Corrales Juan Carlos
Ledezma Agapito Ismael
Publication venue: 'Universidad de Medellin'
Publication date: 07/11/2015

The volume of data is growing rapidly because organizations continuously capture data to support better decision-making. The most fundamental challenge is to explore these large volumes of data and extract useful knowledge for future actions through knowledge discovery tasks; however, much of the data is of poor quality. We present a systematic review of data quality issues in knowledge discovery tasks and a case study applied to the agricultural disease known as coffee rust.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Universidad de Medellín: Revistas Científicas

Repositorio Institucional Universidad de Medellín

DIALNET

Analysis and enhancement of the denoising depth data using kinect through iterative technique

Author: Aghababaeyan Reza
Bhatti Zeeshan
Bilal Sara Mohammed Osman Saleh
Karbasi Mostafa
Rad Abdolvahab Ehsani
Shah Asadullah
Publication venue: 'Penerbit UTM Press'
Publication date: 28/08/2016

Since the release of Kinect by Microsoft, the accuracy and stability of Kinect data, such as the depth map, have been an essential element of research and data analysis. To develop efficient means of analyzing and using Kinect data, researchers require high-quality depth data during the preprocessing step, which is crucial for accurate results. One of researchers' most important concerns is to eliminate image noise and bring images and video to the best possible quality. In this paper, different types of noise in Kinect data are analyzed, and a unique technique is used to reduce background noise based on the distance between the Kinect device and the user. For shadow removal, an iterative method is used to eliminate the shadow cast by the Kinect. The result is a 3D depth image of good quality and accuracy. Furthermore, the results of this study show that the image background is eliminated completely and that the 3D image quality of the depth map is enhanced.

Crossref

The International Islamic University Malaysia Repository
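The two steps in this abstract (iterative shadow removal and distance-based background removal) can be illustrated with a minimal sketch. It is not the authors' technique: it assumes a depth map stored as a list of rows in millimetres, with 0 marking pixels where the sensor returned no reading (shadow). The hypothetical `fill_shadows` repeatedly fills each hole with its farthest valid 4-neighbour, on the assumption that Kinect shadows fall on the occluded background. The hypothetical `remove_background` then zeroes every pixel beyond the user's distance plus an assumed 300 mm margin.

```python
from typing import List

DepthMap = List[List[int]]  # depth in millimetres; 0 marks "no reading" (shadow/hole)


def fill_shadows(depth: DepthMap, max_iters: int = 50) -> DepthMap:
    """Iteratively fill zero-valued pixels from their valid 4-neighbours.

    Each pass gives a hole the farthest (largest) valid neighbour depth, assuming
    shadows lie on the occluded background. Holes grow inward pass by pass.
    """
    if not depth:
        return []
    h, w = len(depth), len(depth[0])
    out = [row[:] for row in depth]
    for _ in range(max_iters):
        updates = {}
        for y in range(h):
            for x in range(w):
                if out[y][x] != 0:
                    continue
                neighbours = [
                    out[ny][nx]
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                    if 0 <= ny < h and 0 <= nx < w and out[ny][nx] > 0
                ]
                if neighbours:
                    updates[(y, x)] = max(neighbours)
        if not updates:  # no remaining hole touches a valid pixel
            break
        # Apply updates after the scan so each pass only reads the previous state.
        for (y, x), value in updates.items():
            out[y][x] = value
    return out


def remove_background(depth: DepthMap, user_distance_mm: int, margin_mm: int = 300) -> DepthMap:
    """Zero out every pixel farther than the user's distance plus a margin."""
    limit = user_distance_mm + margin_mm
    return [[d if 0 < d <= limit else 0 for d in row] for row in depth]


if __name__ == "__main__":
    raw = [[1000, 0, 3000], [1000, 1000, 3000]]
    print(remove_background(fill_shadows(raw), user_distance_mm=1000))
```

Filling shadows before thresholding matters here. Otherwise the zeroed background would be indistinguishable from genuine shadow holes and would be "filled" back in.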

How to Address the Data Quality Issues in Regression Models: A Guided Process for Data Cleaning

Author: Corrales Juan Carlos
Ledezma Espino Agapito Ismael
Publication venue: 'MDPI AG'
Publication date: 01/01/2018

Today, data availability has gone from scarce to superabundant. Technologies such as IoT, trends in social media, and the capabilities of smartphones are producing and digitizing large amounts of data that were previously unavailable. This massive increase in data creates opportunities for new business models, but it also demands new techniques and methods for data quality in knowledge discovery, especially when the data comes from different sources (e.g., sensors, social networks, cameras). Applying a data quality process to a dataset supports conclusions about the information it contains, and this is increasingly done with the aid of data cleaning approaches. Therefore, guaranteeing high data quality is considered the primary goal of the data scientist. In this paper, we propose a process for data cleaning in regression models (DC-RM). The proposed process is evaluated on real datasets from the UCI Repository of Machine Learning Databases. To assess it, the datasets cleaned by DC-RM were used to train the same regression models proposed by the authors of the UCI datasets. The models trained on the datasets produced by DC-RM performed better than or equal to those presented by the datasets' authors.
This work has also been supported by the Spanish Ministry of Economy, Industry and Competitiveness (Projects TRA2015-63708-R and TRA2016-78886-C3-1-R).

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo
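The abstract does not list the individual DC-RM steps. The following is therefore only a generic sketch of the kind of cleaning pass a regression data-cleaning process typically covers, not DC-RM itself. It assumes rows are lists of floats with `None` for missing values and the regression target in the last column. The hypothetical `clean_for_regression` works in three steps. It drops exact duplicate records, imputes missing values with the column median, and removes rows whose target has a modified z-score (0.6745 * |y - median| / MAD) above 3.5, a common robust outlier rule.

```python
from statistics import median
from typing import List, Optional

Row = List[Optional[float]]  # feature columns followed by the target in the last column


def clean_for_regression(rows: List[Row], threshold: float = 3.5) -> List[List[float]]:
    """Generic cleaning pass: deduplicate, impute missing values, drop target outliers."""
    if not rows:
        return []
    # 1. Drop exact duplicate records, keeping the first occurrence.
    seen = set()
    unique: List[Row] = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    # 2. Impute missing values (None) with the column median, which outliers barely shift.
    ncols = len(unique[0])
    medians = []
    for c in range(ncols):
        present = [r[c] for r in unique if r[c] is not None]
        medians.append(median(present) if present else 0.0)
    imputed = [[medians[c] if v is None else v for c, v in enumerate(r)] for r in unique]
    # 3. Drop rows whose target has a modified z-score above the threshold.
    targets = [r[-1] for r in imputed]
    med = median(targets)
    mad = median([abs(y - med) for y in targets])
    if mad == 0:  # at least half the targets equal the median: the score is undefined
        return imputed
    return [r for r in imputed if 0.6745 * abs(r[-1] - med) / mad <= threshold]
```

The median and MAD are used instead of the mean and standard deviation. A single extreme target would otherwise inflate both the imputed values and the spread used to detect it.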

A Lightweight Data Preprocessing Strategy with Fast Contradiction Analysis for Incremental Classifier Learning

Author: Bee Wah Yap
Robert P. Biuk-Aghai
Simon Fong
Yain-whar Si
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2015

A prime objective in constructing data stream mining models is to achieve good accuracy, fast learning, and robustness to noise. Although many techniques have been proposed in the past, efforts to improve the accuracy of classification models have been somewhat disparate. These techniques include, but are not limited to, feature selection, dimensionality reduction, and the removal of noise from training data. One limitation common to all of them is the assumption that the full training dataset is available and processed at once. Although this has been effective for traditional batch training, it may not be practical for incremental classifier learning, also known as data stream mining, where the data stream is seen only in a single pass. Because data streams can be unbounded in length, as in the so-called big data phenomenon, data preprocessing time must be kept to a minimum. This paper introduces a new data preprocessing strategy for progressively purging noisy data from the training dataset without processing the whole dataset at once. A computer simulation shows that the strategy provides the significant benefit of dynamically removing bad records from the incremental classifier learning process.

Crossref

Directory of Open Access Journals
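The abstract does not define the contradiction analysis in detail. The sketch below therefore assumes a common reading of the idea: a record "contradicts" another when both have identical feature values but different class labels. The hypothetical `purge_contradictions` filters a stream in a single pass. It keeps only the last `window_size` distinct feature vectors in a sliding window, so memory and per-record work stay bounded however long the stream runs. It is an illustration of progressive noise purging, not the paper's algorithm.

```python
from collections import OrderedDict
from typing import Hashable, Iterable, Iterator, Tuple

Record = Tuple[Tuple[Hashable, ...], Hashable]  # (feature values, class label)


def purge_contradictions(stream: Iterable[Record], window_size: int = 1000) -> Iterator[Record]:
    """Yield records from a stream, dropping those that contradict a recent record.

    Assumed definition: a record is contradictory when a record with identical feature
    values but a different label is still in the sliding window.
    """
    window: OrderedDict = OrderedDict()  # feature tuple -> label, oldest first
    for features, label in stream:
        if features in window and window[features] != label:
            continue  # contradiction: skip it instead of passing it to the learner
        window[features] = label
        window.move_to_end(features)  # mark as most recently seen
        if len(window) > window_size:
            window.popitem(last=False)  # evict the oldest entry
        yield features, label
```

Because it is a generator, the filter can sit directly in front of an incremental learner and hand over each surviving record as it arrives. This sketch drops the incoming record and keeps the earlier one. Discarding both, or keeping the more recent label to track concept drift, are equally plausible choices.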