A Rule Based Taxonomy of Dirty Data

Authors: Jessie Kennedy, Lin Li, Taoxin Peng
Publication date: 13 September 2014
Publisher: GSTF Journal on Computing (JoC)

Abstract

There is a growing awareness that high-quality data is key to today’s business success and that dirty data within data sources is one of the causes of poor data quality. To ensure high-quality data, enterprises need processes, methodologies and resources to monitor, analyze and maintain the quality of data. Nevertheless, research shows that many enterprises do not pay adequate attention to the existence of dirty data and have not applied useful methodologies to ensure high-quality data for their applications. One of the reasons is a lack of appreciation of the types and extent of dirty data. In practice, detecting and cleaning all the dirty data that exists in all data sources is expensive and unrealistic, so most enterprises need to consider the cost of cleaning dirty data. This problem has not attracted enough attention from researchers. In this paper, a rule-based taxonomy of dirty data is developed. The proposed taxonomy not only provides a mechanism to deal with this problem but also includes more dirty data types than any existing taxonomy.
