A Taxonomy of Data Quality Challenges in Empirical Software Engineering
Reliable empirical models, such as those used in software effort estimation or
defect prediction, are inherently dependent on the data from which they are
built. As demands for process and product improvement continue to grow, the
quality of the data used in measurement and prediction systems warrants
increasingly close scrutiny. In this paper we propose a taxonomy of data
quality challenges in empirical software engineering, based on an extensive
review of prior research. We consider current assessment techniques for each
quality issue and proposed mechanisms to address these issues, where available.
Our taxonomy classifies data quality issues into three broad areas: first,
characteristics of data that mean they are not fit for modeling; second, data
set characteristics that lead to concerns about the suitability of applying a
given model to another data set; and third, factors that prevent or limit data
accessibility and trust. We identify this last area as being in particular need
of further research.
Comment: Conference paper, 12 pages, 6 figures