Regularities in Learning Defect Predictors

By Burak Turhan, Ayse Bener and Tim Menzies


Collecting large, consistent data sets for real-world software projects is problematic. Therefore, we explore how little data are required before predictor performance plateaus, i.e., the point beyond which further data do not improve the performance score. In this case study, we examine three embedded controller software systems, two versions of an open source anti-virus software (Clam AV), and a subset of bugs in two versions of the GNU gcc compiler, to identify the regularities in learning predictors for different domains. We show that only a small number of bug reports, around 30, is required to learn stable defect predictors. Further, we show that these bug reports need not come from the local projects: data imported from different sites can be suitable for learning defect predictors at the local site. Our conclusion is that software construction is a surprisingly uniform endeavor with simple and repeated patterns that can be discovered in local or imported data using just a handful of examples.

Year: 2013
OAI identifier: oai:CiteSeerX.psu:
Provided by: CiteSeerX