Collecting large, consistent data sets for real-world software projects is problematic. We therefore explore how little data are required before predictor performance plateaus, i.e., the point after which further data no longer improve the performance score. In this case study, we examine three embedded controller software projects, two versions of an open-source anti-virus program (ClamAV), and a subset of bugs in two versions of the GNU gcc compiler, in order to identify regularities in learning defect predictors across different domains. We show that only a small number of bug reports (around 30) is required to learn stable defect predictors. Further, we show that these bug reports need not come from the local project: data imported from other sites can be suitable for learning defect predictors at the local site. We conclude that software construction is a surprisingly uniform endeavor, with simple, repeated patterns that can be discovered in local or imported data using just a handful of examples.
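The plateau criterion can be made concrete with a small sketch. It assumes a learning curve has already been computed, meaning one performance score per training-set size, listed in increasing order of size, from whatever learner and metric one chooses. The function name `plateau_point` and the tolerance `epsilon` are hypothetical illustrations and do not describe the study's actual procedure. The function returns the smallest training-set size beyond which no larger sample improves the score by more than `epsilon`.

```python
from typing import Optional, Sequence


def plateau_point(
    sizes: Sequence[int], scores: Sequence[float], epsilon: float = 0.01
) -> Optional[int]:
    """Return the smallest training-set size after which no larger size
    improves the score by more than `epsilon` (a hypothetical tolerance).

    `sizes` must be in increasing order, with `scores[i]` measured at
    `sizes[i]`. Returns None if the curve never levels off within the
    sizes provided.
    """
    if len(sizes) != len(scores):
        raise ValueError("sizes and scores must have the same length")

    # The last point has no later points to compare against, so stop before it.
    for i in range(len(sizes) - 1):
        # Best score reached with any larger training set.
        best_later = max(scores[i + 1:])
        if best_later - scores[i] <= epsilon:
            return sizes[i]
    return None


if __name__ == "__main__":
    # Synthetic learning curve for illustration only; not data from the study.
    sizes = [10, 20, 30, 40, 50]
    scores = [0.50, 0.60, 0.70, 0.70, 0.71]
    print(plateau_point(sizes, scores, epsilon=0.02))  # prints 30
```

Comparing each point against the best *later* score, rather than only the next one, avoids declaring a plateau at a temporary flat spot that is followed by further gains.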