Issues with SZZ: An empirical assessment of the state of practice of defect prediction data collection
Defect prediction research has a strong reliance on published data sets that
are shared between researchers. The SZZ algorithm is the de facto standard for
collecting defect labels for this kind of data and is used by most public data
sets. Thus, problems with the SZZ algorithm may have a strong indirect impact
on almost the complete state of the art of defect prediction. Recent research
uncovered potential problems in different parts of the SZZ algorithm. Within
this article, we provide an extensive empirical analysis of the defect labels
created with the SZZ algorithm. We used a combination of manual validation and
adopted or improved heuristics for the collection of defect data to establish
ground truth data for bug-fixing commits, and we improved the heuristics for the
identification of inducing changes for defects and for the assignment of
bugs to releases. We conducted an empirical study on 398 releases of 38 Apache
projects and found that only half of the bug-fixing commits determined by SZZ
are actually bug fixing. Moreover, if a six-month time frame is used in
combination with SZZ to determine which bugs affect a release, one file is
incorrectly labeled as defective for every file that is correctly labeled as
defective. In addition, two defective files are missed. We also explored the
impact of the relatively small set of features that are available in most
defect prediction data sets, as there are multiple publications that indicate
that, e.g., churn-related features are important for defect prediction. We
found that the difference of using more features is negligible.
Comment: Submitted and under review. First three authors are equally
contributing.
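
The file-level ratios reported in the abstract can be restated as precision and recall. The sketch below is only one reading of those ratios, not a computation taken from the article: it assumes that, for each file correctly labeled as defective (true positive), there is one incorrectly labeled file (false positive) and two missed defective files (false negatives). The function name `label_quality` is hypothetical.

```python
from fractions import Fraction


def label_quality(tp: int, fp: int, fn: int):
    """Return (precision, recall, F1) as exact fractions for the given counts."""
    precision = Fraction(tp, tp + fp)  # share of files labeled defective that really are
    recall = Fraction(tp, tp + fn)     # share of truly defective files that get labeled
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Assumed ratio from the abstract: per correctly labeled file,
# one false positive and two false negatives.
precision, recall, f1 = label_quality(tp=1, fp=1, fn=2)
print(f"precision={float(precision):.2f} recall={float(recall):.2f} F1={float(f1):.2f}")
```

Under this reading, labels obtained with the six-month heuristic would have a precision of 1/2 and a recall of 1/3, giving an F1 of 2/5.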