An Innovative Approach for Predicting Software Defects by Handling Class Imbalance Problem
Over the last decade, imbalanced data has gained attention as a major challenge to enhancing software quality and reliability. Owing to the evolution of advanced software development tools and processes, today's software products are much larger and more complex. The software industry therefore faces a major issue in maintaining software performance and efficiency, as well as in the cost of handling defects discovered after deployment. Imbalanced data hampers the effectiveness of defect prediction models in terms of data analysis, biased results, model accuracy, and decision making. Predicting defects before they affect the product is one way to cut the cost of maintaining software quality. In this study we propose a two-level approach to the class imbalance problem that enhances the accuracy of the prediction model. At the first level, the model balances the predicted classes at the data level by applying a sampling method. At the second level, we use the Random Forest machine learning approach, which builds a strong classifier for software defects. Hence, software defect prediction accuracy can be enhanced by handling class imbalance at both the data and algorithm levels.
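A minimal sketch of the two-level approach described in this abstract, assuming Python with scikit-learn and imbalanced-learn; the synthetic dataset and the specific choice of SMOTE as the sampling method are illustrative assumptions rather than details taken from the study.

```python
# Sketch of the two-level idea: (1) balance the training data with a sampling
# method, (2) train a Random Forest classifier on the balanced data.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

# Stand-in for a defect dataset: roughly 10% of modules labelled defective.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Level 1 (data level): oversample the minority (defective) class.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_train, y_train)

# Level 2 (algorithm level): Random Forest as the classifier.
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_bal, y_bal)
print("Balanced accuracy:", balanced_accuracy_score(y_test, clf.predict(X_test)))
```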
The impact of using biased performance metrics on software defect prediction research
Context: Software engineering researchers have undertaken many experiments investigating the potential of software defect prediction algorithms. Unfortunately, some widely used performance metrics are known to be problematic, most notably F1, yet F1 remains in wide use. Objective: To investigate the potential impact of using F1 on the validity of this large body of research. Method: We undertook a systematic review to locate relevant experiments and then extracted all pairwise comparisons of defect prediction performance using F1 and the unbiased Matthews correlation coefficient (MCC). Results: We found a total of 38 primary studies, containing 12,471 pairs of results. Of these, 21.95% changed direction when the MCC metric was used instead of the biased F1 metric. Unfortunately, we also found evidence suggesting that F1 remains widely used in software defect prediction research. Conclusions: We reiterate the concerns of statisticians that F1 is a problematic metric outside of an information retrieval context, since we are concerned about both classes (defect-prone and not defect-prone units). This inappropriate usage has led to a substantial number (more than one fifth) of results that are erroneous in terms of direction. Therefore we urge researchers to (i) use an unbiased metric and (ii) publish detailed results, including confusion matrices, so that alternative analyses become possible.
Comment: Submitted to the journal Information & Software Technology. It is a greatly extended version of "Assessing Software Defection Prediction Performance: Why Using the Matthews Correlation Coefficient Matters", presented at EASE 2020.
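A small, self-contained illustration of the concern raised above, assuming Python with scikit-learn: because F1 ignores true negatives, two classifiers can be ordered one way by F1 and the opposite way by MCC. The confusion-matrix counts below are invented for illustration only.

```python
# Illustration: F1 and MCC can rank two classifiers in opposite directions.
import numpy as np
from sklearn.metrics import f1_score, matthews_corrcoef

def scores(tp, fn, fp, tn):
    """Build label/prediction vectors from confusion-matrix counts and score them."""
    y_true = np.array([1] * (tp + fn) + [0] * (fp + tn))
    y_pred = np.array([1] * tp + [0] * fn + [1] * fp + [0] * tn)
    return f1_score(y_true, y_pred), matthews_corrcoef(y_true, y_pred)

# Classifier A: predicts everything as the positive class (90 positives, 10 negatives).
f1_a, mcc_a = scores(tp=90, fn=0, fp=10, tn=0)
# Classifier B: misses a few positives but also gets most negatives right.
f1_b, mcc_b = scores(tp=80, fn=10, fp=2, tn=8)

print(f"A: F1={f1_a:.3f}  MCC={mcc_a:.3f}")  # F1 ~0.947, MCC = 0.0
print(f"B: F1={f1_b:.3f}  MCC={mcc_b:.3f}")  # F1 ~0.930, MCC ~0.538
# F1 prefers A, MCC prefers B: the pairwise comparison changes direction.
```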
Evaluating defect prediction approaches: a benchmark and an extensive comparison
Reliably predicting software defects is one of the holy grails of software engineering. Researchers have devised and implemented a plethora of defect/bug prediction approaches varying in terms of accuracy, complexity and the input data they require. However, the absence of an established benchmark makes it hard, if not impossible, to compare approaches. We present a benchmark for defect prediction, in the form of a publicly available dataset consisting of several software systems, and provide an extensive comparison of well-known bug prediction approaches, together with novel approaches we devised. We evaluate the performance of the approaches using different performance indicators: classification of entities as defect-prone or not, and ranking of the entities, with and without taking into account the effort to review an entity. We performed three sets of experiments aimed at (1) comparing the approaches across different systems, (2) testing whether the differences in performance are statistically significant, and (3) investigating the stability of approaches across different learners. Our results indicate that, while some approaches perform better than others in a statistically significant manner, external validity in defect prediction is still an open problem, as generalizing results to different contexts/learners proved to be a partially unsuccessful endeavor.
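A rough sketch of a benchmark-style comparison in the spirit of this abstract, assuming Python with scikit-learn and SciPy; the per-system datasets, the chosen learners, and the AUC indicator are illustrative assumptions, and the paper's ranking and effort-aware indicators are not reproduced here.

```python
# Sketch: evaluate several defect prediction learners on each system with
# cross-validation, then test whether performance differences are significant.
from scipy.stats import wilcoxon
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Stand-ins for per-system defect datasets (features + defect-prone labels).
systems = {name: make_classification(n_samples=500, n_features=15,
                                      weights=[0.85, 0.15], random_state=seed)
           for seed, name in enumerate(["system_a", "system_b", "system_c"])}

learners = {"logistic": LogisticRegression(max_iter=1000),
            "random_forest": RandomForestClassifier(n_estimators=200, random_state=0)}

results = {name: [] for name in learners}
for system, (X, y) in systems.items():
    for name, learner in learners.items():
        # AUC is one possible classification indicator; ranking and
        # effort-aware indicators would need their own evaluation code.
        auc = cross_val_score(learner, X, y, cv=5, scoring="roc_auc").mean()
        results[name].append(auc)
        print(f"{system:>8}  {name:>13}  AUC={auc:.3f}")

# Paired significance test across systems (three systems is far too few in
# practice; shown only to illustrate the idea of testing for significance).
stat, p = wilcoxon(results["logistic"], results["random_forest"])
print(f"Wilcoxon p-value: {p:.3f}")
```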