32,326 research outputs found
Software defect prediction: do different classifiers find the same defects?
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.During the last 10 years, hundreds of different defect prediction models have been published. The performance of the classifiers used in these models is reported to be similar with models rarely performing above the predictive performance ceiling of about 80% recall. We investigate the individual defects that four classifiers predict and analyse the level of prediction uncertainty produced by these classifiers. We perform a sensitivity analysis to compare the performance of Random Forest, Naïve Bayes, RPart and SVM classifiers when predicting defects in NASA, open source and commercial datasets. The defect predictions that each classifier makes is captured in a confusion matrix and the prediction uncertainty of each classifier is compared. Despite similar predictive performance values for these four classifiers, each detects different sets of defects. Some classifiers are more consistent in predicting defects than others. Our results confirm that a unique subset of defects can be detected by specific classifiers. However, while some classifiers are consistent in the predictions they make, other classifiers vary in their predictions. Given our results, we conclude that classifier ensembles with decision-making strategies not based on majority voting are likely to perform best in defect prediction.Peer reviewedFinal Published versio
Are Smell-Based Metrics Actually Useful in Effort-Aware Structural Change-Proneness Prediction? An Empirical Study
Bad code smells (also named as code smells) are symptoms of poor design choices in implementation. Existing studies empirically confirmed that the presence of code smells increases the likelihood of subsequent changes (i.e., change-proness). However, to the best of our knowledge, no prior studies have leveraged smell-based metrics to predict particular change type (i.e., structural changes). Moreover, when evaluating the effectiveness of smell-based metrics in structural change-proneness prediction, none of existing studies take into account of the effort inspecting those change-prone source code. In this paper, we consider five smell-based metrics for effort-aware structural change-proneness prediction and compare these metrics with a baseline of well-known CK metrics in predicting particular categories of change types. Specifically, we first employ univariate logistic regression to analyze the correlation between each smellbased metric and structural change-proneness. Then, we build multivariate prediction models to examine the effectiveness of smell-based metrics in effort-aware structural change-proneness prediction when used alone and used together with the baseline metrics, respectively. Our experiments are conducted on six Java open-source projects with up to 60 versions and results indicate that: (1) all smell-based metrics are significantly related to structural change-proneness, except metric ANS in hive and SCM in camel after removing confounding effect of file size; (2) in most cases, smell-based metrics outperform the baseline metrics in predicting structural change-proneness; and (3) when used together with the baseline metrics, the smell-based metrics are more effective to predict change-prone files with being aware of inspection effort
Test Case Selection and Prioritization Using Machine Learning: A Systematic Literature Review
Regression testing is an essential activity to assure that software code
changes do not adversely affect existing functionalities. With the wide
adoption of Continuous Integration (CI) in software projects, which increases
the frequency of running software builds, running all tests can be
time-consuming and resource-intensive. To alleviate that problem, Test case
Selection and Prioritization (TSP) techniques have been proposed to improve
regression testing by selecting and prioritizing test cases in order to provide
early feedback to developers. In recent years, researchers have relied on
Machine Learning (ML) techniques to achieve effective TSP (ML-based TSP). Such
techniques help combine information about test cases, from partial and
imperfect sources, into accurate prediction models. This work conducts a
systematic literature review focused on ML-based TSP techniques, aiming to
perform an in-depth analysis of the state of the art, thus gaining insights
regarding future avenues of research. To that end, we analyze 29 primary
studies published from 2006 to 2020, which have been identified through a
systematic and documented process. This paper addresses five research questions
addressing variations in ML-based TSP techniques and feature sets for training
and testing ML models, alternative metrics used for evaluating the techniques,
the performance of techniques, and the reproducibility of the published
studies
Connecting Software Metrics across Versions to Predict Defects
Accurate software defect prediction could help software practitioners
allocate test resources to defect-prone modules effectively and efficiently. In
the last decades, much effort has been devoted to build accurate defect
prediction models, including developing quality defect predictors and modeling
techniques. However, current widely used defect predictors such as code metrics
and process metrics could not well describe how software modules change over
the project evolution, which we believe is important for defect prediction. In
order to deal with this problem, in this paper, we propose to use the
Historical Version Sequence of Metrics (HVSM) in continuous software versions
as defect predictors. Furthermore, we leverage Recurrent Neural Network (RNN),
a popular modeling technique, to take HVSM as the input to build software
prediction models. The experimental results show that, in most cases, the
proposed HVSM-based RNN model has a significantly better effort-aware ranking
effectiveness than the commonly used baseline models
- …