4,530 research outputs found
Too Trivial To Test? An Inverse View on Defect Prediction to Identify Methods with Low Fault Risk
Background. Test resources are usually limited and therefore it is often not
possible to completely test an application before a release. To cope with the
problem of scarce resources, development teams can apply defect prediction to
identify fault-prone code regions. However, defect prediction tends to low
precision in cross-project prediction scenarios.
Aims. We take an inverse view on defect prediction and aim to identify
methods that can be deferred when testing because they contain hardly any
faults due to their code being "trivial". We expect that characteristics of
such methods might be project-independent, so that our approach could improve
cross-project predictions.
Method. We compute code metrics and apply association rule mining to create
rules for identifying methods with low fault risk. We conduct an empirical
study to assess our approach with six Java open-source projects containing
precise fault data at the method level.
Results. Our results show that inverse defect prediction can identify approx.
32-44% of the methods of a project to have a low fault risk; on average, they
are about six times less likely to contain a fault than other methods. In
cross-project predictions with larger, more diversified training sets,
identified methods are even eleven times less likely to contain a fault.
Conclusions. Inverse defect prediction supports the efficient allocation of
test resources by identifying methods that can be treated with less priority in
testing activities and is well applicable in cross-project prediction
scenarios.Comment: Submitted to PeerJ C
Towards Automated Performance Bug Identification in Python
Context: Software performance is a critical non-functional requirement,
appearing in many fields such as mission critical applications, financial, and
real time systems. In this work we focused on early detection of performance
bugs; our software under study was a real time system used in the
advertisement/marketing domain.
Goal: Find a simple and easy to implement solution, predicting performance
bugs.
Method: We built several models using four machine learning methods, commonly
used for defect prediction: C4.5 Decision Trees, Na\"{\i}ve Bayes, Bayesian
Networks, and Logistic Regression.
Results: Our empirical results show that a C4.5 model, using lines of code
changed, file's age and size as explanatory variables, can be used to predict
performance bugs (recall=0.73, accuracy=0.85, and precision=0.96). We show that
reducing the number of changes delivered on a commit, can decrease the chance
of performance bug injection.
Conclusions: We believe that our approach can help practitioners to eliminate
performance bugs early in the development cycle. Our results are also of
interest to theoreticians, establishing a link between functional bugs and
(non-functional) performance bugs, and explicitly showing that attributes used
for prediction of functional bugs can be used for prediction of performance
bugs
Statistical Analysis for Revealing Defects in Software Projects: Systematic Literature Review
Mahmoud, A. N., & Santos, V. (2021). Statistical Analysis for Revealing Defects in Software Projects: Systematic Literature Review. International Journal of Advanced Computer Science and Applications, 12(11), 237-249. https://doi.org/10.14569/IJACSA.2021.0121128Defect detection in software is the procedure to identify parts of software that may comprise defects. Software companies always seek to improve the performance of software projects in terms of quality and efficiency. They also seek to deliver the soft-ware projects without any defects to the communities and just in time. The early revelation of defects in software projects is also tried to avoid failure of those projects, save costs, team effort, and time. Therefore, these companies need to build an intelligent model capable of detecting software defects accurately and efficiently. The paper is organized as follows. Section 2 presents the materials and methods, PRISMA, search questions, and search strategy. Section 3 presents the results with an analysis, and discussion, visualizing analysis and analysis per topic. Section 4 presents the methodology. Finally, in Section 5, the conclusion is discussed. The search string was applied to all electronic repositories looking for papers published between 2015 and 2021, which resulted in 627 publications. The results focused on finding three important points by linking the results of manuscript analysis and linking them to the results of the bibliometric analysis. First, the results showed that the number of defects and the number of lines of code are among the most important factors used in revealing software defects. Second, neural networks and regression analysis are among the most important smart and statistical methods used for this purpose. Finally, the accuracy metric and the error rate are among the most important metrics used in comparisons between the efficiency of statistical and intelligent models.publishersversionpublishe
Statistical Analysis for Revealing Defects in Software Projects
Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Information Systems and Technologies ManagementDefect detection in software is the procedure to identify parts of software that may comprise
defects. Software companies always seek to improve the performance of software projects in terms
of quality and efficiency. They also seek to deliver the soft-ware projects without any defects to the
communities and just in time. The early revelation of defects in software projects is also tried to
avoid failure of those projects, save costs, team effort, and time. Therefore, these companies need to
build an intelligent model capable of detecting software defects accurately and efficiently.
This study seeks to achieve two main objectives. The first goal is to build a statistical model to
identify the critical defect factors that influence software projects. The second objective is to build a
statistical model to reveal defects early in software pro-jects as reasonable accurately. A bibliometric
map (VOSviewer) was used to find the relationships between the common terms in those domains.
The results of this study are divided into three parts:
In the first part The term "software engineering" is connected to "cluster," "regression," and "neural
network." Moreover, the terms "random forest" and "feature selection" are connected to "neural
network," "recall," and "software engineering," "cluster," "regression," and "fault prediction model"
and "software defect prediction" and "defect density."
In the second part We have checked and analyzed 29 manuscripts in detail, summarized their major
contributions, and identified a few research gaps.
In the third part Finally, software companies try to find the critical factors that affect the detection of
software defects and find any of the intelligent or statistical methods that help to build a model
capable of detecting those defects with high accuracy.
Two statistical models (Multiple linear regression (MLR) and logistic regression (LR)) were used to
find the critical factors and through them to detect software defects accurately. MLR is executed by
using two methods which are critical defect factors (CDF) and premier list of software defect factors
(PLSDF). The accuracy of MLR-CDF and MLR-PLSDF is 82.3 and 79.9 respectively. The standard error
of MLR-CDF and MLR-PLSDF is 26% and 28% respectively. In addition, LR is executed by using two
methods which are CDF and PLSDF. The accuracy of LR-CDF and LR-PLSDF is 86.4 and 83.8
respectively. The standard error of LR-CDF and LR-PLSDF is 22% and 25% respectively. Therefore, LRCDF
outperforms on all the proposed models and state-of-the-art methods in terms of accuracy and
standard error
- …