Software Defect Association Mining and Defect Correction Effort Prediction
Much current software defect prediction work concentrates on the number of defects remaining in a software system. In this paper, we present association rule mining based methods to predict defect associations and defect-correction effort. This is to help developers detect software defects and assist project managers in allocating testing resources more effectively. We applied the proposed methods to the SEL defect data, which consists of more than 200 projects spanning more than 15 years. The results show that for defect association prediction, the accuracy is very high and the false negative rate is very low. Likewise, for defect-correction effort prediction, the accuracy of both defect isolation effort prediction and defect correction effort prediction is high. We compared the defect-correction effort prediction method with other types of methods (PART, C4.5, and Naïve Bayes) and show that accuracy improved by at least 23%. We also evaluated the impact of support and confidence levels on prediction accuracy, false negative rate, false positive rate, and the number of rules. We found that higher support and confidence levels may not result in higher prediction accuracy, and that a sufficient number of rules is a precondition for high prediction accuracy.
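The method rests on classic association rule mining: finding rules A -> B whose support and confidence clear chosen thresholds. The brute-force sketch below illustrates that general technique on toy defect records; the defect categories, thresholds, and pairwise enumeration are illustrative assumptions, not the paper's actual SEL data or mining algorithm.

# Minimal association-rule sketch on hypothetical defect records
# (toy data and thresholds; not the SEL dataset or the paper's miner).
from itertools import combinations

# Each transaction lists the defect types (and an effort class) seen together.
transactions = [
    {"interface_defect", "logic_defect", "high_fix_effort"},
    {"interface_defect"},
    {"interface_defect", "logic_defect", "high_fix_effort"},
    {"logic_defect"},
]

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

# Enumerate rules A -> B over all item pairs; thresholds are assumptions.
MIN_SUPPORT, MIN_CONFIDENCE = 0.4, 0.7
items = set().union(*transactions)
for a, b in combinations(sorted(items), 2):
    for lhs, rhs in [({a}, {b}), ({b}, {a})]:
        s = support(lhs | rhs)
        if s >= MIN_SUPPORT and support(lhs) > 0:
            c = s / support(lhs)
            if c >= MIN_CONFIDENCE:
                print(f"{lhs} -> {rhs}  support={s:.2f} confidence={c:.2f}")
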
Connecting Software Metrics across Versions to Predict Defects
Accurate software defect prediction can help software practitioners allocate test resources to defect-prone modules effectively and efficiently. In recent decades, much effort has been devoted to building accurate defect prediction models, including developing quality defect predictors and modeling techniques. However, widely used defect predictors such as code metrics and process metrics do not adequately describe how software modules change over a project's evolution, which we believe is important for defect prediction. To deal with this problem, in this paper we propose to use the Historical Version Sequence of Metrics (HVSM) across continuous software versions as a defect predictor. Furthermore, we leverage a Recurrent Neural Network (RNN), a popular modeling technique, which takes an HVSM as input to build software defect prediction models. The experimental results show that, in most cases, the proposed HVSM-based RNN model has significantly better effort-aware ranking effectiveness than the commonly used baseline models.
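As a rough illustration of the modeling idea, the sketch below feeds a module's sequence of per-version metric vectors into an LSTM and emits a defect-proneness score. The layer sizes, the choice of an LSTM cell, and the toy dimensions are assumptions; the paper's exact architecture may differ.

# Sketch: an RNN over a module's Historical Version Sequence of Metrics (HVSM).
# Layer sizes and the use of an LSTM are illustrative assumptions.
import torch
import torch.nn as nn

class HVSMDefectModel(nn.Module):
    def __init__(self, n_metrics: int, hidden: int = 32):
        super().__init__()
        self.rnn = nn.LSTM(input_size=n_metrics, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_versions, n_metrics) -- one metric vector per version.
        _, (h_n, _) = self.rnn(x)
        return torch.sigmoid(self.head(h_n[-1]))  # defect-proneness in (0, 1)

# Toy usage: 8 modules, 5 versions each, 10 metrics per version.
model = HVSMDefectModel(n_metrics=10)
scores = model(torch.randn(8, 5, 10))
print(scores.shape)  # torch.Size([8, 1])
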
Continuous Defect Prediction: The Idea and a Related Dataset
We would like to present the idea of our Continuous Defect Prediction (CDP) research and a related dataset that we created and share. Our dataset currently comprises more than 11 million data rows representing files involved in Continuous Integration (CI) builds; it synthesizes the results of CI builds with data we mine from software repositories. Our dataset covers 1,265 software projects, 30,022 distinct commit authors, and several software process metrics that earlier research found useful in software defect prediction. In this particular dataset we use TravisTorrent as the source of CI data. TravisTorrent synthesizes commit-level information from the Travis CI server and GitHub open-source project repositories. We extend this data to the file change level and calculate software process metrics that may be used, for example, as features to predict risky software changes that could break the build if committed to a repository with CI enabled.
Comment: Lech Madeyski and Marcin Kawalerowicz. "Continuous Defect Prediction: The Idea and a Related Dataset." In: 14th International Conference on Mining Software Repositories (MSR'17), Buenos Aires, 2017, pp. 515-518. doi: 10.1109/MSR.2017.46. URL: http://madeyski.e-informatyka.pl/download/MadeyskiKawalerowiczMSR17.pd
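To make the file-change-level idea concrete, here is a hypothetical sketch of deriving simple per-file process metrics (change count, distinct authors, share of build-breaking changes) from commit records joined with CI outcomes. The column names and chosen metrics are placeholders, not the actual TravisTorrent or CDP schema.

# Sketch: deriving file-level process metrics from commit/CI records.
# Column names and metrics are hypothetical placeholders.
import pandas as pd

changes = pd.DataFrame({
    "file":        ["a.py", "a.py", "b.py", "a.py", "b.py"],
    "author":      ["alice", "bob", "alice", "alice", "carol"],
    "build_broke": [False, True, False, False, True],
})

features = changes.groupby("file").agg(
    n_changes=("author", "size"),        # how often the file was changed
    n_authors=("author", "nunique"),     # distinct committers touching the file
    break_rate=("build_broke", "mean"),  # share of changes in broken builds
)
print(features)
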
Defect prediction with bad smells in code
Background: Defect prediction in software can be highly beneficial for development projects when prediction is highly effective and defect-prone areas are predicted correctly. One of the key elements in effective software defect prediction is the proper selection of metrics used for dataset preparation. Objective: The purpose of this research is to verify whether code smell metrics, collected using the Microsoft CodeAnalysis tool and added to a basic metric set, can improve defect prediction in an industrial software development project. Results: We verified whether extending the dataset with code-smell-sourced metrics changes the effectiveness of defect prediction by comparing prediction results for datasets with and without code-smell-oriented metrics. As a result, we observed only a small improvement in the effectiveness of defect prediction when the dataset extended with bad-smell metrics was used: the average accuracy value increased by 0.0091 and stayed within the margin of error. However, when only code-smell-based metrics were used for prediction (without the basic set of metrics), the process yielded surprisingly high accuracy (0.8249) and F-measure (0.8286) results. We also elaborate on data anomalies and problems we observed when two different metric sources were used to prepare one consistent dataset. Conclusion: Extending the dataset with code-smell-sourced metrics does not significantly improve prediction effectiveness; the achieved result did not compensate for the effort needed to collect the additional metrics. However, we observed that defect prediction based on code smells alone is still highly effective and can be used especially where other metrics can hardly be used.
Comment: Chapter 10 in Software Engineering: Improving Practice through Research (B. Hnatkowska and M. Śmiałek, eds.), pp. 163-176, 201
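The comparison described above amounts to training one classifier on several feature sets and comparing cross-validated accuracy. A minimal sketch of that procedure with scikit-learn follows; the synthetic data, the feature split, and the choice of a random forest are assumptions, not the study's actual setup.

# Sketch: comparing defect prediction with and without code-smell metrics.
# The data and feature split are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
basic = rng.normal(size=(200, 6))    # e.g., size/complexity metrics
smells = rng.normal(size=(200, 4))   # e.g., code-smell counts
y = rng.integers(0, 2, size=200)     # defective (1) / clean (0)

clf = RandomForestClassifier(random_state=0)
for name, X in [("basic", basic),
                ("basic + smells", np.hstack([basic, smells])),
                ("smells only", smells)]:
    acc = cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: mean accuracy = {acc:.4f}")
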
A general software defect-proneness prediction framework
BACKGROUND - Predicting defect-prone software components is an economically important activity and so has received a good deal of attention. However, making sense of the many, and sometimes seemingly inconsistent, results is difficult. OBJECTIVE - We propose and evaluate a general framework for software defect prediction that supports 1) unbiased and 2) comprehensive comparison between competing prediction systems. METHOD - The framework comprises 1) scheme evaluation and 2) defect prediction components. The scheme evaluation analyzes the prediction performance of competing learning schemes for given historical data sets. The defect predictor builds models according to the evaluated learning scheme and predicts software defects with new data according to the constructed model. To demonstrate the performance of the proposed framework, we use both simulation and publicly available software defect data sets. RESULTS - The results show that we should choose different learning schemes for different data sets (i.e., no scheme dominates), that small details in how evaluations are conducted can completely reverse findings, and, last, that our proposed framework is more effective and less prone to bias than previous approaches. CONCLUSIONS - Failure to properly or fully evaluate a learning scheme can be misleading; however, these problems may be overcome by our proposed framework. Funding: National Natural Science Foundation of China.
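The two-component design can be pictured as cross-validated model selection over candidate learning schemes on historical data, followed by training the winning scheme to predict defects in new data. The sketch below illustrates that idea with scikit-learn; the candidate schemes and synthetic data are illustrative, not the paper's exact evaluation protocol.

# Sketch: scheme evaluation (pick the best learner on historical data),
# then build the defect predictor with the winning scheme.
# Candidate learners and the synthetic data are illustrative assumptions.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X_hist = rng.normal(size=(300, 8))     # historical module metrics
y_hist = rng.integers(0, 2, size=300)  # historical defect labels

schemes = {
    "naive_bayes": GaussianNB(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

# Scheme evaluation: cross-validate each candidate on the historical data.
scores = {name: cross_val_score(m, X_hist, y_hist, cv=5).mean()
          for name, m in schemes.items()}
best = max(scores, key=scores.get)

# Defect predictor: train the winning scheme and score new modules.
predictor = schemes[best].fit(X_hist, y_hist)
X_new = rng.normal(size=(5, 8))
print(best, predictor.predict(X_new))
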
