3,448 research outputs found
Defect prediction with bad smells in code
Background: Defect prediction in software can be highly beneficial for
development projects, when prediction is highly effective and defect-prone
areas are predicted correctly. One of the key elements to gain effective
software defect prediction is proper selection of metrics used for dataset
preparation. Objective: The purpose of this research is to verify, whether code
smells metrics, collected using Microsoft CodeAnalysis tool, added to basic
metric set, can improve defect prediction in industrial software development
project. Results: We verified, if dataset extension by the code smells sourced
metrics, change the effectiveness of the defect prediction by comparing
prediction results for datasets with and without code smells-oriented metrics.
In a result, we observed only small improvement of effectiveness of defect
prediction when dataset extended with bad smells metrics was used: average
accuracy value increased by 0.0091 and stayed within the margin of error.
However, when only use of code smells based metrics were used for prediction
(without basic set of metrics), such process resulted with surprisingly high
accuracy (0.8249) and F-measure (0.8286) results. We also elaborated data
anomalies and problems we observed when two different metric sources were used
to prepare one, consistent set of data. Conclusion: Extending the dataset by
the code smells sourced metric does not significantly improve the prediction
effectiveness. Achieved result did not compensate effort needed to collect
additional metrics. However, we observed that defect prediction based on the
code smells only is still highly effective and can be used especially where
other metrics hardly be used.Comment: Chapter 10 in Software Engineering: Improving Practice through
Research (B. Hnatkowska and M. \'Smia{\l}ek, eds.), pp. 163-176, 201
Too Trivial To Test? An Inverse View on Defect Prediction to Identify Methods with Low Fault Risk
Background. Test resources are usually limited and therefore it is often not
possible to completely test an application before a release. To cope with the
problem of scarce resources, development teams can apply defect prediction to
identify fault-prone code regions. However, defect prediction tends to low
precision in cross-project prediction scenarios.
Aims. We take an inverse view on defect prediction and aim to identify
methods that can be deferred when testing because they contain hardly any
faults due to their code being "trivial". We expect that characteristics of
such methods might be project-independent, so that our approach could improve
cross-project predictions.
Method. We compute code metrics and apply association rule mining to create
rules for identifying methods with low fault risk. We conduct an empirical
study to assess our approach with six Java open-source projects containing
precise fault data at the method level.
Results. Our results show that inverse defect prediction can identify approx.
32-44% of the methods of a project to have a low fault risk; on average, they
are about six times less likely to contain a fault than other methods. In
cross-project predictions with larger, more diversified training sets,
identified methods are even eleven times less likely to contain a fault.
Conclusions. Inverse defect prediction supports the efficient allocation of
test resources by identifying methods that can be treated with less priority in
testing activities and is well applicable in cross-project prediction
scenarios.Comment: Submitted to PeerJ C
Connecting Software Metrics across Versions to Predict Defects
Accurate software defect prediction could help software practitioners
allocate test resources to defect-prone modules effectively and efficiently. In
the last decades, much effort has been devoted to build accurate defect
prediction models, including developing quality defect predictors and modeling
techniques. However, current widely used defect predictors such as code metrics
and process metrics could not well describe how software modules change over
the project evolution, which we believe is important for defect prediction. In
order to deal with this problem, in this paper, we propose to use the
Historical Version Sequence of Metrics (HVSM) in continuous software versions
as defect predictors. Furthermore, we leverage Recurrent Neural Network (RNN),
a popular modeling technique, to take HVSM as the input to build software
prediction models. The experimental results show that, in most cases, the
proposed HVSM-based RNN model has a significantly better effort-aware ranking
effectiveness than the commonly used baseline models
Continuous Defect Prediction: The Idea and a Related Dataset
We would like to present the idea of our Continuous Defect Prediction (CDP)
research and a related dataset that we created and share. Our dataset is
currently a set of more than 11 million data rows, representing files involved
in Continuous Integration (CI) builds, that synthesize the results of CI builds
with data we mine from software repositories. Our dataset embraces 1265
software projects, 30,022 distinct commit authors and several software process
metrics that in earlier research appeared to be useful in software defect
prediction. In this particular dataset we use TravisTorrent as the source of CI
data. TravisTorrent synthesizes commit level information from the Travis CI
server and GitHub open-source projects repositories. We extend this data to a
file change level and calculate the software process metrics that may be used,
for example, as features to predict risky software changes that could break the
build if committed to a repository with CI enabled.Comment: Lech Madeyski and Marcin Kawalerowicz. "Continuous Defect Prediction:
The Idea and a Related Dataset" In: 14th International Conference on Mining
Software Repositories (MSR'17). Buenos Aires. 2017, pp. 515-518. doi:
10.1109/MSR.2017.46. URL:
http://madeyski.e-informatyka.pl/download/MadeyskiKawalerowiczMSR17.pd
Deep Learning for Software Defect Prediction: An LSTM-based Approach
Software defect prediction is an important aspect of software development, as it helps developers and organizations to identify and resolve bugs in the software before they become major issues. In this paper, we explore the use of machine learning algorithms for software defect prediction. We discuss the different types of machine learning algorithms that have been used for software defect prediction and their advantages and disadvantages. We also provide a comprehensive review of recent studies that have used machine learning algorithms for software defect prediction. The paper concludes with a discussion of the challenges and opportunities in using machine learning algorithms for software defect prediction and the future directions of research in this field. This paper surveys the existing literature on software defect prediction, focusing specifically on deep learning techniques. Compared to existing surveys on the topic, this paper offers a more in-depth analysis of the strengths and weaknesses of deep learning approaches for software defect prediction. It explores the use of LSTMs for this task, which have not been extensively studied in previous surveys. Additionally, this paper provides a comprehensive review of recent research in the field, highlighting the most promising deep learning models and techniques for software defect prediction. The results of this survey demonstrate that LSTM-based deep learning models can outperform traditional machine learning approaches and achieve state-of-the-art results in software defect prediction. Furthermore, this paper provides insights into the challenges and limitations of deep learning approaches for software defect prediction, highlighting areas for future research and improvement. Overall, this paper offers a valuable resource for researchers and practitioners interested in using deep learning techniques for software defect prediction.
- …