226,533 research outputs found
Too Trivial To Test? An Inverse View on Defect Prediction to Identify Methods with Low Fault Risk
Background. Test resources are usually limited and therefore it is often not
possible to completely test an application before a release. To cope with the
problem of scarce resources, development teams can apply defect prediction to
identify fault-prone code regions. However, defect prediction tends to low
precision in cross-project prediction scenarios.
Aims. We take an inverse view on defect prediction and aim to identify
methods that can be deferred when testing because they contain hardly any
faults due to their code being "trivial". We expect that characteristics of
such methods might be project-independent, so that our approach could improve
cross-project predictions.
Method. We compute code metrics and apply association rule mining to create
rules for identifying methods with low fault risk. We conduct an empirical
study to assess our approach with six Java open-source projects containing
precise fault data at the method level.
Results. Our results show that inverse defect prediction can identify approx.
32-44% of the methods of a project to have a low fault risk; on average, they
are about six times less likely to contain a fault than other methods. In
cross-project predictions with larger, more diversified training sets,
identified methods are even eleven times less likely to contain a fault.
Conclusions. Inverse defect prediction supports the efficient allocation of
test resources by identifying methods that can be treated with less priority in
testing activities and is well applicable in cross-project prediction
scenarios.Comment: Submitted to PeerJ C
Continuous Defect Prediction: The Idea and a Related Dataset
We would like to present the idea of our Continuous Defect Prediction (CDP)
research and a related dataset that we created and share. Our dataset is
currently a set of more than 11 million data rows, representing files involved
in Continuous Integration (CI) builds, that synthesize the results of CI builds
with data we mine from software repositories. Our dataset embraces 1265
software projects, 30,022 distinct commit authors and several software process
metrics that in earlier research appeared to be useful in software defect
prediction. In this particular dataset we use TravisTorrent as the source of CI
data. TravisTorrent synthesizes commit level information from the Travis CI
server and GitHub open-source projects repositories. We extend this data to a
file change level and calculate the software process metrics that may be used,
for example, as features to predict risky software changes that could break the
build if committed to a repository with CI enabled.Comment: Lech Madeyski and Marcin Kawalerowicz. "Continuous Defect Prediction:
The Idea and a Related Dataset" In: 14th International Conference on Mining
Software Repositories (MSR'17). Buenos Aires. 2017, pp. 515-518. doi:
10.1109/MSR.2017.46. URL:
http://madeyski.e-informatyka.pl/download/MadeyskiKawalerowiczMSR17.pd
- …