10,775 research outputs found
Software Metrics in Boa Large-Scale Software Mining Infrastructure: Challenges and Solutions
In this paper, we describe our experience implementing some of the classic
software engineering metrics using Boa - a large-scale software repository
mining platform - and its dedicated language. We also aim to take advantage
of the Boa infrastructure to propose new software metrics and to characterize
open source projects by software metrics, providing reference values based on
a large number of open source projects. The presented software metrics, both
well known and newly proposed in this paper, can be used to build large-scale
software defect prediction models. Additionally, we present the obstacles we
met while developing the metrics, and our analysis can be used to improve Boa
in its future releases. The implemented metrics can also serve as a foundation
for more complex explorations of open source projects and as a guide on how to
implement software metrics using Boa, as the source code of the metrics is
freely available to support reproducible research.
Comment: Chapter 8 of the book "Software Engineering: Improving Practice
through Research" (B. Hnatkowska and M. Śmiałek, eds.), pp. 131-146,
201
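The idea of deriving reference values by aggregating a metric over many mined projects can be sketched in plain Python (this is illustrative code, not the Boa DSL; the corpus, metric, and function names are hypothetical):

```python
# Hypothetical sketch: compute a crude LOC metric per file, then
# aggregate it across a mined corpus of projects to obtain a
# reference value, as the chapter proposes to do at scale with Boa.

def loc(source: str) -> int:
    """Count non-blank, non-comment lines (a crude size metric)."""
    return sum(
        1
        for line in source.splitlines()
        if line.strip() and not line.strip().startswith("//")
    )

def reference_value(projects: dict) -> float:
    """Average LOC per file over a corpus of projects."""
    locs = [loc(src) for files in projects.values() for src in files]
    return sum(locs) / len(locs)

corpus = {  # toy stand-in for a large-scale mined corpus
    "proj-a": ["int x;\n// comment\nint y;\n", "void f() {}\n"],
    "proj-b": ["\nclass C {}\n"],
}
print(reference_value(corpus))  # mean LOC per file across the corpus
```

In Boa itself the aggregation would be expressed with an output aggregator over all projects; the sketch above only mirrors the shape of that computation.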
Connecting Software Metrics across Versions to Predict Defects
Accurate software defect prediction could help software practitioners
allocate test resources to defect-prone modules effectively and efficiently. In
recent decades, much effort has been devoted to building accurate defect
prediction models, including developing quality defect predictors and modeling
techniques. However, widely used defect predictors such as code metrics and
process metrics do not describe well how software modules change over the
project's evolution, which we believe is important for defect prediction. To
deal with this problem, in this paper we propose to use the
Historical Version Sequence of Metrics (HVSM) in continuous software versions
as defect predictors. Furthermore, we leverage a Recurrent Neural Network (RNN),
a popular modeling technique, that takes the HVSM as input to build software
defect prediction models. The experimental results show that, in most cases, the
proposed HVSM-based RNN model has significantly better effort-aware ranking
effectiveness than the commonly used baseline models.
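The core idea of feeding a version sequence of metrics through an RNN can be sketched in pure Python (a minimal vanilla-RNN forward pass; the weights, metric choices, and function names below are illustrative, not taken from the paper):

```python
# Hedged sketch: a tiny vanilla-RNN forward pass over a Historical
# Version Sequence of Metrics (HVSM). Each time step is one module's
# metric vector in one release; the final hidden state is squashed
# into a defect-proneness score. All weights here are made up.
import math

def rnn_defect_score(hvsm, w_in, w_rec, w_out):
    """hvsm: list of metric vectors, one per software version."""
    h = [0.0] * len(w_rec)                     # hidden state
    for x in hvsm:                             # iterate over versions
        h = [
            math.tanh(
                sum(wi * xi for wi, xi in zip(w_in[j], x))
                + sum(wr * hi for wr, hi in zip(w_rec[j], h))
            )
            for j in range(len(w_rec))
        ]
    z = sum(wo * hi for wo, hi in zip(w_out, h))
    return 1.0 / (1.0 + math.exp(-z))          # sigmoid -> probability

# Toy example: [LOC, churn] for one module across three versions.
hvsm = [[120, 5], [150, 30], [200, 80]]
w_in = [[0.01, 0.02], [0.00, 0.05]]            # two hidden units
w_rec = [[0.1, 0.0], [0.0, 0.1]]
w_out = [1.5, 1.5]
print(rnn_defect_score(hvsm, w_in, w_rec, w_out))
```

A real model would be trained (e.g. with a deep learning framework) rather than hand-weighted; the sketch only shows how a sequence of per-version metrics, rather than a single snapshot, drives the prediction.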
Software defect prediction: do different classifiers find the same defects?
Open Access: this article is distributed under the terms of the Creative Commons Attribution 4.0 International License, CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/).
During the last 10 years, hundreds of different defect prediction models have been published. The performance of the classifiers used in these models is reported to be similar, with models rarely performing above the predictive performance ceiling of about 80% recall. We investigate the individual defects that four classifiers predict and analyse the level of prediction uncertainty produced by these classifiers. We perform a sensitivity analysis to compare the performance of Random Forest, Naïve Bayes, RPart and SVM classifiers when predicting defects in NASA, open source and commercial datasets. The defect predictions that each classifier makes are captured in a confusion matrix, and the prediction uncertainty of each classifier is compared. Despite similar predictive performance values for these four classifiers, each detects different sets of defects. Some classifiers are more consistent in predicting defects than others. Our results confirm that a unique subset of defects can be detected by specific classifiers. However, while some classifiers are consistent in the predictions they make, other classifiers vary in their predictions. Given our results, we conclude that classifier ensembles with decision-making strategies not based on majority voting are likely to perform best in defect prediction.
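The phenomenon the abstract describes, that classifiers with identical headline performance can flag disjoint sets of defects, is easy to demonstrate on toy data (the data and classifier predictions below are fabricated for illustration, not the paper's datasets):

```python
# Illustrative sketch: two classifiers with the same recall can still
# detect entirely different defects, which is why the authors argue
# for ensembles whose decisions are not based on majority voting.

def confusion(y_true, y_pred):
    """Return (tp, fn, fp, tn) for binary labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fn, fp, tn

y_true = [1, 1, 1, 1, 0, 0, 0, 0]   # 1 = defective module
clf_a  = [1, 1, 0, 0, 0, 0, 0, 0]   # finds defects 0 and 1
clf_b  = [0, 0, 1, 1, 0, 0, 0, 0]   # finds defects 2 and 3

for preds in (clf_a, clf_b):
    tp, fn, fp, tn = confusion(y_true, preds)
    print(f"recall = {tp / (tp + fn):.2f}")   # 0.50 for both

caught_a = {i for i, p in enumerate(clf_a) if p == 1 and y_true[i] == 1}
caught_b = {i for i, p in enumerate(clf_b) if p == 1 and y_true[i] == 1}
print(caught_a & caught_b)   # empty: disjoint sets of detected defects
print(caught_a | caught_b)   # a union-style ensemble catches all four
```

A majority vote over these two classifiers would reject every defect each finds alone, which is exactly the failure mode the abstract's conclusion targets.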
An Exploratory Framework for Intelligent Labelling of Fault Datasets
Software fault prediction (SFP) has become a pivotal aspect of software quality. Nevertheless, the discipline of
software quality suffers from a scarcity of fault datasets. Most research endeavors have focused on the type of dataset, its granularity, the metrics used and the metrics extractors. However, only sporadic attention has been paid to the development of fault datasets and their associated challenges. There are very few publicly available datasets, limiting the possibilities of comprehensive experiments on the way to improving the quality of software. The current research aims to address the challenges pertinent to fault dataset collection and development when one is not available publicly. It also considers dynamic identification of available resources such as public datasets, open-source software archives, metrics parsers and intelligent labelling techniques. A framework for the dataset collection and development process is furnished, along with an evaluation procedure for the identified resources.
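One common labelling technique such a framework could draw on is marking a module as defective when a bug-fixing commit touches its files. A minimal sketch of keyword-based commit labelling follows (the keywords, commit history, and function names are hypothetical, not from the paper):

```python
# Hedged illustration: label modules defective when a commit whose
# message matches fix/bug keywords touches them. Real pipelines
# (e.g. SZZ-style approaches) are considerably more careful.
import re

FIX_PATTERN = re.compile(r"\b(fix(es|ed)?|bug|defect|fault)\b", re.I)

def label_modules(commits):
    """commits: list of (message, touched_files); returns defective files."""
    defective = set()
    for message, files in commits:
        if FIX_PATTERN.search(message):
            defective.update(files)
    return defective

history = [
    ("Add caching layer", ["cache.py"]),
    ("Fix null deref in parser", ["parser.py"]),
    ("Bug #42: off-by-one in loop", ["loop.py", "util.py"]),
]
print(sorted(label_modules(history)))  # ['loop.py', 'parser.py', 'util.py']
```

Keyword matching alone produces noisy labels (e.g. "prefix" would need a word boundary, as above, and refactoring commits can mention "bug"), which is one reason the abstract calls for more intelligent labelling techniques.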