Search CORE

15,052 research outputs found

Learning Effective Changes for Software Projects

Author: Krishna Rahul
Publication venue
Publication date: 12/11/2017
Field of study

The primary motivation of much of software analytics is decision making. How to make these decisions? Should one make decisions based on lessons that arise from within a particular project? Or should one generate these decisions from across multiple projects? This work is an attempt to answer these questions. Our work was motivated by a realization that much of the current generation software analytics tools focus primarily on prediction. Indeed prediction is a useful task, but it is usually followed by "planning" about what actions need to be taken. This research seeks to address the planning task by seeking methods that support actionable analytics that offer clear guidance on what to do. Specifically, we propose XTREE and BELLTREE algorithms for generating a set of actionable plans within and across projects. Each of these plans, if followed will improve the quality of the software project.Comment: 4 pages, 2 figures. This a submission for ASE 2017 Doctoral Symposiu

arXiv.org e-Print Archive

Crossref

Software defect prediction: do different classifiers find the same defects?

Author: AT Mısırlı
B Turhan
C Catal
C Seiffert
C Soares
D Gray
D Gray
David Bowes
DH Wolpert
E Arisholm
H Chen
I Witten
IH Laradji
Jean Petrić
K Elish
L Briand
L Madeyski
M D’Ambros
M Shepperd
M Shepperd
M Shepperd
MA Hall
N Fenton
NV Chawla
R Malhotra
S Lessmann
T Hall
T Khoshgoftaar
T Menzies
Tracy Hall
U Fayyad
W Chen
Y Zhou
Z Sun
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017
Field of study

Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.During the last 10 years, hundreds of different defect prediction models have been published. The performance of the classifiers used in these models is reported to be similar with models rarely performing above the predictive performance ceiling of about 80% recall. We investigate the individual defects that four classifiers predict and analyse the level of prediction uncertainty produced by these classifiers. We perform a sensitivity analysis to compare the performance of Random Forest, Naïve Bayes, RPart and SVM classifiers when predicting defects in NASA, open source and commercial datasets. The defect predictions that each classifier makes is captured in a confusion matrix and the prediction uncertainty of each classifier is compared. Despite similar predictive performance values for these four classifiers, each detects different sets of defects. Some classifiers are more consistent in predicting defects than others. Our results confirm that a unique subset of defects can be detected by specific classifiers. However, while some classifiers are consistent in the predictions they make, other classifiers vary in their predictions. Given our results, we conclude that classifier ensembles with decision-making strategies not based on majority voting are likely to perform best in defect prediction.Peer reviewedFinal Published versio

Crossref

Springer - Publisher Connector

Lancaster E-Prints

University of Hertfordshire Research Archive

Evaluating prediction systems in software project estimation

Author: MacDonell S
Shepperd M
Publication venue: 'Elsevier BV'
Publication date: 14/06/2012
Field of study

This is the Pre-print version of the Article - Copyright @ 2012 ElsevierContext: Software engineering has a problem in that when we empirically evaluate competing prediction systems we obtain conflicting results. Objective: To reduce the inconsistency amongst validation study results and provide a more formal foundation to interpret results with a particular focus on continuous prediction systems. Method: A new framework is proposed for evaluating competing prediction systems based upon (1) an unbiased statistic, Standardised Accuracy, (2) testing the result likelihood relative to the baseline technique of random ‘predictions’, that is guessing, and (3) calculation of effect sizes. Results: Previously published empirical evaluations of prediction systems are re-examined and the original conclusions shown to be unsafe. Additionally, even the strongest results are shown to have no more than a medium effect size relative to random guessing. Conclusions: Biased accuracy statistics such as MMRE are deprecated. By contrast this new empirical validation framework leads to meaningful results. Such steps will assist in performing future meta-analyses and in providing more robust and usable recommendations to practitioners.Martin Shepperd was supported by the UK Engineering and Physical Sciences Research Council (EPSRC) under Grant EP/H050329

Crossref

AUT Scholarly Commons

Brunel University Research Archive

Statistical Analysis for Revealing Defects in Software Projects: Systematic Literature Review

Author: Mahmoud Alia Nabil
Santos Vítor
Publication venue: 'The Science and Information Organization'
Publication date: 30/11/2021
Field of study

Mahmoud, A. N., & Santos, V. (2021). Statistical Analysis for Revealing Defects in Software Projects: Systematic Literature Review. International Journal of Advanced Computer Science and Applications, 12(11), 237-249. https://doi.org/10.14569/IJACSA.2021.0121128Defect detection in software is the procedure to identify parts of software that may comprise defects. Software companies always seek to improve the performance of software projects in terms of quality and efficiency. They also seek to deliver the soft-ware projects without any defects to the communities and just in time. The early revelation of defects in software projects is also tried to avoid failure of those projects, save costs, team effort, and time. Therefore, these companies need to build an intelligent model capable of detecting software defects accurately and efficiently. The paper is organized as follows. Section 2 presents the materials and methods, PRISMA, search questions, and search strategy. Section 3 presents the results with an analysis, and discussion, visualizing analysis and analysis per topic. Section 4 presents the methodology. Finally, in Section 5, the conclusion is discussed. The search string was applied to all electronic repositories looking for papers published between 2015 and 2021, which resulted in 627 publications. The results focused on finding three important points by linking the results of manuscript analysis and linking them to the results of the bibliometric analysis. First, the results showed that the number of defects and the number of lines of code are among the most important factors used in revealing software defects. Second, neural networks and regression analysis are among the most important smart and statistical methods used for this purpose. Finally, the accuracy metric and the error rate are among the most important metrics used in comparisons between the efficiency of statistical and intelligent models.publishersversionpublishe

Repositório da Universidade Nova de Lisboa

Statistical Analysis for Revealing Defects in Software Projects

Author: Elsayed Alia Nabil Mahmoud Faried
Publication venue
Publication date: 17/01/2022
Field of study

Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Information Systems and Technologies ManagementDefect detection in software is the procedure to identify parts of software that may comprise defects. Software companies always seek to improve the performance of software projects in terms of quality and efficiency. They also seek to deliver the soft-ware projects without any defects to the communities and just in time. The early revelation of defects in software projects is also tried to avoid failure of those projects, save costs, team effort, and time. Therefore, these companies need to build an intelligent model capable of detecting software defects accurately and efficiently. This study seeks to achieve two main objectives. The first goal is to build a statistical model to identify the critical defect factors that influence software projects. The second objective is to build a statistical model to reveal defects early in software pro-jects as reasonable accurately. A bibliometric map (VOSviewer) was used to find the relationships between the common terms in those domains. The results of this study are divided into three parts: In the first part The term "software engineering" is connected to "cluster," "regression," and "neural network." Moreover, the terms "random forest" and "feature selection" are connected to "neural network," "recall," and "software engineering," "cluster," "regression," and "fault prediction model" and "software defect prediction" and "defect density." In the second part We have checked and analyzed 29 manuscripts in detail, summarized their major contributions, and identified a few research gaps. In the third part Finally, software companies try to find the critical factors that affect the detection of software defects and find any of the intelligent or statistical methods that help to build a model capable of detecting those defects with high accuracy. Two statistical models (Multiple linear regression (MLR) and logistic regression (LR)) were used to find the critical factors and through them to detect software defects accurately. MLR is executed by using two methods which are critical defect factors (CDF) and premier list of software defect factors (PLSDF). The accuracy of MLR-CDF and MLR-PLSDF is 82.3 and 79.9 respectively. The standard error of MLR-CDF and MLR-PLSDF is 26% and 28% respectively. In addition, LR is executed by using two methods which are CDF and PLSDF. The accuracy of LR-CDF and LR-PLSDF is 86.4 and 83.8 respectively. The standard error of LR-CDF and LR-PLSDF is 22% and 25% respectively. Therefore, LRCDF outperforms on all the proposed models and state-of-the-art methods in terms of accuracy and standard error

Repositório da Universidade Nova de Lisboa