Search CORE

19,234 research outputs found

Software defect prediction: do different classifiers find the same defects?

Author: AT Mısırlı
B Turhan
C Catal
C Seiffert
C Soares
D Gray
D Gray
David Bowes
DH Wolpert
E Arisholm
H Chen
I Witten
IH Laradji
Jean Petrić
K Elish
L Briand
L Madeyski
M D’Ambros
M Shepperd
M Shepperd
M Shepperd
MA Hall
N Fenton
NV Chawla
R Malhotra
S Lessmann
T Hall
T Khoshgoftaar
T Menzies
Tracy Hall
U Fayyad
W Chen
Y Zhou
Z Sun
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017
Field of study

Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.During the last 10 years, hundreds of different defect prediction models have been published. The performance of the classifiers used in these models is reported to be similar with models rarely performing above the predictive performance ceiling of about 80% recall. We investigate the individual defects that four classifiers predict and analyse the level of prediction uncertainty produced by these classifiers. We perform a sensitivity analysis to compare the performance of Random Forest, Naïve Bayes, RPart and SVM classifiers when predicting defects in NASA, open source and commercial datasets. The defect predictions that each classifier makes is captured in a confusion matrix and the prediction uncertainty of each classifier is compared. Despite similar predictive performance values for these four classifiers, each detects different sets of defects. Some classifiers are more consistent in predicting defects than others. Our results confirm that a unique subset of defects can be detected by specific classifiers. However, while some classifiers are consistent in the predictions they make, other classifiers vary in their predictions. Given our results, we conclude that classifier ensembles with decision-making strategies not based on majority voting are likely to perform best in defect prediction.Peer reviewedFinal Published versio

Crossref

Springer - Publisher Connector

Lancaster E-Prints

University of Hertfordshire Research Archive

Data quality: Some comments on the NASA software defect datasets

Author: Mair C
Shepperd M
Song Q
Sun Z
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/09/2013
Field of study

Background-Self-evidently empirical analyses rely upon the quality of their data. Likewise, replications rely upon accurate reporting and using the same rather than similar versions of datasets. In recent years, there has been much interest in using machine learners to classify software modules into defect-prone and not defect-prone categories. The publicly available NASA datasets have been extensively used as part of this research. Objective-This short note investigates the extent to which published analyses based on the NASA defect datasets are meaningful and comparable. Method-We analyze the five studies published in the IEEE Transactions on Software Engineering since 2007 that have utilized these datasets and compare the two versions of the datasets currently in use. Results-We find important differences between the two versions of the datasets, implausible values in one dataset and generally insufficient detail documented on dataset preprocessing. Conclusions-It is recommended that researchers 1) indicate the provenance of the datasets they use, 2) report any preprocessing in sufficient detail to enable meaningful replication, and 3) invest effort in understanding the data prior to applying machine learners

Crossref

UAL Research Online

Brunel University Research Archive

Estimation of Defect proneness Using Design complexity Measurements in Object- Oriented Software

Author: Nair T. R. Gopalakrishnan
Prasad V. Kamakshi
Selvarani R.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2009
Field of study

Software engineering is continuously facing the challenges of growing complexity of software packages and increased level of data on defects and drawbacks from software production process. This makes a clarion call for inventions and methods which can enable a more reusable, reliable, easily maintainable and high quality software systems with deeper control on software generation process. Quality and productivity are indeed the two most important parameters for controlling any industrial process. Implementation of a successful control system requires some means of measurement. Software metrics play an important role in the management aspects of the software development process such as better planning, assessment of improvements, resource allocation and reduction of unpredictability. The process involving early detection of potential problems, productivity evaluation and evaluating external quality factors such as reusability, maintainability, defect proneness and complexity are of utmost importance. Here we discuss the application of CK metrics and estimation model to predict the external quality parameters for optimizing the design process and production process for desired levels of quality. Estimation of defect-proneness in object-oriented system at design level is developed using a novel methodology where models of relationship between CK metrics and defect-proneness index is achieved. A multifunctional estimation approach captures the correlation between CK metrics and defect proneness level of software modules.Comment: 5 pages, 1 figur

arXiv.org e-Print Archive

Crossref

Recommended from our members

A general software defect-proneness prediction framework

Author: Jia Z
Liu J
Shepperd M
Song Q
Ying S
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/05/2011
Field of study

This is the author's accepted manuscript. The final published article is available from the link below. Copyright @ 2011 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.BACKGROUND - Predicting defect-prone software components is an economically important activity and so has received a good deal of attention. However, making sense of the many, and sometimes seemingly inconsistent, results is difficult. OBJECTIVE - We propose and evaluate a general framework for software defect prediction that supports 1) unbiased and 2) comprehensive comparison between competing prediction systems. METHOD - The framework is comprised of 1) scheme evaluation and 2) defect prediction components. The scheme evaluation analyzes the prediction performance of competing learning schemes for given historical data sets. The defect predictor builds models according to the evaluated learning scheme and predicts software defects with new data according to the constructed model. In order to demonstrate the performance of the proposed framework, we use both simulation and publicly available software defect data sets. RESULTS - The results show that we should choose different learning schemes for different data sets (i.e., no scheme dominates), that small details in conducting how evaluations are conducted can completely reverse findings, and last, that our proposed framework is more effective and less prone to bias than previous approaches. CONCLUSIONS - Failure to properly or fully evaluate a learning scheme can be misleading; however, these problems may be overcome by our proposed framework.National Natural Science Foundation of Chin

Brunel University Research Archive

Downwind rotor horizontal axis wind turbine noise prediction

Author: Klatte R. J.
Metzger F. B.
Publication venue
Publication date: 01/05/1981
Field of study

NASA and industry are currently cooperating in the conduct of extensive experimental and analytical studies to understand and predict the noise of large, horizontal axis wind turbines. This effort consists of (1) obtaining high quality noise data under well controlled and documented test conditions, (2) establishing the annoyance criteria for impulse noise of the type generated by horizontal axis wind turbines with rotors downwind of the support tower, (3) defining the wake characteristics downwind of the axial location of the plane of rotation, (4) comparing predictions with measurements made by use of wake data, and (5) comparing predictions with annoyance criteria. The status of work by Hamilton Standard in the above areas which was done in support of the cooperative NASA and industry studies is briefly summarized

NASA Technical Reports Server

Software Defect Association Mining and Defect Correction Effort Prediction

Author: Cartwright MH
Mair C
Shepperd MJ
Song Q
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006
Field of study

Much current software defect prediction work concentrates on the number of defects remaining in software system. In this paper, we present association rule mining based methods to predict defect associations and defect-correction effort. This is to help developers detect software defects and assist project managers in allocating testing resources more effectively. We applied the proposed methods to the SEL defect data consisting of more than 200 projects over more than 15 years. The results show that for the defect association prediction, the accuracy is very high and the false negative rate is very low. Likewise for the defect-correction effort prediction, the accuracy for both defect isolation effort prediction and defect correction effort prediction are also high. We compared the defect-correction effort prediction method with other types of methods: PART, C4.5, and Na¨ıve Bayes and show that accuracy has been improved by at least 23%. We also evaluated the impact of support and confidence levels on prediction accuracy, false negative rate, false positive rate, and the number of rules. We found that higher support and confidence levels may not result in higher prediction accuracy, and a sufficient number of rules is a precondition for high prediction accuracy

CiteSeerX

Crossref

Brunel University Research Archive

Further thoughts on precision

Author: Bowes David
Christianson B.
Davey N.
Gray D.
Sun Yi
Publication venue: 'Institution of Engineering and Technology (IET)'
Publication date: 01/01/2011
Field of study

Background: There has been much discussion amongst automated software defect prediction researchers regarding use of the precision and false positive rate classifier performance metrics. Aim: To demonstrate and explain why failing to report precision when using data with highly imbalanced class distributions may provide an overly optimistic view of classifier performance. Method: Well documented examples of how dependent class distribution affects the suitability of performance measures. Conclusions: When using data where the minority class represents less than around 5 to 10 percent of data points in total, failing to report precision may be a critical mistake. Furthermore, deriving the precision values omitted from studies can reveal valuable insight into true classifier performancePeer reviewedFinal Accepted Versio

CiteSeerX

Crossref

Lancaster E-Prints

University of Hertfordshire Research Archive

Analysis of pressure distortion testing

Author: Koch K. E.
Rees R. L.
Publication venue
Publication date
Field of study

The development of a distortion methodology, method D, was documented, and its application to steady state and unsteady data was demonstrated. Three methodologies based upon DIDENT, a NASA-LeRC distortion methodology based upon the parallel compressor model, were investigated by applying them to a set of steady state data. The best formulation was then applied to an independent data set. The good correlation achieved with this data set showed that method E, one of the above methodologies, is a viable concept. Unsteady data were analyzed by using the method E methodology. This analysis pointed out that the method E sensitivities are functions of pressure defect level as well as corrected speed and pattern

NASA Technical Reports Server

Software development: A paradigm for the future

Author: Basili Victor R.
Publication venue
Publication date: 01/06/1989
Field of study

A new paradigm for software development that treats software development as an experimental activity is presented. It provides built-in mechanisms for learning how to develop software better and reusing previous experience in the forms of knowledge, processes, and products. It uses models and measures to aid in the tasks of characterization, evaluation and motivation. An organization scheme is proposed for separating the project-specific focus from the organization's learning and reuse focuses of software development. The implications of this approach for corporations, research and education are discussed and some research activities currently underway at the University of Maryland that support this approach are presented

NASA Technical Reports Server

Digital Repository at the University of Maryland