Is "Better Data" Better than "Better Data Miners"? (On the Benefits of Tuning SMOTE for Defect Prediction)
We report and fix an important systematic error in prior studies that ranked
classifiers for software analytics. Those studies did not (a) assess
classifiers on multiple criteria and did not (b) study how variations in
the data affect the results. Hence, this paper applies (a) multi-criteria tests
while (b) fixing the weaker regions of the training data (using SMOTUNED, a
self-tuning version of SMOTE). This approach leads to dramatic improvements
in software defect prediction. When applied in a 5*5
cross-validation study on 3,681 Java classes (containing over a million lines
of code) from open-source systems, SMOTUNED increased AUC and recall by 60% and
20%, respectively. These improvements are independent of the classifier used to
predict quality. The same pattern of improvement was observed when SMOTE and
SMOTUNED were compared against the most recent
class imbalance technique. In conclusion, for software analytics tasks like
defect prediction, (1) data pre-processing can be more important than
classifier choice, (2) ranking studies are incomplete without such
pre-processing, and (3) SMOTUNED is a promising candidate for pre-processing.
Comment: 10 pages + 2 references. Accepted to the International Conference on
Software Engineering (ICSE), 201
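The abstract describes SMOTUNED as a self-tuning version of SMOTE, the standard oversampling technique for imbalanced defect data. A minimal sketch of the underlying SMOTE step, the part whose parameters (number of neighbours `k`, amount of oversampling) a tuner like SMOTUNED would search over, might look like the following. This is illustrative code, not the paper's implementation:

```python
import numpy as np

def smote(minority, n_synthetic, k=5, rng=None):
    """Minimal SMOTE: synthesize new minority-class points along the
    line between a random minority sample and one of its k nearest
    minority-class neighbours."""
    rng = np.random.default_rng(rng)
    out = []
    for _ in range(n_synthetic):
        i = rng.integers(len(minority))
        x = minority[i]
        # Euclidean distance from x to every minority sample
        d = np.linalg.norm(minority - x, axis=1)
        nn = np.argsort(d)[1:k + 1]        # skip the point itself
        neighbour = minority[rng.choice(nn)]
        gap = rng.random()                 # interpolation factor in [0, 1]
        out.append(x + gap * (neighbour - x))
    return np.array(out)
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the minority region; a tuner such as SMOTUNED would then evaluate candidate settings of `k` and `n_synthetic` (and, per the paper, further parameters) against a held-out performance measure.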
A Review of Metrics and Modeling Techniques in Software Fault Prediction Model Development
This paper surveys software fault prediction work carried out with different data analytic techniques reported in the software engineering literature. The study is split into three broad areas: (a) a description of the software metrics suites reported and validated in the literature; (b) a brief outline of previous research on the development of software fault prediction models based on various analytic techniques, organised using a taxonomy of those techniques; and (c) a review of the advantages of using combinations of metrics. This last area, however, is comparatively new and needs more research effort.
Applying FAHP to Improve the Performance Evaluation Reliability and Validity of Software Defect Classifiers
Today’s software complexity makes developing defect-free software almost impossible. On average, billions of dollars are lost every year because of software defects in the United States alone, while the global loss is much higher. Consequently, developing classifiers that classify software modules into defective and non-defective before software releases has attracted great interest in academia and the software industry alike. Although many classifiers have been proposed, none has been proven superior to the others. The major reason is that while one study shows that classifier A is better than classifier B, other research comes to a diametrically opposite conclusion. These conflicts are usually triggered when researchers report results using their preferred performance measures, such as recall and precision. Although this approach is valid, it does not examine all possible facets of a classifier’s performance characteristics, so the evaluation might improve or deteriorate if researchers choose other performance measures. As a result, software developers usually struggle to select the most suitable classifier for their projects. The goal of this dissertation is to apply the Fuzzy Analytical Hierarchy Process (FAHP), a popular multi-criteria decision-making technique, to overcome these inconsistencies in research outcomes. This evaluation framework incorporates a wider spectrum of performance measures to evaluate classifier performance, rather than relying on selected, preferred measures. The results show that this approach will increase software developers’ confidence in research outcomes, help them avoid false conclusions and indicate reasonable boundaries for them. We utilized 22 popular performance measures and 11 software defect classifiers. The analysis was carried out using the KNIME data mining platform and 12 software defect data sets provided by the NASA Metrics Data Program (MDP) repository.
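The core idea of the abstract, aggregating many performance measures under fuzzy criterion weights instead of trusting one preferred measure, can be sketched as follows. The classifier names, scores, and triangular fuzzy weights below are hypothetical, and centroid defuzzification is one common FAHP choice, not necessarily the dissertation's exact procedure:

```python
import numpy as np

# Hypothetical: three classifiers scored on three performance
# measures (e.g. recall, precision, AUC), already normalised to [0, 1].
scores = {
    "NaiveBayes":   [0.70, 0.55, 0.75],
    "RandomForest": [0.80, 0.60, 0.82],
    "LogReg":       [0.65, 0.70, 0.78],
}

# Triangular fuzzy weights (low, mode, high) per measure, as might be
# elicited from pairwise comparisons in a full FAHP study.
fuzzy_w = [(0.2, 0.3, 0.4), (0.1, 0.2, 0.3), (0.4, 0.5, 0.6)]

def defuzzify(tri):
    """Centroid of a triangular fuzzy number (l, m, u)."""
    l, m, u = tri
    return (l + m + u) / 3.0

def rank(scores, fuzzy_w):
    """Rank classifiers by crisp weighted score, best first."""
    w = np.array([defuzzify(t) for t in fuzzy_w])
    w = w / w.sum()                      # normalise weights to sum to 1
    totals = {name: float(np.dot(w, s)) for name, s in scores.items()}
    return sorted(totals, key=totals.get, reverse=True)
```

A ranking produced this way reflects all the measures at once, weighted by their (uncertain) importance, which is what lets the framework reconcile studies that disagree when each reports only its favourite measure.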