Search CORE

22,733 research outputs found

Is "Better Data" Better than "Better Data Miners"? (On the Benefits of Tuning SMOTE for Defect Prediction)

Author: Brown André EX
Ch'ng Quee-Lim
Currie Michael
Grundy Laura J
Hokanson Jim
Javer Avelino
Kerr Rex
Lee Chee Wai
Li Chris
Li Kezhi
Schafer William R
Yemini Eviatar
Publication venue
Publication date: 20/02/2018
Field of study

We report and fix an important systematic error in prior studies that ranked classifiers for software analytics. Those studies did not (a) assess classifiers on multiple criteria and they did not (b) study how variations in the data affect the results. Hence, this paper applies (a) multi-criteria tests while (b) fixing the weaker regions of the training data (using SMOTUNED, which is a self-tuning version of SMOTE). This approach leads to dramatically large increases in software defect predictions. When applied in a 5*5 cross-validation study for 3,681 JAVA classes (containing over a million lines of code) from open source systems, SMOTUNED increased AUC and recall by 60% and 20% respectively. These improvements are independent of the classifier used to predict for quality. Same kind of pattern (improvement) was observed when a comparative analysis of SMOTE and SMOTUNED was done against the most recent class imbalance technique. In conclusion, for software analytic tasks like defect prediction, (1) data pre-processing can be more important than classifier choice, (2) ranking studies are incomplete without such pre-processing, and (3) SMOTUNED is a promising candidate for pre-processing.Comment: 10 pages + 2 references. Accepted to International Conference of Software Engineering (ICSE), 201

arXiv.org e-Print Archive

ZENODO

FigShare

Is "Better Data" Better than "Better Data Miners"? (On the Benefits of Tuning SMOTE for Defect Prediction)

Author: Bennin Kwabena Ebo
Chiha I.
Ghotra Baljinder
Menzies Tim
Omran M.
Pedregosa Fabian
Refaeilzadeh Payam
Tan Ming
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 20/02/2018
Field of study

arXiv.org e-Print Archive

Crossref

Software evolution prediction using seasonal time analysis: a comparative study

Author: Brito e Abreu Fernando
Fonte Nelson
Goulão Miguel
Wermelinger Michel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/03/2012
Field of study

Prediction models of software change requests are useful for supporting rational and timely resource allocation to the evolution process. In this paper we use a time series forecasting model to predict software maintenance and evolution requests in an open source software project (Eclipse), as an example of projects with seasonal release cycles. We build an ARIMA model based on data collected from Eclipse’s change request tracking system since the project’s start. A change request may refer to defects found in the software, but also to suggested improvements in the system under scrutiny. Our model includes the identification of seasonal patterns and tendencies, and is validated through the forecast of the change requests evolution for the next 12 months. The usage of seasonal information significantly improves the estimation ability of this model, when compared to other ARIMA models found in the literature, and does so for a much longer estimation period. Being able to accurately forecast the change requests’ evolution over a fairly long time period is an important ability for enabling adequate process control in maintenance activities, and facilitates effort estimation and timely resources allocation. The approach presented in this paper is suitable for projects with a relatively long history, as the model building process relies on historic data

CiteSeerX

Crossref

Open Research Online (The Open University)

Towards Automated Performance Bug Identification in Python

Author: Mazzawi Elie
Miranskyy Andriy
Tsakiltsidis Sokratis
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 28/07/2016
Field of study

Context: Software performance is a critical non-functional requirement, appearing in many fields such as mission critical applications, financial, and real time systems. In this work we focused on early detection of performance bugs; our software under study was a real time system used in the advertisement/marketing domain. Goal: Find a simple and easy to implement solution, predicting performance bugs. Method: We built several models using four machine learning methods, commonly used for defect prediction: C4.5 Decision Trees, Na\"{\i}ve Bayes, Bayesian Networks, and Logistic Regression. Results: Our empirical results show that a C4.5 model, using lines of code changed, file's age and size as explanatory variables, can be used to predict performance bugs (recall=0.73, accuracy=0.85, and precision=0.96). We show that reducing the number of changes delivered on a commit, can decrease the chance of performance bug injection. Conclusions: We believe that our approach can help practitioners to eliminate performance bugs early in the development cycle. Our results are also of interest to theoreticians, establishing a link between functional bugs and (non-functional) performance bugs, and explicitly showing that attributes used for prediction of functional bugs can be used for prediction of performance bugs

arXiv.org e-Print Archive

Crossref