Is "Better Data" Better than "Better Data Miners"? (On the Benefits of Tuning SMOTE for Defect Prediction)
We report and fix an important systematic error in prior studies that ranked
classifiers for software analytics. Those studies neither (a) assessed
classifiers on multiple criteria nor (b) studied how variations in
the data affect the results. Hence, this paper applies (a) multi-criteria tests
while (b) fixing the weaker regions of the training data (using SMOTUNED, which
is a self-tuning version of SMOTE). This approach leads to dramatically large
increases in the performance of software defect predictors. When applied in a 5*5
cross-validation study for 3,681 JAVA classes (containing over a million lines
of code) from open source systems, SMOTUNED increased AUC and recall by 60% and
20% respectively. These improvements are independent of the classifier used to
predict quality. The same pattern of improvement was observed when SMOTE and
SMOTUNED were compared against a recent class-imbalance technique.
In conclusion, for software analytics tasks like
defect prediction, (1) data pre-processing can be more important than
classifier choice, (2) ranking studies are incomplete without such
pre-processing, and (3) SMOTUNED is a promising candidate for pre-processing.

Comment: 10 pages + 2 references. Accepted to the International Conference on
Software Engineering (ICSE), 2018.
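To make the abstract's technique concrete, below is a minimal sketch of tuning SMOTE in the spirit of SMOTUNED, assuming scikit-learn and imbalanced-learn are available. The paper's SMOTUNED tunes SMOTE's parameters (neighbourhood size, amount of oversampling, distance metric) with a differential-evolution optimizer; here a plain grid search over k_neighbors and sampling_strategy stands in for that tuner, and the synthetic dataset, grid values, and classifier are illustrative choices, not the paper's setup.

```python
# Minimal sketch of SMOTE parameter tuning (a stand-in for SMOTUNED's
# differential-evolution tuner), assuming scikit-learn and imbalanced-learn.
from itertools import product

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline

# Synthetic stand-in for a defect dataset: ~10% of classes are "defective".
X, y = make_classification(n_samples=1000, n_features=20,
                           weights=[0.9, 0.1], random_state=42)

best_auc, best_params = 0.0, None
# Grid over SMOTE's neighbourhood size k and the amount of oversampling;
# SMOTUNED searches a similar space (plus the distance metric) automatically.
for k, ratio in product([1, 3, 5, 7], [0.5, 0.75, 1.0]):
    pipe = Pipeline([
        ("smote", SMOTE(k_neighbors=k, sampling_strategy=ratio,
                        random_state=42)),
        ("tree", DecisionTreeClassifier(random_state=42)),
    ])
    # Oversampling happens inside the pipeline, so only the training
    # folds are rebalanced, never the held-out test fold.
    auc = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc").mean()
    if auc > best_auc:
        best_auc, best_params = auc, (k, ratio)

print(f"best AUC={best_auc:.3f} with k={best_params[0]}, "
      f"oversampling ratio={best_params[1]}")
```

Because the oversampler runs inside the cross-validation pipeline, only training folds are rebalanced; applying SMOTE before splitting would leak synthetic copies of test-fold minority examples into training and inflate the measured AUC.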