3 research outputs found

Defectors: A Large, Diverse Python Dataset for Defect Prediction

Authors: Mahbub Parvez; Rahman Mohammad Masudur; Shuvo Ohiduzzaman
Publication date: 11/04/2023

Defect prediction has been a popular research topic where machine learning (ML) and deep learning (DL) have found numerous applications. However, these ML/DL-based defect prediction models are often limited by the quality and size of their datasets. In this paper, we present Defectors, a large dataset for just-in-time and line-level defect prediction. Defectors consists of ≈ 213K source code files (≈ 93K defective and ≈ 120K defect-free) that span across 24 popular Python projects. These projects come from 18 different domains, including machine learning, automation, and internet-of-things. Such a scale and diversity make Defectors a suitable dataset for training ML/DL models, especially transformer models that require large and diverse datasets. We also foresee several application areas of our dataset including defect prediction and defect explanation. Dataset link: https://doi.org/10.5281/zenodo.770898

arXiv.org e-Print Archive

Revisiting supervised and unsupervised methods for effort-aware cross-project defect prediction

Authors: CHEN Xiang; GU Qing; LO David; NI Chao; XIA Xin
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/06/2020

Institutional Knowledge at Singapore Management University