This work stems from three observations on prior Just-In-Time Software Defect
Prediction (JIT-SDP) models. First, prior studies treat the JIT-SDP problem
solely as a classification problem. Second, prior JIT-SDP studies do not
consider that class-balancing processing may change the underlying
characteristics of software changeset data. Third, prior JIT-SDP incremental
learning models address only a single source of concept drift, the evolution
of class imbalance.
We propose an incremental learning framework called CPI-JIT for JIT-SDP.
First, in addition to a classification modeling component, the framework
includes a time-series forecast modeling component in order to learn temporal
interdependencies among the changesets. Second, the framework features a
purpose-designed over-sampling technique based on SMOTE and Principal Curves,
called SMOTE-PC, which preserves the underlying distribution of software
changeset data.
Within this framework, we propose an incremental deep neural network model
called DeepICP. Via an evaluation using \numprojs software projects, we show
that: 1) SMOTE-PC improves the model's predictive performance; 2) for some
software projects, harnessing the temporal interdependencies among software
changesets can benefit defect prediction; and 3) principal curves summarize
the underlying distribution of changeset data and reveal a new source of
concept drift, to which the DeepICP model is designed to adapt.