2 research outputs found
Transfer-Learning Oriented Class Imbalance Learning for Cross-Project Defect Prediction
Cross-project defect prediction (CPDP) aims to predict defects of projects
lacking training data by using prediction models trained on historical defect
data from other projects. However, because of the distribution differences
between datasets from different projects, building high-quality CPDP models
remains a challenge. The class-imbalanced nature of software defect datasets
further increases the difficulty. In this paper, we propose a transfer-learning
oriented minority over-sampling technique (TOMO) based feature-weighting
transfer naive Bayes (FWTNB) approach (TOMOFWTNB) for CPDP, which considers
both the class-imbalance and feature-importance problems. Unlike traditional
over-sampling techniques, TOMO not only balances the data but also reduces the
distribution difference. FWTNB is then used to further increase the similarity
of the two distributions. Experiments are performed on 11 public defect
datasets. The experimental results show that (1) TOMO improves the average
G-Measure by 23.7\%-41.8\% and the average MCC by 54.2\%-77.8\%; (2) the
feature weighting (FW) strategy improves the average G-Measure by 11\% and the
average MCC by 29.2\%; (3) TOMOFWTNB improves the average G-Measure value by at
least 27.8\% and the average MCC value by at least 71.5\%, compared with
existing state-of-the-art CPDP approaches. It can be concluded that (1) TOMO is
very effective for addressing the class-imbalance problem in the CPDP scenario;
(2) our FW strategy is helpful for CPDP; and (3) TOMOFWTNB outperforms previous
state-of-the-art CPDP approaches.
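The abstract does not spell out TOMO's algorithm, but the general idea of minority over-sampling it builds on can be sketched with a minimal SMOTE-style interpolation scheme. Everything below (the function name, the tuple-based representation) is a hypothetical illustration, not the paper's actual method, which additionally reduces cross-project distribution difference.

```python
import random

def oversample_minority(majority, minority, rng=random.Random(0)):
    """Synthesize minority-class samples by interpolating between random
    pairs of minority instances until both classes are the same size.
    Instances are plain tuples of numeric features."""
    synthetic = []
    while len(minority) + len(synthetic) < len(majority):
        a, b = rng.sample(minority, 2)      # pick two minority instances
        t = rng.random()                    # interpolation factor in [0, 1)
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return minority + synthetic
```

A transfer-oriented variant such as TOMO would, per the abstract, additionally bias the synthesis so the over-sampled source data moves closer to the target project's distribution.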
Understanding the Automated Parameter Optimization on Transfer Learning for CPDP: An Empirical Study
Data-driven defect prediction has become increasingly important in the
software engineering process. Since it is not uncommon that the data from a
software project are insufficient for training a reliable defect prediction
model, transfer learning, which borrows data or knowledge from other projects
to facilitate model building for the current project, namely cross-project
defect prediction (CPDP), is a natural choice. Most CPDP techniques involve two
major steps, i.e., transfer learning and classification, each of which has at
least one parameter to be tuned to achieve optimal performance. This practice fits
well with the purpose of automated parameter optimization. However, there is a
lack of thorough understanding about what are the impacts of automated
parameter optimization on various CPDP techniques. In this paper, we present
the first empirical study that looks into such impacts on 62 CPDP techniques,
13 of which are chosen from the existing CPDP literature while the other 49
ones have not been explored before. We build defect prediction models over 20
real-world software projects that are of different scales and characteristics.
Our findings demonstrate that: (1) Automated parameter optimization
substantially improves the defect prediction performance of 77\% of the CPDP
techniques with a manageable computational cost; thus, more effort on this
aspect is required in future CPDP studies. (2) Transfer learning is of utmost
importance in CPDP: given a tight computational budget, it is more
cost-effective to focus on optimizing the parameter configuration of the
transfer learning algorithms. (3) The research on CPDP is far from mature, as
it is "not difficult" to find a better alternative by combining existing
transfer learning and classification techniques. This finding provides
important insights for the future design of CPDP techniques.
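The two-step tuning setup described above (one parameter set for transfer learning, one for the classifier) can be illustrated with a simple random-search loop. This is a generic sketch of automated parameter optimization, not the study's actual optimizer; the search-space names (`filter_k`, `smoothing`) are hypothetical stand-ins for a transfer-learning parameter and a classifier parameter.

```python
import random

def random_search(evaluate, space, n_trials=30, rng=random.Random(0)):
    """Random search: sample configurations from a discrete `space`
    (dict of parameter name -> candidate values) and keep the one
    with the highest score under `evaluate`."""
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Hypothetical joint search space mirroring the two tuning steps:
# a transfer-learning parameter and a classifier parameter.
space = {"filter_k": [5, 10, 20], "smoothing": [0.1, 0.5, 1.0]}
```

Under a tight budget, finding (2) suggests spending most trials on the transfer-learning parameters rather than splitting them evenly with the classifier's.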