complexFuzzy: A novel clustering method for selecting training instances of cross-project defect prediction

Abstract

Over the last decade, researchers have investigated to what extent cross-project defect prediction (CPDP) shows advantages over traditional defect prediction settings. These works do not take training and testing data of defect prediction from the same project. Instead, dissimilar projects are employed. Selecting proper training data plays an important role in terms of the success of CPDP. In this study, a novel clustering method named complexFuzzy is presented for selecting training data of CPDP. The method is developed by determining membership values with the help of some metrics which can be considered as indicators of complexity. First, CPDP combinations are created on 29 different data sets. Subsequently, complexFuzzy is evaluated by considering cluster centers of data sets and comparing some performance measures including area under the curve (AUC) and F-measure. The method is superior to other five comparison algorithms in terms of the distance of cluster centers and prediction performance

    Similar works