Balancing the accuracy rates of the majority and minority classes is challenging in imbalanced
classification. Furthermore, data characteristics have a significant impact on the performance
of imbalanced classifiers, which are generally neglected by existing evaluation
methods. The objective of this study is to introduce a new criterion to comprehensively
evaluate imbalanced classifiers. Specifically, we introduce an efficiency curve that is established
using data envelopment analysis without explicit inputs (DEA-WEI), to determine
the trade-off between the benefits of improved minority class accuracy and the cost of
reduced majority class accuracy. In sequence, we analyze the impact of the imbalanced
ratio and typical imbalanced data characteristics on the efficiency of the classifiers.
Empirical analyses using 68 imbalanced data reveal that traditional classifiers such as
C4.5 and the k-nearest neighbor are more effective on disjunct data, whereas ensemble
and undersampling techniques are more effective for overlapping and noisy data. The efficiency
of cost-sensitive classifiers decreases dramatically when the imbalanced ratio
increases. Finally, we investigate the reasons for the different efficiencies of classifiers on
imbalanced data and recommend steps to select appropriate classifiers for imbalanced data
based on data characteristics.National Natural Science Foundation of China (NSFC) 71874023
71725001
71771037
7197104