AdaCC: Cumulative Cost-Sensitive Boosting for Imbalanced Classification
Class imbalance poses a major challenge for machine learning, as supervised
learning models tend to be biased towards the majority class and under-perform
on the minority class. Cost-sensitive learning tackles this problem by
treating the classes differently, typically via a user-defined fixed
misclassification cost matrix provided as input to the learner. Tuning such a
matrix is a challenging task that requires domain knowledge; moreover, wrong
adjustments can deteriorate overall predictive performance. In this work, we
propose a novel cost-sensitive boosting approach for imbalanced data that
dynamically adjusts the misclassification costs over the boosting rounds in
response to the model's performance, instead of using a fixed
misclassification cost matrix. Our method, called AdaCC, is parameter-free, as
it relies on the cumulative behavior of the boosting model to adjust the
misclassification costs for the next boosting round, and it comes with
theoretical guarantees regarding the training error. Experiments on 27
real-world datasets from different domains with high class imbalance
demonstrate the superiority of our method over 12 state-of-the-art
cost-sensitive boosting approaches, with consistent improvements across
measures: in the range of [0.3%-28.56%] for AUC, [3.4%-21.4%] for balanced
accuracy, [4.8%-45%] for gmean, and [7.4%-85.5%] for recall.
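
To make the core idea concrete, below is a minimal Python sketch of a boosting loop whose class-dependent misclassification costs are re-derived each round from the cumulative ensemble's per-class error, rather than being fixed upfront. All names here (adacc_sketch, n_rounds) and the specific cost-update rule are our own illustrative assumptions; the exact AdaCC update and its theoretical guarantees are given in the paper itself.

```python
# Illustrative sketch of cumulative cost-sensitive boosting: costs are
# adjusted every round from the cumulative model's per-class error.
# NOT the authors' exact AdaCC update rule -- a stand-in for the idea.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adacc_sketch(X, y, n_rounds=50):
    """Boosting with self-adjusting class costs; y assumed in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)          # example weights
    cost = np.ones(n)                # per-example costs, all 1 at round 0
    learners, alphas = [], []

    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)

        # Weighted error and AdaBoost-style learner weight.
        err = np.clip(np.sum(w * (pred != y)) / np.sum(w), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        learners.append(stump)
        alphas.append(alpha)

        # Cumulative behavior: prediction of the ensemble built so far.
        F = np.sum([a * l.predict(X) for a, l in zip(alphas, learners)],
                   axis=0)
        cum_pred = np.sign(F)

        # Self-adjusting cost: raise the cost of whichever class the
        # cumulative model currently misclassifies more often (hypothetical
        # stand-in for AdaCC's cumulative cost update).
        for c in (-1, 1):
            mask = y == c
            class_err = np.mean(cum_pred[mask] != y[mask])
            cost[mask] = 1.0 + class_err

        # Cost-weighted exponential re-weighting, as in cost-sensitive
        # AdaBoost variants.
        w *= np.exp(-alpha * y * pred * cost)
        w /= w.sum()

    def predict(X_new):
        scores = np.sum([a * l.predict(X_new)
                         for a, l in zip(alphas, learners)], axis=0)
        return np.sign(scores)

    return predict
```

Because the costs are read off the ensemble's own cumulative per-class error, no misclassification cost matrix has to be supplied or tuned, which is what makes this family of methods parameter-free in the sense the abstract describes.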