4 research outputs found
Streaming Active Learning Strategies for Real-Life Credit Card Fraud Detection: Assessment and Visualization
Credit card fraud detection is a very challenging problem because of the
specific nature of transaction data and the labeling process. The transaction
data is peculiar because they are obtained in a streaming fashion, they are
strongly imbalanced and prone to non-stationarity. The labeling is the outcome
of an active learning process, as every day human investigators contact only a
small number of cardholders (associated to the riskiest transactions) and
obtain the class (fraud or genuine) of the related transactions. An adequate
selection of the set of cardholders is therefore crucial for an efficient fraud
detection process. In this paper, we present a number of active learning
strategies and we investigate their fraud detection accuracies. We compare
different criteria (supervised, semi-supervised and unsupervised) to query
unlabeled transactions. Finally, we highlight the existence of an
exploitation/exploration trade-off for active learning in the context of fraud
detection, which has so far been overlooked in the literature
PWIDB: A framework for learning to classify imbalanced data streams with incremental data re-balancing technique
The performance of classification algorithms with highly imbalanced streaming data depends upon efficient balancing strategy. Some techniques of balancing strategy have been applied using static batch data to resolve the class imbalance problem, which is difficult if applied for massive data streams. In this paper, a new Piece-Wise Incremental Data re-Balancing (PWIDB) framework is proposed. The PWIDB framework combines automated balancing techniques using Racing Algorithm (RA) and incremental rebalancing technique. RA is an active learning approach capable of classifying imbalanced data and can provide a way to select an appropriate re-balancing technique with imbalanced data. In this paper, we have extended the capability of RA for handling imbalanced data streams in the proposed PWIDB framework. The PWIDB accumulates previous knowledge with increments of re-balanced data and captures the concept of the imbalanced instances. The PWIDB is an incremental streaming batch framework, which is suitable for learning with streaming imbalanced data. We compared the performance of PWIDB with a well-known FLORA technique. Experimental results show that the PWIDB framework exhibits an improved and stable performance compared to FLORA and accumulative re-balancing techniques