Probabilistic XGBoost Threshold Classification with Autoencoder for Credit Card Fraud Detection

Abstract

Due to the imbalanced data of outnumbered legitimate transactions than the fraudulent transaction, the detection of fraud is a challenging task to find an effective solution. In this study, autoencoder with probabilistic threshold shifting of XGBoost (AE-XGB) for credit card fraud detection is designed. Initially, AE-XGB employs autoencoder the prevalent dimensionality reduction technique to extract data features from latent space representation. Then the reconstructed lower dimensional features utilize eXtreame Gradient Boost (XGBoost), an ensemble boosting algorithm with probabilistic threshold to classify the data as fraudulent or legitimate. In addition to AE-XGB, other existing ensemble algorithms such as Adaptive Boosting (AdaBoost), Gradient Boosting Machine (GBM), Random Forest, Categorical Boosting (CatBoost), LightGBM and XGBoost are compared with optimal and default threshold. To validate the methodology, we used IEEE-CIS fraud detection dataset for our experiment. Class imbalance and high dimensionality characteristics of dataset reduce the performance of model hence the data is preprocessed and trained. To evaluate the performance of the model, evaluation indicators such as precision, recall, f1-score, g-mean and Mathews Correlation Coefficient (MCC) are accomplished. The findings revealed that the performance of the proposed AE-XGB model is effective in handling imbalanced data and able to detect fraudulent transactions with 90.4% of recall and 90.5% of f1-score from incoming new transactions

    Similar works