4 research outputs found

    Probabilistic XGBoost Threshold Classification with Autoencoder for Credit Card Fraud Detection

    Because legitimate transactions far outnumber fraudulent ones, the resulting class imbalance makes fraud detection a challenging task. In this study, an autoencoder with probabilistic threshold shifting of XGBoost (AE-XGB) is designed for credit card fraud detection. First, AE-XGB employs an autoencoder, a prevalent dimensionality reduction technique, to extract data features from the latent space representation. The lower-dimensional features are then passed to eXtreme Gradient Boosting (XGBoost), an ensemble boosting algorithm, which uses a probabilistic threshold to classify each transaction as fraudulent or legitimate. In addition to AE-XGB, existing ensemble algorithms such as Adaptive Boosting (AdaBoost), Gradient Boosting Machine (GBM), Random Forest, Categorical Boosting (CatBoost), LightGBM and XGBoost are compared under both optimal and default thresholds. To validate the methodology, we used the IEEE-CIS fraud detection dataset for our experiment. The class imbalance and high dimensionality of the dataset reduce model performance, so the data is preprocessed before training. To evaluate the model, precision, recall, F1-score, G-mean and the Matthews Correlation Coefficient (MCC) are used as evaluation indicators. The findings show that the proposed AE-XGB model handles imbalanced data effectively and detects fraudulent transactions among incoming new transactions with a recall of 90.4% and an F1-score of 90.5%.
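The abstract does not spell out how the probabilistic threshold is chosen, so the following is only a minimal sketch of threshold shifting under stated assumptions: the classifier already outputs fraud probabilities, and the cut-off is picked by scanning a grid and maximizing F1. The function names, the grid, and the toy scores are all hypothetical and are not taken from the paper. The sketch also computes the evaluation indicators the abstract lists (precision, recall, F1-score, G-mean, MCC).

```python
import math


def confusion_counts(y_true, y_prob, threshold):
    """Count TP, FP, TN, FN when labelling prob >= threshold as fraud (1)."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Precision, recall, F1, G-mean and MCC from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    g_mean = math.sqrt(recall * specificity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "g_mean": g_mean, "mcc": mcc}


def best_threshold(y_true, y_prob, metric="f1", steps=99):
    """Scan an evenly spaced grid and return the first threshold that
    maximizes the chosen metric (an assumed selection rule)."""
    candidates = [i / (steps + 1) for i in range(1, steps + 1)]
    return max(
        candidates,
        key=lambda t: metrics(*confusion_counts(y_true, y_prob, t))[metric],
    )


# Toy, made-up scores: 7 legitimate (0) and 3 fraudulent (1) transactions.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_prob = [0.05, 0.10, 0.12, 0.20, 0.22, 0.30, 0.45, 0.35, 0.40, 0.80]

default = metrics(*confusion_counts(y_true, y_prob, 0.5))
t_best = best_threshold(y_true, y_prob, metric="f1")
shifted = metrics(*confusion_counts(y_true, y_prob, t_best))

print("default threshold 0.5:", default)
print("shifted threshold", t_best, ":", shifted)
```

On this toy data the default cut-off of 0.5 misses two of the three fraud cases, while the shifted threshold recovers them at the cost of one false positive. In practice the threshold would be tuned on a validation split rather than on the data used for the final evaluation.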

    A new framework of feature engineering for machine learning in financial fraud detection

    Financial fraud has soared despite advances in fraud detection models powered by machine learning (ML). To address this issue, we propose a new feature engineering framework for ML models. The framework consists of feature creation, which combines feature aggregation and feature transformation, and feature selection, which accommodates a variety of ML algorithms. To illustrate the effectiveness of the framework, we conduct an experiment using an actual financial transaction dataset and show that the framework significantly improves the performance of ML fraud detection models. Specifically, all the ML models complemented by a feature set generated from our framework surpass the same models without such a feature set by nearly 40% on the F1-measure and 20% on the Area Under the Curve (AUC) value.
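The abstract names feature aggregation and feature transformation without giving concrete features, so the sketch below is only an illustration of the general idea. The record fields (`account`, `amount`), the chosen aggregates, and the log transform are assumptions made for this example and are not taken from the paper. Feature selection is not covered.

```python
import math
from collections import defaultdict

# Hypothetical transaction records; field names are illustrative only.
transactions = [
    {"account": "A", "amount": 20.0},
    {"account": "A", "amount": 30.0},
    {"account": "A", "amount": 500.0},
    {"account": "B", "amount": 100.0},
]


def add_engineered_features(rows):
    """Return copies of rows with aggregated and transformed features."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    # Aggregation pass: per-account transaction count and total amount.
    # Note: this uses each account's full history, which would leak future
    # information in a real pipeline; a time-windowed aggregate is safer.
    for row in rows:
        totals[row["account"]] += row["amount"]
        counts[row["account"]] += 1

    enriched = []
    for row in rows:
        count = counts[row["account"]]
        mean = totals[row["account"]] / count
        enriched.append({
            **row,
            "acct_txn_count": count,
            "acct_mean_amount": mean,
            # How unusual this amount is relative to the account's mean.
            "amount_to_acct_mean": row["amount"] / mean if mean else 0.0,
            # Transformation: compress the heavy right tail of amounts.
            "log_amount": math.log1p(row["amount"]),
        })
    return enriched


features = add_engineered_features(transactions)
for row in features:
    print(row)
```

Here the 500.0 transaction on account A stands out through `amount_to_acct_mean`, a signal the raw amount alone does not carry. A real pipeline would typically add many such aggregates and then prune them with a feature selection step.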