Search CORE

1,635 research outputs found

Ensemble Learning for fraud detection in Online Payment System: Fraud Detection in Online Payment System

Author: Kothapalli. Mandakini et al.
Publication venue: Auricle Global Society of Education and Research
Publication date: 02/11/2023
Field of study

The imbalanced problem in fraud detection systems refers to the unequal distribution of fraud cases and non-fraud cases in the information that is used to train machine learning models. This can make it difficult to accurately detect fraudulent activity. As a general rule, instances of fraud occur much less frequently than instances of other types of occurrences, which results in a dataset which is very unbalanced. This imbalance can present challenges for machine learning algorithms, as they may become biased towards the majority class (that is, non-fraud cases) and fail to accurately detect fraud. In situations like these, machine learning models may have a high accuracy overall, but a low recall for the minority class (i.e., fraud cases), which means that many instances of fraud will be misclassified as instances of something else and will not be found. In this study, Synthetic Minority Sampling Technique (SMOTE) is used for balancing the data set and the following machine learning algorithms such as decision trees, Enhanced logistic regression, Naive Bayes are used to classify the dataset.Majority Voting mechanism is used to ensemble the DT,NB, ELR methods and analyze the performance of the model. The performance of the Ensemble of various Machine Learning algorithms was superior to that of the other algorithms in terms of accuracy (98.62%), F1 score (95.21%), precision (98.02%), and recall (96.75%)

International Journal on Recent and Innovation Trends in Computing and Communication

Application of Machine Learning Techniques in Credit Card Fraud Detection

Author: Shakya Ronish
Publication venue: Digital Scholarship@UNLV
Publication date: 15/12/2018
Field of study

Credit card fraud is an ever-growing problem in today’s financial market. There has been a rapid increase in the rate of fraudulent activities in recent years causing a substantial financial loss to many organizations, companies, and government agencies. The numbers are expected to increase in the future, because of which, many researchers in this field have focused on detecting fraudulent behaviors early using advanced machine learning techniques. However, the credit card fraud detection is not a straightforward task mainly because of two reasons: (i) the fraudulent behaviors usually differ for each attempt and (ii) the dataset is highly imbalanced, i.e., the frequency of majority samples (genuine cases) outnumbers the minority samples (fraudulent cases). When providing input data of a highly unbalanced class distribution to the predictive model, the model tends to be biased towards the majority samples. As a result, it tends to misrepresent a fraudulent transaction as a genuine transaction. To tackle this problem, data-level approach, where different resampling methods such as undersampling, oversampling, and hybrid strategies, have been implemented along with an algorithmic approach where ensemble models such as bagging and boosting have been applied to a highly skewed dataset containing 284807 transactions. Out of these transactions, only 492 transactions are labeled as fraudulent. Predictive models such as logistic regression, random forest, and XGBoost in combination with different resampling techniques have been applied to predict if a transaction is fraudulent or genuine. The performance of the model is evaluated based on recall, precision, f1-score, precision-recall (PR) curve, and receiver operating characteristics (ROC) curve. The experimental results showed that random forest in combination with a hybrid resampling approach of Synthetic Minority Over-sampling Technique (SMOTE) and Tomek Links removal performed better than other models

University of Nevada, Las Vegas Repository

New Hybrid Data Preprocessing Technique for Highly Imbalanced Dataset

Author: Chew XinYing
Khaw Khai Wah
Malik Esraa Faisal
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 09/11/2022
Field of study

One of the most challenging problems in the real-world dataset is the rising numbers of imbalanced data. The fact that the ratio of the majorities is higher than the minorities will lead to misleading results as conventional machine learning algorithms were designed on the assumption of equal class distribution. The purpose of this study is to build a hybrid data preprocessing approach to deal with the class imbalance issue by applying resampling approaches and CSL for fraud detection using a real-world dataset. The proposed hybrid approach consists of two steps in which the first step is to compare several resampling approaches to find the optimum technique with the highest performance in the validation set. While the second method used CSL with optimal weight ratio on the resampled data from the first step. The hybrid technique was found to have a positive impact of 0.987, 0.974, 0.847, 0.853 F2-measure for RF, DT, XGBOOST and LGBM, respectively. Additionally, relative to the conventional methods, it obtained the highest performance for prediction

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Development of Deep Learning based Intelligent Approach for Credit Card Fraud Detection

Author: Julaiha AG. Noorul
Kannan K. Gokul
Krishnan V. Gokula
Prakash T. A. Mohana
Saradhi M. V. Vijaya
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 31/12/2022
Field of study

Credit card fraud (CCF) has long been a major concern of institutions of financial groups and business partners, and it is also a global interest to researchers due to its growing popularity. In order to predict and detect the CCF, machine learning (ML) has proven to be one of the most promising techniques. But, class inequality is one of the main and recurring challenges when dealing with CCF tasks that hinder model performance. To overcome this challenges, a Deep Learning (DL) techniques are used by the researchers. In this research work, an efficient CCF detection (CCFD) system is developed by proposing a hybrid model called Convolutional Neural Network with Recurrent Neural Network (CNN-RNN). In this model, CNN acts as feature extraction for extracting the valuable information of CCF data and long-term dependency features are studied by RNN model. An imbalance problem is solved by Synthetic Minority Over Sampling Technique (SMOTE) technique. An experiment is conducted on European Dataset to validate the performance of CNN-RNN model with existing CNN and RNN model in terms of major parameters. The results proved that CNN-RNN model achieved 95.83% of precision, where CNN achieved 93.63% of precision and RNN achieved 88.50% of precision

International Journal on Recent and Innovation Trends in Computing and Communication

A numeric-based machine learning design for detecting organized retail fraud in digital marketplaces

Author: Bacao Fernando
Mutemi Abed
Publication venue
Publication date: 02/08/2023
Field of study

Mutemi, A., & Bacao, F. (2023). A numeric-based machine learning design for detecting organized retail fraud in digital marketplaces. Scientific Reports, 13(1), 1-16. [12499]. https://doi.org/10.1038/s41598-023-38304-5Organized retail crime (ORC) is a significant issue for retailers, marketplace platforms, and consumers. Its prevalence and influence have increased fast in lockstep with the expansion of online commerce, digital devices, and communication platforms. Today, it is a costly affair, wreaking havoc on enterprises’ overall revenues and continually jeopardizing community security. These negative consequences are set to rocket to unprecedented heights as more people and devices connect to the Internet. Detecting and responding to these terrible acts as early as possible is critical for protecting consumers and businesses while also keeping an eye on rising patterns and fraud. The issue of detecting fraud in general has been studied widely, especially in financial services, but studies focusing on organized retail crimes are extremely rare in literature. To contribute to the knowledge base in this area, we present a scalable machine learning strategy for detecting and isolating ORC listings on a prominent marketplace platform by merchants committing organized retail crimes or fraud. We employ a supervised learning approach to classify postings as fraudulent or real based on past data from buyer and seller behaviors and transactions on the platform. The proposed framework combines bespoke data preprocessing procedures, feature selection methods, and state-of-the-art class asymmetry resolution techniques to search for aligned classification algorithms capable of discriminating between fraudulent and legitimate listings in this context. Our best detection model obtains a recall score of 0.97 on the holdout set and 0.94 on the out-of-sample testing data set. We achieve these results based on a select set of 45 features out of 58.publishersversionpublishe

Repositório da Universidade Nova de Lisboa

Predicting fraud in mobile money transfer

Author: Adedoyin Adeyinka
Publication venue
Publication date: 01/06/2018
Field of study

University of Brighton Research Portal

Comparative Analysis of Different Distributions Dataset by Using Data Mining Techniques on Credit Card Fraud Detection

Author: Layth Hazim
Oğuz Ata
Publication venue: 'Mechanical Engineering Faculty in Slavonski Brod'
Publication date: 01/01/2020
Field of study

Banks suffer multimillion-dollars losses each year for several reasons, the most important of which is due to credit card fraud. The issue is how to cope with the challenges we face with this kind of fraud. Skewed "class imbalance" is a very important challenge that faces this kind of fraud. Therefore, in this study, we explore four data mining techniques, namely naïve Bayesian (NB),Support Vector Machine (SVM), K-Nearest Neighbor (KNN) and Random Forest (RF), on actual credit card transactions from European cardholders. This paper offers four major contributions. First, we used under-sampling to balance the dataset because of the high imbalance class, implying skewed distribution. Second, we applied NB, SVM, KNN, and RF to under-sampled class to classify the transactions into fraudulent and genuine followed by testing the performance measures using a confusion matrix and comparing them. Third, we adopted cross-validation (CV) with 10 folds to test the accuracy of the four models with a standard deviation followed by comparing the results for all our models. Next, we examined these models against the entire dataset (skewed) using the confusion matrix and AUC (Area Under the ROC Curve) ranking measure to conclude the final results to determine which would be the best model for us to use with a particular type of fraud. The results showing the best accuracy for the NB, SVM, KNN and RF classifiers are 97,80%; 97,46%; 98,16% and 98,23%, respectively. The comparative results have been done by using four-division datasets (75:25), (90:10), (66:34) and (80:20) displayed that the RF performs better than NB, SVM, and KNN, and the results when utilizing our proposed models on the entire dataset (skewed), achieved preferable outcomes to the under-sampled dataset

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

An in-depth performance analysis of the oversampling techniques for high-class imbalanced dataset

Author: Fatichah Chastine
Wibowo Prasetyo
Publication venue: 'Universitas Pesantren Tinggi Darul Ulum (Unipdu)'
Publication date: 28/02/2021
Field of study

Class imbalance occurs when the distribution of classes between the majority and the minority classes is not the same. The data on imbalanced classes may vary from mild to severe. The effect of high-class imbalance may affect the overall classification accuracy since the model is most likely to predict most of the data that fall within the majority class. Such a model will give biased results, and the performance predictions for the minority class often have no impact on the model. The use of the oversampling technique is one way to deal with high-class imbalance, but only a few are used to solve data imbalance. This study aims for an in-depth performance analysis of the oversampling techniques to address the high-class imbalance problem. The addition of the oversampling technique will balance each class’s data to provide unbiased evaluation results in modeling. We compared the performance of Random Oversampling (ROS), ADASYN, SMOTE, and Borderline-SMOTE techniques. All oversampling techniques will be combined with machine learning methods such as Random Forest, Logistic Regression, and k-Nearest Neighbor (KNN). The test results show that Random Forest with Borderline-SMOTE gives the best value with an accuracy value of 0.9997, 0.9474 precision, 0.8571 recall, 0.9000 F1-score, 0.9388 ROC-AUC, and 0.8581 PRAUC of the overall oversampling technique

Jurnal Online Unipdu Jombang (Universitas Pesantren Tinggi Darul 'Ulum)

Explainable Artificial Intelligence and Causal Inference based ATM Fraud Detection

Author: Mane Abhay Anand
Naidu Laveti Ramesh
Ravi Vadlamani
Vivek Yelleti
Publication venue
Publication date: 19/11/2022
Field of study

Gaining the trust of customers and providing them empathy are very critical in the financial domain. Frequent occurrence of fraudulent activities affects these two factors. Hence, financial organizations and banks must take utmost care to mitigate them. Among them, ATM fraudulent transaction is a common problem faced by banks. There following are the critical challenges involved in fraud datasets: the dataset is highly imbalanced, the fraud pattern is changing, etc. Owing to the rarity of fraudulent activities, Fraud detection can be formulated as either a binary classification problem or One class classification (OCC). In this study, we handled these techniques on an ATM transactions dataset collected from India. In binary classification, we investigated the effectiveness of various over-sampling techniques, such as the Synthetic Minority Oversampling Technique (SMOTE) and its variants, Generative Adversarial Networks (GAN), to achieve oversampling. Further, we employed various machine learning techniques viz., Naive Bayes (NB), Logistic Regression (LR), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), Gradient Boosting Tree (GBT), Multi-layer perceptron (MLP). GBT outperformed the rest of the models by achieving 0.963 AUC, and DT stands second with 0.958 AUC. DT is the winner if the complexity and interpretability aspects are considered. Among all the oversampling approaches, SMOTE and its variants were observed to perform better. In OCC, IForest attained 0.959 CR, and OCSVM secured second place with 0.947 CR. Further, we incorporated explainable artificial intelligence (XAI) and causal inference (CI) in the fraud detection framework and studied it through various analyses.Comment: 34 pages; 21 Figures; 8 Table

arXiv.org e-Print Archive

A rule-based machine learning model for financial fraud detection

Author: Haque Md. Mokammel
Islam Saiful
Rezaul Karim Abu Naser Mohammad
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/02/2024
Field of study

Financial fraud is a growing problem that poses a significant threat to the banking industry, the government sector, and the public. In response, financial institutions must continuously improve their fraud detection systems. Although preventative and security precautions are implemented to reduce financial fraud, criminals are constantly adapting and devising new ways to evade fraud prevention systems. The classification of transactions as legitimate or fraudulent poses a significant challenge for existing classification models due to highly imbalanced datasets. This research aims to develop rules to detect fraud transactions that do not involve any resampling technique. The effectiveness of the rule-based model (RBM) is assessed using a variety of metrics such as accuracy, specificity, precision, recall, confusion matrix, Matthew’s correlation coefficient (MCC), and receiver operating characteristic (ROC) values. The proposed rule-based model is compared to several existing machine learning models such as random forest (RF), decision tree (DT), multi-layer perceptron (MLP), k-nearest neighbor (KNN), naive Bayes (NB), and logistic regression (LR) using two benchmark datasets. The results of the experiment show that the proposed rule-based model beat the other methods, reaching accuracy and precision of 0.99 and 0.99, respectively

Institute of Advanced Engineering and Science