Search CORE

759 research outputs found

A Comprehensive Survey of Data Mining-based Fraud Detection Research

Author: Agrawal
Au
Berry
Brentnall
Chen
Chiang Wang
David C. Yen
Feelders
Han
Hayhoe
Kirkosa
Ku
Leonard
Mitchell
Ngai
Quah
Rothman
Shaw
Shing-Han Li
Song
Sudjianto
Titus
Wen-Hui Lu
White
Publication venue: 'Elsevier BV'
Publication date: 30/09/2010
Field of study

This survey paper categorises, compares, and summarises from almost all published technical and review articles in automated fraud detection within the last 10 years. It defines the professional fraudster, formalises the main types and subtypes of known fraud, and presents the nature of data evidence collected within affected industries. Within the business context of mining the data to achieve higher cost savings, this research presents methods and techniques together with their problems. Compared to all related reviews on fraud detection, this survey covers much more technical articles and is the only one, to the best of our knowledge, which proposes alternative data and solutions from related domains.Comment: 14 page

arXiv.org e-Print Archive

Crossref

One-Class Classification: Taxonomy of Study and Review of Techniques

Author: Khan Shehroz S.
Madden Michael G.
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 29/11/2013
Field of study

One-class classification (OCC) algorithms aim to build classification models when the negative class is either absent, poorly sampled or not well defined. This unique situation constrains the learning of efficient classifiers by defining class boundary just with the knowledge of positive class. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC by presenting a taxonomy of study for OCC problems, which is based on the availability of training data, algorithms used and the application domains applied. We further delve into each of the categories of the proposed taxonomy and present a comprehensive literature review of the OCC algorithms, techniques and methodologies with a focus on their significance, limitations and applications. We conclude our paper by discussing some open research problems in the field of OCC and present our vision for future research.Comment: 24 pages + 11 pages of references, 8 figure

arXiv.org e-Print Archive

Access to Research at National University of Ireland, Galway

Explainable Artificial Intelligence and Causal Inference based ATM Fraud Detection

Author: Mane Abhay Anand
Naidu Laveti Ramesh
Ravi Vadlamani
Vivek Yelleti
Publication venue
Publication date: 19/11/2022
Field of study

Gaining the trust of customers and providing them empathy are very critical in the financial domain. Frequent occurrence of fraudulent activities affects these two factors. Hence, financial organizations and banks must take utmost care to mitigate them. Among them, ATM fraudulent transaction is a common problem faced by banks. There following are the critical challenges involved in fraud datasets: the dataset is highly imbalanced, the fraud pattern is changing, etc. Owing to the rarity of fraudulent activities, Fraud detection can be formulated as either a binary classification problem or One class classification (OCC). In this study, we handled these techniques on an ATM transactions dataset collected from India. In binary classification, we investigated the effectiveness of various over-sampling techniques, such as the Synthetic Minority Oversampling Technique (SMOTE) and its variants, Generative Adversarial Networks (GAN), to achieve oversampling. Further, we employed various machine learning techniques viz., Naive Bayes (NB), Logistic Regression (LR), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), Gradient Boosting Tree (GBT), Multi-layer perceptron (MLP). GBT outperformed the rest of the models by achieving 0.963 AUC, and DT stands second with 0.958 AUC. DT is the winner if the complexity and interpretability aspects are considered. Among all the oversampling approaches, SMOTE and its variants were observed to perform better. In OCC, IForest attained 0.959 CR, and OCSVM secured second place with 0.947 CR. Further, we incorporated explainable artificial intelligence (XAI) and causal inference (CI) in the fraud detection framework and studied it through various analyses.Comment: 34 pages; 21 Figures; 8 Table

arXiv.org e-Print Archive

Predicting fraud in mobile money transfer

Author: Adedoyin Adeyinka
Publication venue
Publication date: 01/06/2018
Field of study

University of Brighton Research Portal

Differential evolution technique on weighted voting stacking ensemble method for credit card fraud detection

Author: Dolo Kgaugelo Moses
Publication venue
Publication date: 01/12/2019
Field of study

Differential Evolution is an optimization technique of stochastic search for a population-based vector, which is powerful and efficient over a continuous space for solving differentiable and non-linear optimization problems. Weighted voting stacking ensemble method is an important technique that combines various classifier models. However, selecting the appropriate weights of classifier models for the correct classification of transactions is a problem. This research study is therefore aimed at exploring whether the Differential Evolution optimization method is a good approach for defining the weighting function. Manual and random selection of weights for voting credit card transactions has previously been carried out. However, a large number of fraudulent transactions were not detected by the classifier models. Which means that a technique to overcome the weaknesses of the classifier models is required. Thus, the problem of selecting the appropriate weights was viewed as the problem of weights optimization in this study. The dataset was downloaded from the Kaggle competition data repository. Various machine learning algorithms were used to weight vote a class of transaction. The differential evolution optimization techniques was used as a weighting function. In addition, the Synthetic Minority Oversampling Technique (SMOTE) and Safe Level Synthetic Minority Oversampling Technique (SL-SMOTE) oversampling algorithms were modified to preserve the definition of SMOTE while improving the performance. Result generated from this research study showed that the Differential Evolution Optimization method is a good weighting function, which can be adopted as a systematic weight function for weight voting stacking ensemble method of various classification methods.School of ComputingM. Sc. (Computing

Unisa Institutional Repository

Credit Card Fraud Detection Using Machine Learning Algorithms

Author: Anđelić Nikola
Lorencin Ivan
Mavrinac Marko
Vale Deni
Publication venue: University of Istria
Publication date: 01/01/2023
Field of study

One of the main challenges to the security of an online business is credit card fraud. For this reason, algorithms based on artificial intelligence and machine learning are being introduced to enable the most accurate and fast detection of card fraud. This paper presents an approach to the detection of card fraud based on machine learning algorithms more specifically, a multilayer perceptron (MLP) and a Decision tree. The aforementioned algorithms were trained and tested using a publicly available data set on card fraud. The data set used consists of 7 characteristics of the card transaction and information on whether there was card fraud or not. In total, the data set contains information on 1,000,000 transactions, and it is highly imbalanced. To handle the class imbalance, random undersampling, SMOTE, and SMOTE-Tomek algorithms were proposed. From the achieved results it can be seen that the highest performances are achieved if MLP (AUC = 0.99, f1 = 0.99, MCC = 0.98, and Kappa = 0.98) and Decision tree (AUC = 0.99, f1 = 0.99, MCC = 0.99, and Kappa = 0.98) are trained by using data set re-sampled by using SMOTE-Tomek algorithm. If the performance of the mentioned algorithms is examined using fewer characteristics of the transaction, it can be seen that by reducing the number of characteristics a significant decrease in classification performances can be noticed if a Decision tree in combination with SMOTE-Tomek is used. However, if an MLP in combination with SMOTE-Tomek is used, a significantly lower decrease in performance can be observed, pointing to the higher robustness to input vector dimension reduction. Such a robust system can provide information about transaction validity even in a condition where the input data is limited to a few input variables. From the achieved results, it can be concluded that MLP in combination with the SMOTE-Tomek algorithm can be used for credit card fraud detection, even in conditions with a lower number of input variables

HRČAK - Portal of Croatian Scientific and Professional Journals

A comparative analysis of classifiers in cancer prediction using multiple data mining techniques

Author: Alidoostan A.
Ghaffary K. A.
Jalali S. M.
Mahmoudi M. R.
Maleki M.
Moro S.
Publication venue: 'Inderscience Publishers'
Publication date: 01/01/2017
Field of study

In recent years, application of data mining methods in health industry has received increased attention from both health professionals and scholars. This paper presents a data mining framework for detecting breast cancer based on real data from one of Iran hospitals by applying association rules and the most commonly used classifiers. The former were adopted for reducing the size of datasets, while the latter were chosen for cancer prediction. A k-fold cross validation procedure was included for evaluating the performance of the proposed classifiers. Among the six classifiers used in this paper, support vector machine achieved the best results, with an accuracy of 93%. It is worth mentioning that the approach proposed can be applied for detecting other diseases as well

Repositório Institucional do ISCTE-IUL

Credit Card Fraud Detection Using Machine Learning Algorithms

Author: Anđelić Nikola
Lorencin Ivan
Mavrinac Marko
Vale Deni
Publication venue: University of Istria
Publication date: 01/01/2023
Field of study

Hrčak - Portal of scientific journals of Croatia

An enhanced resampling technique for imbalanced data sets

Author: Maisarah Zorkeflee
Publication venue
Publication date: 01/01/2015
Field of study

A data set is considered imbalanced if the distribution of instances in one class (majority class) outnumbers the other class (minority class). The main problem related to binary imbalanced data sets is classifiers tend to ignore the minority class. Numerous resampling techniques such as undersampling, oversampling, and a combination of both techniques have been widely used. However, the undersampling and oversampling techniques suffer from elimination and addition of relevant data which may lead to poor classification results. Hence, this study aims to increase classification metrics by enhancing the undersampling technique and combining it with an existing oversampling technique. To achieve this objective, a Fuzzy Distancebased Undersampling (FDUS) is proposed. Entropy estimation is used to produce fuzzy thresholds to categorise the instances in majority and minority class into membership functions. FDUS is then combined with the Synthetic Minority Oversampling TEchnique (SMOTE) known as FDUS+SMOTE, which is executed in sequence until a balanced data set is achieved. FDUS and FDUS+SMOTE are compared with four techniques based on classification accuracy, F-measure and Gmean. From the results, FDUS achieved better classification accuracy, F-measure and G-mean, compared to the other techniques with an average of 80.57%, 0.85 and 0.78, respectively. This showed that fuzzy logic when incorporated with Distance-based Undersampling technique was able to reduce the elimination of relevant data. Further, the findings showed that FDUS+SMOTE performed better than combination of SMOTE and Tomek Links, and SMOTE and Edited Nearest Neighbour on benchmark data sets. FDUS+SMOTE has minimised the removal of relevant data from the majority class and avoid overfitting. On average, FDUS and FDUS+SMOTE were able to balance categorical, integer and real data sets and enhanced the performance of binary classification. Furthermore, the techniques performed well on small record size data sets that have of instances in the range of approximately 100 to 800

Universiti Utara Malaysia: UUM eTheses