9,386 research outputs found
Recommended from our members
IMPROVING CREDIT CARD FRAUD DETECTION USING TRANSFER LEARNING AND DATA RESAMPLING TECHNIQUES
This Culminating Experience Project explores the use of machine learning algorithms to detect credit card fraud. The research questions are: Q1. What cross-domain techniques developed in other domains can be effectively adapted and applied to mitigate or eliminate credit card fraud, and how do these techniques compare in terms of fraud detection accuracy and efficiency? Q2. To what extent do synthetic data generation methods effectively mitigate the challenges posed by imbalanced datasets in credit card fraud detection, and how do these methods impact classification performance? Q3. To what extent can the combination of transfer learning and innovative data resampling techniques improve the accuracy and efficiency of credit card fraud detection systems when dealing with imbalanced datasets, and what novel strategies can be developed to address this common challenge?
The main findings are: Q1. Unconventional cross-domain methods improved fraud detection, holding promise for enhanced security. Q2. The problems caused by unbalanced datasets in credit card fraud detection were effectively addressed by the synthetic data generation techniques SMOTE and ADASYN, resulting in a more balanced dataset suitable for fraud classification. Q3. The combination of neural networks and data resampling techniques, such as SMOTE and ADASYN, significantly improved credit card fraud detection accuracy.
The main conclusions are: Q1. Cross-domain methods are useful for credit card fraud detection, especially when it comes to online transactions. Q2. When used with various classifiers, neural networks show remarkable accuracy rates: 97% for unbalanced data, 99.47% for SMOTE, and 99.11% for ADASYN Q3. A fraud recall of 0.99 is obtained by the model evaluation on imbalanced data, with 12,155 right predictions out of 12,336 and 181 incorrect ones. The identified areas for further study encompass the testing of our model on larger datasets and the optimization of hyperparameters for further enhancement
Enhancing credit card fraud detection: an ensemble machine learning approach
In the era of digital advancements, the escalation of credit card fraud necessitates the development of robust and efficient fraud detection systems. This paper delves into the application of machine learning models, specifically focusing on ensemble methods, to enhance credit card fraud detection. Through an extensive review of existing literature, we identified limitations in current fraud detection technologies, including issues like data imbalance, concept drift, false positives/negatives, limited generalisability, and challenges in real-time processing. To address some of these shortcomings, we propose a novel ensemble model that integrates a Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Random Forest (RF), Bagging, and Boosting classifiers. This ensemble model tackles the dataset imbalance problem associated with most credit card datasets by implementing under-sampling and the Synthetic Over-sampling Technique (SMOTE) on some machine learning algorithms. The evaluation of the model utilises a dataset comprising transaction records from European credit card holders, providing a realistic scenario for assessment. The methodology of the proposed model encompasses data pre-processing, feature engineering, model selection, and evaluation, with Google Colab computational capabilities facilitating efficient model training and testing. Comparative analysis between the proposed ensemble model, traditional machine learning methods, and individual classifiers reveals the superior performance of the ensemble in mitigating challenges associated with credit card fraud detection. Across accuracy, precision, recall, and F1-score metrics, the ensemble outperforms existing models. This paper underscores the efficacy of ensemble methods as a valuable tool in the battle against fraudulent transactions. The findings presented lay the groundwork for future advancements in the development of more resilient and adaptive fraud detection systems, which will become crucial as credit card fraud techniques continue to evolve
Enhancing Credit Card Fraud Detection: An Ensemble Machine Learning Approach
In the era of digital advancements, the escalation of credit card fraud necessitates the development of robust and efficient fraud detection systems. This paper delves into the application of machine learning models, specifically focusing on ensemble methods, to enhance credit card fraud detection. Through an extensive review of existing literature, we identified limitations in current fraud detection technologies, including issues like data imbalance, concept drift, false positives/negatives, limited generalisability, and challenges in real-time processing. To address some of these shortcomings, we propose a novel ensemble model that integrates a Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Random Forest (RF), Bagging, and Boosting classifiers. This ensemble model tackles the dataset imbalance problem associated with most credit card datasets by implementing under-sampling and the Synthetic Over-sampling Technique (SMOTE) on some machine learning algorithms. The evaluation of the model utilises a dataset comprising transaction records from European credit card holders, providing a realistic scenario for assessment. The methodology of the proposed model encompasses data pre-processing, feature engineering, model selection, and evaluation, with Google Colab computational capabilities facilitating efficient model training and testing. Comparative analysis between the proposed ensemble model, traditional machine learning methods, and individual classifiers reveals the superior performance of the ensemble in mitigating challenges associated with credit card fraud detection. Across accuracy, precision, recall, and F1-score metrics, the ensemble outperforms existing models. This paper underscores the efficacy of ensemble methods as a valuable tool in the battle against fraudulent transactions. The findings presented lay the groundwork for future advancements in the development of more resilient and adaptive fraud detection systems, which will become crucial as credit card fraud techniques continue to evolve
Data mining for detecting Bitcoin Ponzi schemes
Soon after its introduction in 2009, Bitcoin has been adopted by
cyber-criminals, which rely on its pseudonymity to implement virtually
untraceable scams. One of the typical scams that operate on Bitcoin are the
so-called Ponzi schemes. These are fraudulent investments which repay users
with the funds invested by new users that join the scheme, and implode when it
is no longer possible to find new investments. Despite being illegal in many
countries, Ponzi schemes are now proliferating on Bitcoin, and they keep
alluring new victims, who are plundered of millions of dollars. We apply data
mining techniques to detect Bitcoin addresses related to Ponzi schemes. Our
starting point is a dataset of features of real-world Ponzi schemes, that we
construct by analysing, on the Bitcoin blockchain, the transactions used to
perform the scams. We use this dataset to experiment with various machine
learning algorithms, and we assess their effectiveness through standard
validation protocols and performance metrics. The best of the classifiers we
have experimented can identify most of the Ponzi schemes in the dataset, with a
low number of false positives
Outlier Mining Methods Based on Graph Structure Analysis
Outlier detection in high-dimensional datasets is a fundamental and challenging problem across disciplines that has also practical implications, as removing outliers from the training set improves the performance of machine learning algorithms. While many outlier mining algorithms have been proposed in the literature, they tend to be valid or efficient for specific types of datasets (time series, images, videos, etc.). Here we propose two methods that can be applied to generic datasets, as long as there is a meaningful measure of distance between pairs of elements of the dataset. Both methods start by defining a graph, where the nodes are the elements of the dataset, and the links have associated weights that are the distances between the nodes. Then, the first method assigns an outlier score based on the percolation (i.e., the fragmentation) of the graph. The second method uses the popular IsoMap non-linear dimensionality reduction algorithm, and assigns an outlier score by comparing the geodesic distances with the distances in the reduced space. We test these algorithms on real and synthetic datasets and show that they either outperform, or perform on par with other popular outlier detection methods. A main advantage of the percolation method is that is parameter free and therefore, it does not require any training; on the other hand, the IsoMap method has two integer number parameters, and when they are appropriately selected, the method performs similar to or better than all the other methods tested.Peer ReviewedPostprint (published version
- …