8 research outputs found

    An Adversary Model of Fraudsters’ Behavior to Improve Oversampling in Credit Card Fraud Detection

    Imbalanced learning jeopardizes the accuracy of traditional classification models, particularly with respect to the minority class, which is often the class of interest. This paper addresses the issue of imbalanced learning in credit card fraud detection by introducing a novel approach that models fraudulent behavior as a time-dependent process. The main contribution is the design and assessment of an oversampling strategy, called 'Adversary-based Oversampling' (ADVO), which relies on modeling the temporal relationship among frauds. The strategy is implemented by two learning approaches: first, an innovative regression-based oversampling model that predicts subsequent fraudulent activities from the features of previous frauds; second, an adaptation of the state-of-the-art TimeGAN oversampling algorithm to the context of credit card fraud detection, which treats a sequence of frauds from the same card as a time series and generates artificial fraud time series from it. Experiments have been conducted using real credit card transaction data from our industrial partner, Worldline S.A., and a synthetic dataset generated by a transaction simulator for reproducibility purposes. Our findings show that an oversampling approach incorporating time-dependent modeling of frauds provides competitive results, measured against common fraud detection metrics, compared to traditional oversampling algorithms.
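The regression-based oversampling idea can be illustrated with a toy sketch (not the authors' ADVO implementation): fit a regressor that maps a fraud's features to the next fraud on the same card, then apply it to each card's latest fraud to generate synthetic minority samples. The data, feature count, and noise scale below are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Toy minority class: per-card sequences of 5 frauds with 3 features each.
# Consecutive frauds on a card are assumed correlated (time-dependent).
cards = [rng.normal(size=(5, 3)).cumsum(axis=0) for _ in range(20)]

# Build (previous fraud -> next fraud) training pairs across all cards.
X_prev = np.vstack([seq[:-1] for seq in cards])
X_next = np.vstack([seq[1:] for seq in cards])

# Regression-based oversampler: predict the next fraud's features.
oversampler = LinearRegression().fit(X_prev, X_next)

# Generate one synthetic fraud per card from its latest real fraud,
# with small Gaussian noise so samples are not deterministic copies.
latest = np.vstack([seq[-1] for seq in cards])
synthetic = oversampler.predict(latest) + rng.normal(scale=0.1, size=latest.shape)

print(synthetic.shape)  # one new minority sample per card
```

The synthetic rows would then be appended to the minority class before training the fraud classifier.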

    Does AutoML Outperform Naive Forecasting?

    The availability of massive amounts of temporal data opens new perspectives for knowledge extraction and automated decision making for companies and practitioners. However, learning forecasting models from data requires a solid data science or machine learning (ML) background, which is not always available to end-users. This gap fosters a growing demand for frameworks that automate the ML pipeline and ensure broader access for the general public. Automated machine learning (AutoML) provides solutions to build and validate machine learning pipelines while minimizing user intervention. Most of those pipelines have been validated in static supervised learning settings, while an extensive validation in time series prediction is still missing. This issue is particularly important in the forecasting community, where the relevance of machine learning approaches is still under debate. This paper assesses four existing AutoML frameworks (AutoGluon, H2O, TPOT, Auto-sklearn) on a number of forecasting challenges (univariate and multivariate, single-step and multi-step ahead) by benchmarking them against simple and conventional forecasting strategies (e.g., naive and exponential smoothing). The results highlight that AutoML approaches are not yet mature enough to address generic forecasting tasks when compared with faster yet more basic statistical forecasters. In particular, the tested AutoML configurations, on average, do not significantly outperform a naive estimator. These results, although preliminary, should not be interpreted as a rejection of AutoML solutions in forecasting but as an encouragement to a more rigorous validation of their limits and perspectives.
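Why the naive forecaster is such a strong baseline can be shown with a minimal sketch (unrelated to the paper's benchmark code): on a random-walk series, predicting "tomorrow equals today" beats a global statistic such as the training mean.

```python
import numpy as np

rng = np.random.default_rng(42)

# A random walk: the kind of series where the naive forecast is hard to beat.
y = rng.normal(size=500).cumsum()
train, test = y[:400], y[400:]

def rmse(pred, actual):
    return float(np.sqrt(np.mean((pred - actual) ** 2)))

# Naive one-step-ahead forecast: each prediction is the previous observation.
naive_pred = np.concatenate(([train[-1]], test[:-1]))

# A "global" alternative: always predict the training mean.
mean_pred = np.full_like(test, train.mean())

print(rmse(naive_pred, test), rmse(mean_pred, test))
```

Any AutoML pipeline has to beat this one-line predictor before its extra cost is justified, which is exactly the comparison the paper formalizes.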

    Assessment of catastrophic forgetting in continual credit card fraud detection

    The volume of e-commerce continues to increase year after year. Buying goods on the internet is easy and practical, and it received a huge boost during the Covid-19 lockdowns. However, this is also an open window for fraudsters, and the corresponding financial losses cost billions of dollars. In this paper, we study e-commerce credit card fraud detection in collaboration with our industrial partner, Worldline. Transactional companies are increasingly dependent on machine learning models, such as deep learning anomaly detection models, as part of real-world fraud detection systems (FDS). We focus on continual learning to find the best model with respect to two objectives: maximizing accuracy and minimizing the catastrophic forgetting phenomenon. For the latter, we propose an evaluation procedure to quantify forgetting in data streams with delayed feedback: the plasticity/stability visualization matrix. We also investigate six strategies and 13 methods on a full-scale case study including five months of e-commerce credit card transactions. Finally, we discuss how the trade-off between plasticity and stability is set, in practice, in the case of an FDS.
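A plasticity/stability-style matrix can be sketched as follows (an illustrative reconstruction, not the paper's exact procedure): update a model month by month on drifting data and, after each update, score it on every past month. Entries below the diagonal falling over time indicate forgetting; diagonal entries indicate plasticity.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(1)

# Five synthetic "months" with gradual concept drift: the decision
# boundary rotates slightly from one month to the next.
months = []
for t in range(5):
    X = rng.normal(size=(300, 2))
    w = np.array([np.cos(0.3 * t), np.sin(0.3 * t)])
    y = (X @ w > 0).astype(int)
    months.append((X, y))

clf = SGDClassifier(random_state=0)

# Row i = model state after training through month i;
# column j = accuracy on month j (j < i probes stability, j == i plasticity).
matrix = np.full((5, 5), np.nan)
for i, (X, y) in enumerate(months):
    clf.partial_fit(X, y, classes=[0, 1])
    for j in range(i + 1):
        Xj, yj = months[j]
        matrix[i, j] = clf.score(Xj, yj)

print(np.round(matrix, 2))
```

Reading down a column shows how accuracy on an old month decays as the model adapts to newer data.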

    Incremental learning strategies for credit cards fraud detection

    Every second, thousands of credit or debit card transactions are processed in financial institutions. This extensive amount of data and its sequential nature make the problem of fraud detection particularly challenging. Most analytical strategies used in production are still based on batch learning, which is inadequate for two reasons: models quickly become outdated, and they require storing sensitive data. The evolving nature of bank fraud underscores the importance of up-to-date models, while sensitive data retention makes companies vulnerable to infringements of the European General Data Protection Regulation. For these reasons, evaluating incremental learning strategies is recommended. This paper designs and evaluates incremental learning solutions for real-world fraud detection systems. The aim is to demonstrate the competitiveness of incremental learning over conventional batch approaches and, consequently, to improve its accuracy by employing ensemble learning, diversity, and transfer learning. An experimental analysis is conducted on a full-scale case study including five months of e-commerce transactions made available by our industry partner, Worldline.
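The incremental setting can be sketched as follows (a toy illustration, not the paper's system): the model is updated batch by batch with `partial_fit`, and each batch is discarded after the update, so only the model weights, never the sensitive transactions, are retained.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(7)

# Incremental learner: weights are the only state carried between batches.
model = SGDClassifier(random_state=0)

n_seen = 0
for day in range(30):
    X = rng.normal(size=(200, 4))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # stationary toy concept
    model.partial_fit(X, y, classes=[0, 1])
    n_seen += len(X)
    del X, y  # nothing stored beyond the model weights

# Evaluate on a fresh batch drawn from the same concept.
X_test = rng.normal(size=(500, 4))
y_test = (X_test[:, 0] + 0.5 * X_test[:, 1] > 0).astype(int)
print(n_seen, round(model.score(X_test, y_test), 2))
```

A batch learner would instead need all 30 days of raw transactions on disk to retrain, which is the data-retention problem the abstract points at.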

    Estimating pitting descriptors of 316 L stainless steel by machine learning and statistical analysis

    A hybrid rule-based/ML approach using linear regression and artificial neural networks (ANNs) determined pitting corrosion descriptors from high-throughput data obtained with Scanning Electrochemical Cell Microscopy (SECCM) on 316L stainless steel. Non-parametric density estimation determined the central tendencies of the E_pit/log(j_pit) and E_pass/log(j_pass) distributions. Descriptors estimated using conditional mean or median curves were compared to their central tendency values, with the conditional medians providing more accurate results. Due to their lower sensitivity to high outliers, the conditional medians were more robust representations of the log(j) vs. E distributions. An observed trend of passive range shortening with increasing testing aggressiveness was attributed to delayed stabilisation of the passive film, rather than early passivity breakdown.
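The robustness argument for conditional medians can be demonstrated on synthetic data (purely illustrative; this is not the paper's SECCM data): with a small fraction of high outliers, the binned median curve stays much closer to the uncontaminated trend than the binned mean.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy log(j) vs. E cloud: a linear trend plus occasional high outliers,
# loosely mimicking noisy high-throughput measurements.
E = rng.uniform(0.0, 1.0, size=2000)
log_j = 2.0 * E + rng.normal(scale=0.05, size=E.size)
high = rng.random(E.size) < 0.05
log_j[high] += 3.0  # high outliers only

# Conditional mean vs. conditional median, computed in bins over E.
bins = np.linspace(0, 1, 11)
idx = np.digitize(E, bins) - 1
cond_mean = np.array([log_j[idx == b].mean() for b in range(10)])
cond_med = np.array([np.median(log_j[idx == b]) for b in range(10)])

centers = (bins[:-1] + bins[1:]) / 2
true_curve = 2.0 * centers
err_mean = np.abs(cond_mean - true_curve).max()
err_med = np.abs(cond_med - true_curve).max()
print(err_mean, err_med)
```

Because the contamination is one-sided, the mean shifts upward in every bin while the median barely moves, which is the effect the abstract credits for the medians' accuracy.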

    A Digital Twin Approach for Improving Estimation Accuracy in Dynamic Thermal Rating of Transmission Lines

    The limited thermal capacity of transmission lines plays a crucial role in the safety and reliability of power systems. Dynamic thermal line rating approaches aim to estimate the transmission line’s temperature and assess its compliance with these limits. Existing physics-based standards estimate the temperature from environment and line conditions measured by several sensors. This manuscript shows that estimation accuracy can be improved by adopting a data-driven Digital Twin approach. The proposed method exploits machine learning to learn the input–output relation between the physical sensor data and the actual conductor temperature, serving as a digital equivalent to physics-based standards. An experimental assessment on real data, comparing the proposed approach with the IEEE 738 standard, shows a 60% reduction in Root Mean Squared Error and a decrease in the maximum estimation error from above 10 °C to below 7 °C. These preliminary results suggest that the Digital Twin provides more accurate and robust estimations, serving as a complement, or a potential alternative, to traditional methods.
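The Digital Twin idea can be sketched with toy data (the features, coefficients, and simplified "physics" baseline below are invented; this is not IEEE 738): learn the sensor-to-conductor-temperature mapping from data and compare it with a physical formula that misses one effect.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)

# Toy sensor readings: ambient temperature, wind speed, line current.
n = 2000
ambient = rng.uniform(-5, 35, n)
wind = rng.uniform(0, 10, n)
current = rng.uniform(100, 1000, n)

# Hypothetical "true" conductor temperature: joule heating rises with
# current, convective cooling rises with wind (purely illustrative).
temp = ambient + 0.03 * current - 1.5 * wind + rng.normal(scale=0.5, size=n)

X = np.column_stack([ambient, wind, current])
X_train, X_test = X[:1500], X[1500:]
y_train, y_test = temp[:1500], temp[1500:]

# A crude physics-style baseline that ignores wind cooling.
baseline = ambient[1500:] + 0.03 * current[1500:]

# Data-driven "digital twin": learn the sensor -> temperature mapping.
twin = LinearRegression().fit(X_train, y_train)

def rmse(pred):
    return float(np.sqrt(np.mean((pred - y_test) ** 2)))

print(rmse(twin.predict(X_test)), rmse(baseline))
```

The twin recovers the neglected cooling term directly from data, which is the mechanism by which a learned model can outperform an incomplete physical one.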