18 research outputs found

Multiple perspectives HMM-based feature engineering for credit card fraud detection

Author: Caelen Olivier
Calabretto Sylvie
Granitzer Michael
He-Guelton Liyun
Laporte Léa
Lucas Yvan
Portier Pierre-Edouard
Publication date: 08/04/2019

Machine learning and data mining techniques have been used extensively to detect credit card fraud. However, most studies consider credit card transactions as isolated events and not as a sequence of transactions. In this article, we model a sequence of credit card transactions from three different perspectives, namely (i) does the sequence contain a fraud? (ii) is the sequence obtained by fixing the card-holder or the payment terminal? (iii) is it a sequence of spent amounts or of elapsed times between the current and previous transactions? Combinations of the three binary perspectives give eight sets of sequences from the (training) set of transactions. Each of these sets is modelled with a Hidden Markov Model (HMM). Each HMM associates a likelihood with a transaction given its sequence of previous transactions. These likelihoods are used as additional features in a Random Forest classifier for fraud detection. This multiple-perspectives HMM-based approach enables automatic feature engineering that models the sequential properties of the dataset with respect to the classification task. This strategy allows for a 15% increase in the precision-recall AUC compared to the state-of-the-art feature engineering strategy for credit card fraud detection.
Comment: Presented as a poster in the conference SAC 2019: 34th ACM/SIGAPP Symposium on Applied Computing in April 201

arXiv.org e-Print Archive

Crossref

HAL

Hal-Diderot
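
Illustration for the abstract above: a minimal sketch of how the eight perspective sets and the per-sequence HMM likelihoods could be computed. It is not the authors' implementation. The function names, the discretisation of observations into three symbols, and all toy probabilities are assumptions made for this example; in the paper the HMMs are trained on the transaction data and the likelihoods feed a Random Forest classifier.

```python
import math
from itertools import product

# The three binary perspectives named in the abstract (labels are illustrative).
PERSPECTIVES = {
    "history": ("genuine", "fraudulent"),
    "entity": ("card-holder", "terminal"),
    "signal": ("amount", "time-delta"),
}


def perspective_sets():
    """Enumerate the 2 x 2 x 2 = 8 combinations; one HMM is fitted per combination."""
    return list(product(*PERSPECTIVES.values()))


def forward_likelihood(obs, start, trans, emit):
    """Likelihood of a discrete observation sequence under an HMM (forward algorithm).

    start[s]    : probability of starting in hidden state s
    trans[r][s] : probability of moving from state r to state s
    emit[s][o]  : probability of emitting symbol o from state s
    """
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            emit[s][symbol] * sum(alpha[r] * trans[r][s] for r in range(n_states))
            for s in range(n_states)
        ]
    return sum(alpha)


def hmm_features(obs_by_set, hmm_by_set):
    """One log-likelihood per perspective set, in a fixed order (extra features)."""
    return [
        math.log(forward_likelihood(obs_by_set[key], *hmm_by_set[key]))
        for key in perspective_sets()
    ]


# Toy two-state HMM over three symbols (e.g. low / medium / high amount).
# These numbers are placeholders, not fitted parameters.
TOY_HMM = (
    [0.6, 0.4],                          # start
    [[0.7, 0.3], [0.4, 0.6]],            # trans
    [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]],  # emit
)

if __name__ == "__main__":
    keys = perspective_sets()
    features = hmm_features({k: [0, 1, 2] for k in keys}, {k: TOY_HMM for k in keys})
    print(len(features), "HMM features:", features)
```

In practice each of the eight keys would have its own fitted HMM and its own observation sequence; the same toy model is reused here only to keep the sketch short.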

Incremental learning strategies for credit cards fraud detection.

Author: Frédéric Oblé
Gian-Marco Paldino
Gianluca Bontempi
Lebichot Bertrand
Liyun He-Guelton
Wissam Siblini
Publication date: 01/01/2021

Every second, thousands of credit or debit card transactions are processed in financial institutions. This extensive amount of data and its sequential nature make the problem of fraud detection particularly challenging. Most analytical strategies used in production are still based on batch learning, which is inadequate for two reasons: models quickly become outdated and require sensitive data storage. The evolving nature of bank fraud makes up-to-date models essential, and sensitive data retention makes companies vulnerable to infringements of the European General Data Protection Regulation. For these reasons, evaluating incremental learning strategies is recommended. This paper designs and evaluates incremental learning solutions for real-world fraud detection systems. The aim is to demonstrate the competitiveness of incremental learning over conventional batch approaches and, consequently, to improve its accuracy by employing ensemble learning, diversity and transfer learning. An experimental analysis is conducted on a full-scale case study including five months of e-commerce transactions made available by our industry partner, Worldline

Open Repository and Bibliography - Luxembourg

Transfer Learning Strategies for Credit Card Fraud Detection.

Author: Bontempi Gianluca
He-Guelton Liyun
Le Borgne Yann-aël
Lebichot Bertrand
Oblé Frédéric
Verhelst Théo
Publication date: 01/01/2021

Credit card fraud jeopardizes the trust of customers in e-commerce transactions. This led in recent years to major advances in the design of automatic Fraud Detection Systems (FDS) able to detect fraudulent transactions with short reaction time and high precision. Nevertheless, the heterogeneous nature of the fraud behavior makes it difficult to tailor existing systems to different contexts (e.g. new payment systems, different countries and/or population segments). Given the high cost (research, prototype development, and implementation in production) of designing data-driven FDSs, it is crucial for transactional companies to define procedures able to adapt existing pipelines to new challenges. From an AI/machine learning perspective, this is known as the problem of transfer learning. This paper discusses the design and implementation of transfer learning approaches for e-commerce credit card fraud detection and their assessment in a real setting. The case study, based on a six-month dataset (more than 200 million e-commerce transactions) provided by the industrial partner, relates to the transfer of detection models developed for a European country to another country. In particular, we present and discuss 15 transfer learning techniques (ranging from naive baselines to state-of-the-art and new approaches), making a critical and quantitative comparison in terms of precision for different transfer scenarios. Our contributions are twofold: (i) we show that the accuracy of many transfer methods is strongly dependent on the number of labeled samples in the target domain and (ii) we propose an ensemble solution to this problem based on self-supervised and semi-supervised domain adaptation classifiers. The thorough experimental assessment shows that this solution is both highly accurate and hardly sensitive to the number of labeled samples

Directory of Open Access Journals

DI-fusion

Open Repository and Bibliography - Luxembourg

Transfer learning for credit card fraud detection : A journey from research to production.

Author: Bontempi Gianluca
Coter Guillaume
Fabry Remy
He-Guelton Liyun
Le Borgne Yann-Aël
Lebichot Bertrand
Oble Frederic
Siblini Wissam
Publication date: 01/01/2021

The dark face of digital commerce generalization is the increase of fraud attempts. To prevent any type of attacks, state-of-the-art fraud detection systems are now embedding Machine Learning (ML) modules. The conception of such modules is only communicated at the level of research and papers mostly focus on results for isolated benchmark datasets and metrics. But research is only a part of the journey, preceded by the right formulation of the business problem and collection of data, and followed by a practical integration. In this paper, we give a wider vision of the process, on a case study of transfer learning for fraud detection, from business to research, and back to business

Open Repository and Bibliography - Luxembourg

Transfer learning for credit card fraud detection : A journey from research to production.

Author: Bontempi Gianluca
Coter Guillaume
Fabry Remy
He-Guelton Liyun
Le Borgne Yann-Aël
Lebichot Bertrand
Oble Frederic
Siblini Wissam
Publication date: 01/01/2021

arXiv.org e-Print Archive

Open Repository and Bibliography - Luxembourg

Statistical emulation of high resolution SAR wind fields from low-resolution model predictions

Author: Chapron Bertrand
Fablet Ronan
He-Guelton Liyun
Tournadre Jean
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014

This paper addresses the reconstruction of high-resolution (HR) sea surface wind fields (typically, at a spatial resolution of 1 km). The availability of such HR fields is critical for numerous issues, e.g. coastal management, offshore structures, oil spill disaster tracking, etc. Satellites, especially those carrying Synthetic Aperture Radar (SAR) systems, can monitor the ocean surface at a spatial resolution of a few meters. SAR wind fields are operationally produced with spatial resolutions of less than 1 km [1, 2]. However, satellite SAR systems involve a highly irregular sampling of the ocean surface and, for a given region, SAR wind fields may be delivered with a low temporal resolution, typically every 7-to-10 days for temperate zones. By contrast, model predictions, such as European Centre for Medium-Range Weather Forecasts (ECMWF) wind fields, are typically delivered with a high temporal resolution (e.g. every 3 h), but with a low spatial resolution (∼50 km × 50 km). The question naturally arises of how to combine numerical model predictions and SAR wind fields to deliver HR wind fields at the sea surface anywhere and anytime. Here, we state this issue as the statistical learning of transfer functions between low-resolution (LR) model predictions and the associated HR SAR fields. We investigate the extent to which such regression functions can be learnt from a set of co-located HR and LR fields. Both local and non-local schemes as well as linear and non-linear regression methods are considered. As a case study, we carry out numerical experiments for a coastal area off Norway, which involves complex LR-to-HR situations

Crossref

HAL-Université de Bretagne Occidentale

ArchiMer - Institutional Archive of Ifremer

HAL Descartes

Hal-Diderot

Learning-based Emulation of Sea Surface Wind Fields from Numerical Model Outputs and SAR Data

Author: Chapron Bertrand
Fablet Ronan
He-Guelton Liyun
Tournadre Jean
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2015

The availability of sea surface wind conditions with a high-resolution space-time sampling is a critical issue for a wide range of applications. Currently, neither observation systems nor model forecasts provide relevant information with a high sampling rate both in space and time. Synthetic Aperture Radar (SAR) satellite systems deliver high-resolution sea surface fields, with a spatial resolution below 0.01°, but they are also characterized by a large revisit time of up to 7-to-10 days for temperate zones. Meanwhile, operational model predictions typically involve a high temporal resolution (e.g. every 6 h), but also a low spatial resolution (0.5°). With a view to leveraging both data sources, we investigate statistical downscaling schemes. In this study, a new model based on a machine learning method, namely Support Vector Regression (SVR), is built to reconstruct high-resolution sea surface wind fields from low-resolution operational model forecasts. The considered case study off Norway demonstrates the relevance of the proposed SVR model. It outperforms state-of-the-art approaches (namely, linear, analog and Empirical Orthogonal Function (EOF) downscaling models) in terms of mean square error. It also realistically reproduces complex space-time variabilities of the observed SAR wind fields. We further discuss the SVR model as a generalization of the popular linear and analog models

Crossref

HAL-Université de Bretagne Occidentale

ArchiMer - Institutional Archive of Ifremer

Efficient top rank optimization with gradient boosting for supervised anomaly detection

Author: Caelen Olivier
Frery Jordan
Habrard Amaury
He-Guelton Liyun
Sebban Marc
Publication venue: HAL CCSD
Publication date: 01/09/2017

In this paper we address the anomaly detection problem in a supervised setting where positive examples might be very sparse. We tackle this task with a learning-to-rank strategy by optimizing a differentiable smoothed surrogate of the so-called Average Precision (AP). Despite its non-convexity, we show how to use it efficiently in a stochastic gradient boosting framework. We show that optimizing AP ranks the top alerts much better than state-of-the-art measures do. We demonstrate on anomaly detection tasks that the benefit of our method is even greater in highly unbalanced scenarios

HAL-UJM
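
Illustration for the abstract above: a minimal sketch of Average Precision and of one possible differentiable smoothing of it. The abstract does not specify the surrogate, so replacing the rank indicator functions with sigmoids is an assumption made here, as are the function names and the `sharpness` parameter. The boosting procedure itself is not shown.

```python
import math


def average_precision(scores, labels):
    """Exact AP: mean, over positive examples, of the precision at their rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def smoothed_average_precision(scores, labels, sharpness=10.0):
    """AP surrogate in which each 'j is ranked above i' indicator is a sigmoid.

    Larger `sharpness` values bring the surrogate closer to the exact AP
    (when scores are distinct) but make its gradients steeper.
    """
    positives = [i for i, y in enumerate(labels) if y == 1]
    if not positives:
        return 0.0
    total = 0.0
    for i in positives:
        # Soft rank of example i among positives and among all examples;
        # the leading 1.0 counts example i itself.
        pos_rank = 1.0 + sum(
            sigmoid(sharpness * (scores[j] - scores[i])) for j in positives if j != i
        )
        all_rank = 1.0 + sum(
            sigmoid(sharpness * (scores[j] - scores[i]))
            for j in range(len(scores))
            if j != i
        )
        total += pos_rank / all_rank
    return total / len(positives)


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.5]  # toy alert scores
    labels = [1, 0, 1, 0, 0]            # 1 = anomaly
    print("exact AP:   ", average_precision(scores, labels))
    print("smoothed AP:", smoothed_average_precision(scores, labels))
```

Because the surrogate is a smooth function of the scores, its gradient with respect to each score exists everywhere, which is the property a gradient boosting framework needs.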