34 research outputs found
A Bag-of-Paths Node Criticality Measure
This work compares several node (and network) criticality measures
quantifying to which extend each node is critical with respect to the
communication flow between nodes of the network, and introduces a new measure
based on the Bag-of-Paths (BoP) framework. Network disconnection simulation
experiments show that the new BoP measure outperforms all the other measures on
a sample of Erdos-Renyi and Albert-Barabasi graphs. Furthermore, a faster
(still O(n^3)), approximate, BoP criticality relying on the Sherman-Morrison
rank-one update of a matrix is introduced for tackling larger networks. This
approximate measure shows similar performances as the original, exact, one
Two betweenness centrality measures based on Randomized Shortest Paths
This paper introduces two new closely related betweenness centrality measures
based on the Randomized Shortest Paths (RSP) framework, which fill a gap
between traditional network centrality measures based on shortest paths and
more recent methods considering random walks or current flows. The framework
defines Boltzmann probability distributions over paths of the network which
focus on the shortest paths, but also take into account longer paths depending
on an inverse temperature parameter. RSP's have previously proven to be useful
in defining distance measures on networks. In this work we study their utility
in quantifying the importance of the nodes of a network. The proposed RSP
betweenness centralities combine, in an optimal way, the ideas of using the
shortest and purely random paths for analysing the roles of network nodes,
avoiding issues involving these two paradigms. We present the derivations of
these measures and how they can be computed in an efficient way. In addition,
we show with real world examples the potential of the RSP betweenness
centralities in identifying interesting nodes of a network that more
traditional methods might fail to notice.Comment: Minor updates; published in Scientific Report
Incremental learning strategies for credit cards fraud detection.
very second, thousands of credit or debit card transactions are processed in financial institutions. This extensive amount of data and its sequential nature make the problem of fraud detection particularly challenging. Most analytical strategies used in production are still based on batch learning, which is inadequate for two reasons: Models quickly become outdated and require sensitive data storage. The evolving nature of bank fraud enshrines the importance of having up-to-date models, and sensitive data retention makes companies vulnerable to infringements of the European General Data Protection Regulation. For these reasons, evaluating incremental learning strategies is recommended. This paper designs and evaluates incremental learning solutions for real-world fraud detection systems. The aim is to demonstrate the competitiveness of incremental learning over conventional batch approaches and, consequently, improve its accuracy employing ensemble learning, diversity and transfer learning. An experimental analysis is conducted on a full-scale case study including five months of e-commerce transactions and made available by our industry partner, Worldline
Transfer Learning Strategies for Credit Card Fraud Detection.
Credit card fraud jeopardizes the trust of customers in e-commerce transactions. This led in recent years to major advances in the design of automatic Fraud Detection Systems (FDS) able to detect fraudulent transactions with short reaction time and high precision. Nevertheless, the heterogeneous nature of the fraud behavior makes it difficult to tailor existing systems to different contexts (e.g. new payment systems, different countries and/or population segments). Given the high cost (research, prototype development, and implementation in production) of designing data-driven FDSs, it is crucial for transactional companies to define procedures able to adapt existing pipelines to new challenges. From an AI/machine learning perspective, this is known as the problem of transfer learning. This paper discusses the design and implementation of transfer learning approaches for e-commerce credit card fraud detection and their assessment in a real setting. The case study, based on a six-month dataset (more than 200 million e-commerce transactions) provided by the industrial partner, relates to the transfer of detection models developed for a European country to another country. In particular, we present and discuss 15 transfer learning techniques (ranging from naive baselines to state-of-the-art and new approaches), making a critical and quantitative comparison in terms of precision for different transfer scenarios. Our contributions are twofold: (i) we show that the accuracy of many transfer methods is strongly dependent on the number of labeled samples in the target domain and (ii) we propose an ensemble solution to this problem based on self-supervised and semi-supervised domain adaptation classifiers. The thorough experimental assessment shows that this solution is both highly accurate and hardly sensitive to the number of labeled samples
Transfer learning for credit card fraud detection : A journey from research to production.
The dark face of digital commerce generalization is the increase of fraud attempts. To prevent any type of attacks, state-of-the-art fraud detection systems are now embedding Machine Learning (ML) modules. The conception of such modules is only communicated at the level of research and papers mostly focus on results for isolated benchmark datasets and metrics. But research is only a part of the journey, preceded by the right formulation of the business problem and collection of data, and followed by a practical integration. In this paper, we give a wider vision of the process, on a case study of transfer learning for fraud detection, from business to research, and back to business
Transfer learning for credit card fraud detection : A journey from research to production.
The dark face of digital commerce generalization is the increase of fraud attempts. To prevent any type of attacks, state-of-the-art fraud detection systems are now embedding Machine Learning (ML) modules. The conception of such modules is only communicated at the level of research and papers mostly focus on results for isolated benchmark datasets and metrics. But research is only a part of the journey, preceded by the right formulation of the business problem and collection of data, and followed by a practical integration. In this paper, we give a wider vision of the process, on a case study of transfer learning for fraud detection, from business to research, and back to business
Towards Refined Classifications Driven by SHAP Explanations
Machine Learning (ML) models are inherently approximate; as a result, the predictions of an ML model can be wrong. In applications where errors can jeopardize a company's reputation, human experts often have to manually check the alarms raised by the ML models by hand, as wrong or delayed decisions can have a significant business impact. These experts often use interpretable ML tools for the verification of predictions. However, post-prediction verification is also costly. In this paper, we hypothesize that the outputs of interpretable ML tools, such as SHAP explanations, can be exploited by machine learning techniques to improve classifier performance. By doing so, the cost of the post-prediction analysis can be reduced. To confirm our intuition, we conduct several experiments where we use SHAP explanations directly as new features. In particular, by considering nine datasets, we first compare the performance of these "SHAP features" against traditional "base features" on binary classification tasks. Then, we add a second-step classifier relying on SHAP features, with the goal of reducing false-positive and false-negative results of typical classifiers. We show that SHAP explanations used as SHAP features can help to improve classification performance, especially for false-negative reduction
Evaluating the Impact of Text De-Identification on Downstream NLP Tasks
peer reviewedData anonymisation is often required to comply with regulations when transfering information across departments or entities. However, the risk is that this procedure can distort the data and jeopardise the models built on it. Intuitively, the process of training an NLP model on anonymised data may lower the performance of the resulting model when compared to a model trained on non-anonymised data. In this paper, we investigate the impact of de-identification on the performance of nine downstream NLP tasks. We focus on the anonymisation and pseudonymisation of personal names and compare six different anonymisation strategies for two state-of-the-art pre-trained models. Based on these experiments, we formulate recommendations on how the de-identification should be performed to guarantee accurate NLP models. Our results reveal that de-identification does have a negative impact on the performance of NLP models, but this impact is relatively low. We also find that using pseudonymisation techniques involving random names leads to better performance across most tasks.Multilingual Nlp Coping With Luxembourg Specificities For The Financial Industr