Search CORE

3,469 research outputs found

Predictive Modelling of Retail Banking Transactions for Credit Scoring, Cross-Selling and Payment Pattern Discovery

Author: Harrach H.
Harrach H.
Publication venue: University of East London
Publication date: 01/01/2019
Field of study

Evaluating transactional payment behaviour offers a competitive advantage in the modern payment ecosystem, not only for confirming the presence of good credit applicants or unlocking the cross-selling potential between the respective product and service portfolios of financial institutions, but also to rule out bad credit applicants precisely in transactional payments streams. In a diagnostic test for analysing the payment behaviour, I have used a hybrid approach comprising a combination of supervised and unsupervised learning algorithms to discover behavioural patterns. Supervised learning algorithms can compute a range of credit scores and cross-sell candidates, although the applied methods only discover limited behavioural patterns across the payment streams. Moreover, the performance of the applied supervised learning algorithms varies across the different data models and their optimisation is inversely related to the pre-processed dataset. Subsequently, the research experiments conducted suggest that the Two-Class Decision Forest is an effective algorithm to determine both the cross-sell candidates and creditworthiness of their customers. In addition, a deep-learning model using neural network has been considered with a meaningful interpretation of future payment behaviour through categorised payment transactions, in particular by providing additional deep insights through graph-based visualisations. However, the research shows that unsupervised learning algorithms play a central role in evaluating the transactional payment behaviour of customers to discover associations using market basket analysis based on previous payment transactions, finding the frequent transactions categories, and developing interesting rules when each transaction category is performed on the same payment stream. Current research also reveals that the transactional payment behaviour analysis is multifaceted in the financial industry for assessing the diagnostic ability of promotion candidates and classifying bad credit applicants from among the entire customer base. The developed predictive models can also be commonly used to estimate the credit risk of any credit applicant based on his/her transactional payment behaviour profile, combined with deep insights from the categorised payment transactions analysis. The research study provides a full review of the performance characteristic results from different developed data models. Thus, the demonstrated data science approach is a possible proof of how machine learning models can be turned into cost-sensitive data models

UEL Research Repository at University of East London

Using rule extraction to improve the comprehensibility of predictive models.

Author: Baesens Bart
Huysmans Johan
Vanthienen Jan
Publication venue
Publication date
Field of study

Whereas newer machine learning techniques, like artifficial neural net-works and support vector machines, have shown superior performance in various benchmarking studies, the application of these techniques remains largely restricted to research environments. A more widespread adoption of these techniques is foiled by their lack of explanation capability which is required in some application areas, like medical diagnosis or credit scoring. To overcome this restriction, various algorithms have been proposed to extract a meaningful description of the underlying `blackbox' models. These algorithms' dual goal is to mimic the behavior of the black box as closely as possible while at the same time they have to ensure that the extracted description is maximally comprehensible. In this research report, we first develop a formal definition of`rule extraction and comment on the inherent trade-off between accuracy and comprehensibility. Afterwards, we develop a taxonomy by which rule extraction algorithms can be classiffied and discuss some criteria by which these algorithms can be evaluated. Finally, an in-depth review of the most important algorithms is given.This report is concluded by pointing out some general shortcomings of existing techniques and opportunities for future research.Models; Model; Algorithms; Criteria; Opportunities; Research; Learning; Neural networks; Networks; Performance; Benchmarking; Studies; Area; Credit; Credit scoring; Behavior; Time;

Research Papers in Economics

Explainable AI in Fintech and Insurtech

Author: GRAMEGNA ALEX
Publication venue: Università degli studi di Pavia
Publication date: 20/06/2023
Field of study

The growing application of black-box Artificial Intelligence algorithms in many real-world application is raising the importance of understanding how models make their decision. The research field that aims to look into the inner workings of the black-box and to make predictions more interpretable is referred to as eXplainable Artificial Intelligence (XAI). Over the recent years, the research domain of XAI has seen important contributions and continuous developments, achieving great results with theoretically sound applied methodologies. These achievements enable both industry and regulators to improve on existing models and their supervision; this is done in term of explainability, which is the main purpose of these models, but it also brings new possibilities, namely the employment of eXplainable AI models and their outputs as an intermediate step to new applications, greatly expanding their usefulness beyond explainability of model decisions. This thesis is composed of six chapters: an introduction and a conclusion plus four self contained sections reporting the corresponding papers. Chapter 1 proposes the use of Shapley values in similarity networks and clustering models in order to bring out new pieces of information, useful for classification and analysis of the customer base, in an insurtech setting. In chapter 2 a comparison between SHAP and LIME, two of the most important XAI models, evaluating their parameters attribution methodologies and the information they are capable of include thereof, in italian Small and Medium Enterprises’ Probability of Default (PD) estimation, with balance sheet data as inputs. Chapter 3 introduces the use of Shapley values in feature selection techniques, with the analysis of wrapper and embedded feature selection algorithms and their ability to select relevant features with both raw data and their Shapley values, again in the setting of SME PD estimation. In chapter 4, a new methodology of model selection based on Lorenz Zoonoid is introduced, highlighting similarities with the game-theoretical concept of Shapley values and their variability decomposition attribution to independent variables as well as some advantages in terms of model comparability and standardization. These properties are explored through both a simulated example and the application to a real world dataset, provided by EU-certified rating agency Modefinance.The growing application of black-box Artificial Intelligence algorithms in many real-world application is raising the importance of understanding how models make their decision. The research field that aims to look into the inner workings of the black-box and to make predictions more interpretable is referred to as eXplainable Artificial Intelligence (XAI). Over the recent years, the research domain of XAI has seen important contributions and continuous developments, achieving great results with theoretically sound applied methodologies. These achievements enable both industry and regulators to improve on existing models and their supervision; this is done in term of explainability, which is the main purpose of these models, but it also brings new possibilities, namely the employment of eXplainable AI models and their outputs as an intermediate step to new applications, greatly expanding their usefulness beyond explainability of model decisions. This thesis is composed of six chapters: an introduction and a conclusion plus four self contained sections reporting the corresponding papers. Chapter 1 proposes the use of Shapley values in similarity networks and clustering models in order to bring out new pieces of information, useful for classification and analysis of the customer base, in an insurtech setting. In chapter 2 a comparison between SHAP and LIME, two of the most important XAI models, evaluating their parameters attribution methodologies and the information they are capable of include thereof, in italian Small and Medium Enterprises’ Probability of Default (PD) estimation, with balance sheet data as inputs. Chapter 3 introduces the use of Shapley values in feature selection techniques, with the analysis of wrapper and embedded feature selection algorithms and their ability to select relevant features with both raw data and their Shapley values, again in the setting of SME PD estimation. In chapter 4, a new methodology of model selection based on Lorenz Zoonoid is introduced, highlighting similarities with the game-theoretical concept of Shapley values and their variability decomposition attribution to independent variables as well as some advantages in terms of model comparability and standardization. These properties are explored through both a simulated example and the application to a real world dataset, provided by EU-certified rating agency Modefinance

Archivio Istituzionale della Ricerca - Università degli Studi di Pavia

Credit risk prediction in an imbalanced social lending environment

Author: Naderpour M
Namvar A
Rabhi F
Siami M
Publication venue: 'Atlantis Press'
Publication date: 01/01/2018
Field of study

© 2018, the Authors. Credit risk prediction is an effective way of evaluating whether a potential borrower will repay a loan, particularly in peer-to-peer lending where class imbalance problems are prevalent. However, few credit risk prediction models for social lending consider imbalanced data and, further, the best resampling technique to use with imbalanced data is still controversial. In an attempt to address these problems, this paper presents an empirical comparison of various combinations of classifiers and resampling techniques within a novel risk assessment methodology that incorporates imbalanced data. The credit predictions from each combination are evaluated with a G-mean measure to avoid bias towards the majority class, which has not been considered in similar studies. The results reveal that combining random forest and random under-sampling may be an effective strategy for calculating the credit risk associated with loan applicants in social lending markets

arXiv.org e-Print Archive

OPUS - University of Technology Sydney

Directory of Open Access Journals

UNSWorks