895 research outputs found
Feature selection in credit risk modeling: an international evidence
This paper aims to discover a suitable combination of contemporary feature selection techniques and robust prediction classifiers.
As such, to examine the impact of the feature selection method
on classifier performance, we use two Chinese and three other
real-world credit scoring datasets. The utilized feature selection
methods are the least absolute shrinkage and selection operator
(LASSO), multivariate adaptive regression splines (MARS). In contrast, the examined classifiers are the classification and regression
trees (CART), logistic regression (LR), artificial neural network
(ANN), and support vector machines (SVM). Empirical findings
confirm that LASSO’s feature selection method, followed by
robust classifier SVM, demonstrates remarkable improvement and
outperforms other competitive classifiers. Moreover, ANN also
offers improved accuracy with feature selection methods; LR only
can improve classification efficiency through performing feature
selection via LASSO. Nonetheless, CART does not provide any
indication of improvement in any combination. The proposed
credit scoring modeling strategy may use to develop policy, progressive ideas, operational guidelines for effective credit risk management of lending, and other financial institutions. The finding
of this study has practical value, as to date, there is no consensus
about the combination of feature selection method and prediction classifiers
Improved credit scoring model using XGBoost with Bayesian hyper-parameter optimization
Several credit-scoring models have been developed using ensemble classifiers in order to improve the accuracy of assessment. However, among the ensemble models, little consideration has been focused on the hyper-parameters tuning of base learners, although these are crucial to constructing ensemble models. This study proposes an improved credit scoring model based on the extreme gradient boosting (XGB) classifier using Bayesian hyper-parameters optimization (XGB-BO). The model comprises two steps. Firstly, data pre-processing is utilized to handle missing values and scale the data. Secondly, Bayesian hyper-parameter optimization is applied to tune the hyper-parameters of the XGB classifier and used to train the model. The model is evaluated on four widely public datasets, i.e., the German, Australia, lending club, and Polish datasets. Several state-of-the-art classification algorithms are implemented for predictive comparison with the proposed method. The results of the proposed model showed promising results, with an improvement in accuracy of 4.10%, 3.03%, and 2.76% on the German, lending club, and Australian datasets, respectively. The proposed model outperformed commonly used techniques, e.g., decision tree, support vector machine, neural network, logistic regression, random forest, and bagging, according to the evaluation results. The experimental results confirmed that the XGB-BO model is suitable for assessing the creditworthiness of applicants
Forecasting Financial Distress With Machine Learning – A Review
Purpose – Evaluate the various academic researches with multiple views on credit risk and artificial intelligence (AI) and their evolution.Theoretical framework – The study is divided as follows: Section 1 introduces the article. Section 2 deals with credit risk and its relationship with computational models and techniques. Section 3 presents the methodology. Section 4 addresses a discussion of the results and challenges on the topic. Finally, section 5 presents the conclusions.Design/methodology/approach – A systematic review of the literature was carried out without defining the time period and using the Web of Science and Scopus database.Findings – The application of computational technology in the scope of credit risk analysis has drawn attention in a unique way. It was found that the demand for identification and introduction of new variables, classifiers and more assertive methods is constant. The effort to improve the interpretation of data and models is intense.Research, Practical & Social implications – It contributes to the verification of the theory, providing information in relation to the most used methods and techniques, it brings a wide analysis to deepen the knowledge of the factors and variables on the theme. It categorizes the lines of research and provides a summary of the literature, which serves as a reference, in addition to suggesting future research.Originality/value – Research in the area of Artificial Intelligence and Machine Learning is recent and requires attention and investigation, thus, this study contributes to the opening of new views in order to deepen the work on this topic
Machine learning applied to banking supervision a literature review
Guerra, P., & Castelli, M. (2021). Machine learning applied to banking supervision a literature review. Risks, 9(7), 1-24. [136]. https://doi.org/10.3390/risks9070136Machine learning (ML) has revolutionised data analysis over the past decade. Like in-numerous other industries heavily reliant on accurate information, banking supervision stands to benefit greatly from this technological advance. The objective of this review is to provide a compre-hensive walk-through of how the most common ML techniques have been applied to risk assessment in banking, focusing on a supervisory perspective. We searched Google Scholar, Springer Link, and ScienceDirect databases for articles including the search terms “machine learning” and (“bank” or “banking” or “supervision”). No language, date, or Journal filter was applied. Papers were then screened and selected according to their relevance. The final article base consisted of 41 papers and 2 book chapters, 53% of which were published in the top quartile journals in their field. Results are presented in a timeline according to the publication date and categorised by time slots. Credit risk assessment and stress testing are highlighted topics as well as other risk perspectives, with some references to ML application surveys. The most relevant ML techniques encompass k-nearest neigh-bours (KNN), support vector machines (SVM), tree-based models, ensembles, boosting techniques, and artificial neural networks (ANN). Recent trends include developing early warning systems (EWS) for bankruptcy and refining stress testing. One limitation of this study is the paucity of contributions using supervisory data, which justifies the need for additional investigation in this field. However, there is increasing evidence that ML techniques can enhance data analysis and decision making in the banking industry.publishersversionpublishe
Credit risk prediction in an imbalanced social lending environment
© 2018, the Authors. Credit risk prediction is an effective way of evaluating whether a potential borrower will repay a loan, particularly in peer-to-peer lending where class imbalance problems are prevalent. However, few credit risk prediction models for social lending consider imbalanced data and, further, the best resampling technique to use with imbalanced data is still controversial. In an attempt to address these problems, this paper presents an empirical comparison of various combinations of classifiers and resampling techniques within a novel risk assessment methodology that incorporates imbalanced data. The credit predictions from each combination are evaluated with a G-mean measure to avoid bias towards the majority class, which has not been considered in similar studies. The results reveal that combining random forest and random under-sampling may be an effective strategy for calculating the credit risk associated with loan applicants in social lending markets
Recommended from our members
A credit scoring model based on classifiers consensus system approach
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University London.Managing customer credit is an important issue for each commercial bank; therefore, banks take great care when dealing with customer loans to avoid any improper decisions that can lead to loss of opportunity or financial losses. The manual estimation of customer creditworthiness has become both time- and resource-consuming. Moreover, a manual approach is subjective (dependable on the bank employee who gives this estimation), which is why devising and implementing programming models that provide loan estimations is the only way of eradicating the ‘human factor’ in this problem. This model should give recommendations to the bank in terms of whether or not a loan should be given, or otherwise can give a probability in relation to whether the loan will be returned. Nowadays, a number of models have been designed, but there is no ideal classifier amongst these models since each gives some percentage of incorrect outputs; this is a critical consideration when each percent of incorrect answer can mean millions of dollars of losses for large banks. However, the LR remains the industry standard tool for credit-scoring models development. For this purpose, an investigation is carried out on the combination of the most efficient classifiers in credit-scoring scope in an attempt to produce a classifier that exceeds each of its classifiers or components. In this work, a fusion model referred to as ‘the Classifiers Consensus Approach’ is developed, which gives a lot better performance than each of single classifiers that constitute it. The difference of the consensus approach and the majority of other combiners lie in the fact that the consensus approach adopts the model of real expert group behaviour during the process of finding the consensus (aggregate) answer. The consensus model is compared not only with single classifiers, but also with traditional combiners and a quite complex combiner model known as the ‘Dynamic Ensemble Selection’ approach. As a pre-processing technique, step data-filtering (select training entries which fits input data well and remove outliers and noisy data) and feature selection (remove useless and statistically insignificant features which values are low correlated with real quality of loan) are used. These techniques are valuable in significantly improving the consensus approach results. Results clearly show that the consensus approach is statistically better (with 95% confidence value, according to Friedman test) than any other single classifier or combiner analysed; this means that for similar datasets, there is a 95% guarantee that the consensus approach will outperform all other classifiers. The consensus approach gives not only the best accuracy, but also better AUC value, Brier score and H-measure for almost all datasets investigated in this thesis. Moreover, it outperformed Logistic Regression. Thus, it has been proven that the use of the consensus approach for credit-scoring is justified and recommended in commercial banks. Along with the consensus approach, the dynamic ensemble selection approach is analysed, the results of which show that, under some conditions, the dynamic ensemble selection approach can rival the consensus approach. The good sides of dynamic ensemble selection approach include its stability and high accuracy on various datasets.
The consensus approach, which is improved in this work, may be considered in banks that hold the same characteristics of the datasets used in this work, where utilisation could decrease the level of mistakenly rejected loans of solvent customers, and the level of mistakenly accepted loans that are never to be returned. Furthermore, the consensus approach is a notable step in the direction of building a universal classifier that can fit data with any structure. Another advantage of the consensus approach is its flexibility; therefore, even if the input data is changed due to various reasons, the consensus approach can be easily re-trained and used with the same performance
Recommended from our members
Classifiers consensus system approach for credit scoring
Banks take great care when dealing with customer loans to avoid any improper decisions that can lead to loss of opportunity or financial losses. Regarding this, researchers have developed complex credit scoring models using statistical and artificial intelligence (AI) techniques to help banks and financial institutions to support their financial decisions. Various models, from easy to advanced approaches, have been developed in this domain. However, during the last few years there has been marked attention towards development of ensemble or multiple classifier systems, which have proved their ability to be more accurate than single classifier models. However, among the multiple classifier systems models developed in the literature, there has been little consideration given to: 1) combining classifiers of different algorithms (as most have focused on building classifiers of the same algorithm); or 2) exploring different classifier output combination techniques other than the traditional ones, such as majority voting and weighted average. In this paper, the aim is to present a new combination approach based on classifier consensus to combine multiple classifier systems (MCS) of different classification algorithms. Specifically, six of the main well-known base classifiers in this domain are used, namely, logistic regression (LR), neural networks (NN), support vector machines (SVM), random forests (RF), decision trees (DT) and naïve Bayes (NB). Two benchmark classifiers are considered as a reference point for comparison with the proposed method and the other classifiers. These are used in combination with LR, which is still considered the industry-standard model for credit scoring models, and multivariate adaptive regression splines (MARS), a widely adopted technique in credit scoring studies. The experimental results, analysis and statistical tests demonstrate the ability of the proposed combination method to improve prediction performance against all base classifiers, namely, LR, MARS and seven traditional combination methods, in terms of average accuracy, area under the curve (AUC), the H-measure and Brier score (BS). The model was validated over five real-world credit scoring datasets
A multimodal neuroimaging classifier for alcohol dependence
With progress in magnetic resonance imaging technology and a broader dissemination of state-of-the-art imaging facilities, the acquisition of multiple neuroimaging modalities is becoming increasingly feasible. One particular hope associated with multimodal neuroimaging is the development of reliable data-driven diagnostic classifiers for psychiatric disorders, yet previous studies have often failed to find a benefit of combining multiple modalities. As a psychiatric disorder with established neurobiological effects at several levels of description, alcohol dependence is particularly well-suited for multimodal classification. To this aim, we developed a multimodal classification scheme and applied it to a rich neuroimaging battery (structural, functional task-based and functional resting-state data) collected in a matched sample of alcohol-dependent patients (N = 119) and controls (N = 97). We found that our classification scheme yielded 79.3% diagnostic accuracy, which outperformed the strongest individual modality - grey-matter density - by 2.7%. We found that this moderate benefit of multimodal classification depended on a number of critical design choices: a procedure to select optimal modality-specific classifiers, a fine-grained ensemble prediction based on cross-modal weight matrices and continuous classifier decision values. We conclude that the combination of multiple neuroimaging modalities is able to moderately improve the accuracy of machine-learning-based diagnostic classification in alcohol dependence
A multimodal neuroimaging classifier for alcohol dependence
With progress in magnetic resonance imaging technology and a broader dissemination of state-of-the-art imaging facilities, the acquisition of multiple neuroimaging modalities is becoming increasingly feasible. One particular hope associated with multimodal neuroimaging is the development of reliable data-driven diagnostic classifiers for psychiatric disorders, yet previous studies have often failed to find a benefit of combining multiple modalities. As a psychiatric disorder with established neurobiological effects at several levels of description, alcohol dependence is particularly well-suited for multimodal classification. To this aim, we developed a multimodal classification scheme and applied it to a rich neuroimaging battery (structural, functional task-based and functional resting-state data) collected in a matched sample of alcohol-dependent patients (N = 119) and controls (N = 97). We found that our classification scheme yielded 79.3% diagnostic accuracy, which outperformed the strongest individual modality - grey-matter density - by 2.7%. We found that this moderate benefit of multimodal classification depended on a number of critical design choices: a procedure to select optimal modality-specific classifiers, a fine-grained ensemble prediction based on cross-modal weight matrices and continuous classifier decision values. We conclude that the combination of multiple neuroimaging modalities is able to moderately improve the accuracy of machine-learning-based diagnostic classification in alcohol dependence
- …