636 research outputs found

    A literature review on the application of evolutionary computing to credit scoring

    Get PDF
    The last years have seen the development of many credit scoring models for assessing the creditworthiness of loan applicants. Traditional credit scoring methodology has involved the use of statistical and mathematical programming techniques such as discriminant analysis, linear and logistic regression, linear and quadratic programming, or decision trees. However, the importance of credit grant decisions for financial institutions has caused growing interest in using a variety of computational intelligence techniques. This paper concentrates on evolutionary computing, which is viewed as one of the most promising paradigms of computational intelligence. Taking into account the synergistic relationship between the communities of Economics and Computer Science, the aim of this paper is to summarize the most recent developments in the application of evolutionary algorithms to credit scoring by means of a thorough review of scientific articles published during the period 2000–2012.This work has partially been supported by the Spanish Ministry of Education and Science under grant TIN2009-14205 and the Generalitat Valenciana under grant PROMETEO/2010/028

    Efficient Multi-Classifier Wrapper Feature-Selection Model: Application for Dimension Reduction in Credit Scoring

    Get PDF
    The task of identifying most relevant features for a credit scoring application is a challenging task. Reducing the number of redundant and unwanted features is an inevitable task to improve the performance of the credit scoring model. The wrappers approach is usually used in credit scoring applications to identify the most relevant features. However, this approach suffers from the issue of subsets generation and the use of a single classifier as an evaluation function. The problem here is that each classifier may give different results which can be interpreted differently. Hence, we propose in this study an ensemble wrapper feature selection model which is based on a multi-classifiers combination. In a first stage, we address the problem of subsets generation by minimizing the search space through a customized heuristic. Then, a multi-classifier wrapper evaluation is applied using two classifier arrangement approaches in order to select a set of mutually approved set of relevant features. The proposed method is evaluated on four credit datasets and has shown a good performance compared to individual classifiers results

    Adaptive Modelling Approach for Row-Type Dependent Predictive Analysis (RTDPA): A Framework for Designing Machine Learning Models for Credit Risk Analysis in Banking Sector

    Full text link
    In many real-world datasets, rows may have distinct characteristics and require different modeling approaches for accurate predictions. In this paper, we propose an adaptive modeling approach for row-type dependent predictive analysis(RTDPA). Our framework enables the development of models that can effectively handle diverse row types within a single dataset. Our dataset from XXX bank contains two different risk categories, personal loan and agriculture loan. each of them are categorised into four classes standard, sub-standard, doubtful and loss. We performed tailored data pre processing and feature engineering to different row types. We selected traditional machine learning predictive models and advanced ensemble techniques. Our findings indicate that all predictive approaches consistently achieve a precision rate of no less than 90%. For RTDPA, the algorithms are applied separately for each row type, allowing the models to capture the specific patterns and characteristics of each row type. This approach enables targeted predictions based on the row type, providing a more accurate and tailored classification for the given dataset.Additionally, the suggested model consistently offers decision makers valuable and enduring insights that are strategic in nature in banking sector

    Supervised Classification and Mathematical Optimization

    Get PDF
    Data Mining techniques often ask for the resolution of optimization problems. Supervised Classification, and, in particular, Support Vector Machines, can be seen as a paradigmatic instance. In this paper, some links between Mathematical Optimization methods and Supervised Classification are emphasized. It is shown that many different areas of Mathematical Optimization play a central role in off-the-shelf Supervised Classification methods. Moreover, Mathematical Optimization turns out to be extremely useful to address important issues in Classification, such as identifying relevant variables, improving the interpretability of classifiers or dealing with vagueness/noise in the data

    Prediction of delayed graft function after kidney transplantation : comparison between logistic regression and machine learning methods

    Get PDF
    Background: Predictive models for delayed graft function (DGF) after kidney transplantation are usually developed using logistic regression. We want to evaluate the value of machine learning methods in the prediction of DGF. Methods: 497 kidney transplantations from deceased donors at the Ghent University Hospital between 2005 and 2011 are included. A feature elimination procedure is applied to determine the optimal number of features, resulting in 20 selected parameters (24 parameters after conversion to indicator parameters) out of 55 retrospectively collected parameters. Subsequently, 9 distinct types of predictive models are fitted using the reduced data set: logistic regression (LR), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), support vector machines (SVMs; using linear, radial basis function and polynomial kernels), decision tree (DT), random forest (RF), and stochastic gradient boosting (SGB). Performance of the models is assessed by computing sensitivity, positive predictive values and area under the receiver operating characteristic curve (AUROC) after 10-fold stratified cross-validation. AUROCs of the models are pairwise compared using Wilcoxon signed-rank test. Results: The observed incidence of DGF is 12.5 %. DT is not able to discriminate between recipients with and without DGF (AUROC of 52.5 %) and is inferior to the other methods. SGB, RF and polynomial SVM are mainly able to identify recipients without DGF (AUROC of 77.2, 73.9 and 79.8 %, respectively) and only outperform DT. LDA, QDA, radial SVM and LR also have the ability to identify recipients with DGF, resulting in higher discriminative capacity (AUROC of 82.2, 79.6, 83.3 and 81.7 %, respectively), which outperforms DT and RF. Linear SVM has the highest discriminative capacity (AUROC of 84.3 %), outperforming each method, except for radial SVM, polynomial SVM and LDA. However, it is the only method superior to LR. Conclusions: The discriminative capacities of LDA, linear SVM, radial SVM and LR are the only ones above 80 %. None of the pairwise AUROC comparisons between these models is statistically significant, except linear SVM outperforming LR. Additionally, the sensitivity of linear SVM to identify recipients with DGF is amongst the three highest of all models. Due to both reasons, the authors believe that linear SVM is most appropriate to predict DGF

    Supervised classification and mathematical optimization

    Get PDF
    Data Mining techniques often ask for the resolution of optimization problems. Supervised Classification, and, in particular, Support Vector Machines, can be seen as a paradigmatic instance. In this paper, some links between Mathematical Optimization methods and Supervised Classification are emphasized. It is shown that many different areas of Mathematical Optimization play a central role in off-the-shelf Supervised Classification methods. Moreover, Mathematical Optimization turns out to be extremely useful to address important issues in Classification, such as identifying relevant variables, improving the interpretability of classifiers or dealing with vagueness/noise in the data.Ministerio de Ciencia e InnovaciónJunta de Andalucí

    Modeling, forecasting and trading the EUR exchange rates with hybrid rolling genetic algorithms: support vector regression forecast combinations

    Get PDF
    The motivation of this paper is to introduce a hybrid Rolling Genetic Algorithm-Support Vector Regression (RG-SVR) model for optimal parameter selection and feature subset combination. The algorithm is applied to the task of forecasting and trading the EUR/USD, EUR/GBP and EUR/JPY exchange rates. The proposed methodology genetically searches over a feature space (pool of individual forecasts) and then combines the optimal feature subsets (SVR forecast combinations) for each exchange rate. This is achieved by applying a fitness function specialized for financial purposes and adopting a sliding window approach. The individual forecasts are derived from several linear and non-linear models. RG-SVR is benchmarked against genetically and non-genetically optimized SVRs and SVMs models that are dominating the relevant literature, along with the robust ARBF-PSO neural network. The statistical and trading performance of all models is investigated during the period of 1999–2012. As it turns out, RG-SVR presents the best performance in terms of statistical accuracy and trading efficiency for all the exchange rates under study. This superiority confirms the success of the implemented fitness function and training procedure, while it validates the benefits of the proposed algorithm

    A systematic credit scoring model based on heterogeneous classifier ensembles

    Get PDF
    Lending loans to borrowers is considered one of the main profit sources for banks and financial institutions. Thus, careful assessment and evaluation should be taken when deciding to grant credit to potential borrowers. With the rapid growth of credit industry and the massive volume of financial data, developing effective credit scoring models is very crucial. The literature in this area is very dense with models that aim to get the best predictive performance. Recent studies stressed on using ensemble models or multiple classifiers over single ones to solve credit scoring problems. Therefore, this study propose to develop and introduce a systematic credit scoring model based on homogenous and heterogeneous classifier ensembles based on three state-of-the art classifiers: logistic regression (LR), artificial neural network (ANN) and support vector machines (SVM). Results revealed that heterogeneous classifier ensembles gives better predictive performance than homogenous and single classifiers in terms of average accuracy

    Using neural networks and support vector machines for default prediction in South Africa

    Get PDF
    A thesis submitted to the Faculty of Computer Science and Applied Mathematics, University of Witwatersrand, in fulfillment of the requirements for the Master of Science (MSc) Johannesburg Feb 2017This is a thesis on credit risk and in particular bankruptcy prediction. It investigates the application of machine learning techniques such as support vector machines and neural networks for this purpose. This is not a thesis on support vector machines and neural networks, it simply looks at using these functions as tools to preform the analysis. Neural networks are a type of machine learning algorithm. They are nonlinear mod- els inspired from biological network of neurons found in the human central nervous system. They involve a cascade of simple nonlinear computations that when aggre- gated can implement robust and complex nonlinear functions. Neural networks can approximate most nonlinear functions, making them a quite powerful class of models. Support vector machines (SVM) are the most recent development from the machine learning community. In machine learning, support vector machines (SVMs) are su- pervised learning algorithms that analyze data and recognize patterns, used for clas- si cation and regression analysis. SVM takes a set of input data and predicts, for each given input, which of two possible classes comprises the input, making the SVM a non-probabilistic binary linear classi er. A support vector machine constructs a hyperplane or set of hyperplanes in a high or in nite dimensional space, which can be used for classi cation into the two di erent data classes. Traditional bankruptcy prediction medelling has been criticised as it makes certain underlying assumptions on the underlying data. For instance, a frequent requirement for multivarate analysis is a joint normal distribution and independence of variables. Support vector machines (and neural networks) are a useful tool for default analysis because they make far fewer assumptions on the underlying data. In this framework support vector machines are used as a classi er to discriminate defaulting and non defaulting companies in a South African context. The input data required is a set of nancial ratios constructed from the company's historic nancial statements. The data is then Divided into the two groups: a company that has defaulted and a company that is healthy (non default). The nal data sample used for this thesis consists of 23 nancial ratios from 67 companies listed on the jse. Furthermore for each company the company's probability of default is predicted. The results are benchmarked against more classical methods that are commonly used for bankruptcy prediction such as linear discriminate analysis and logistic regression. Then the results of the support vector machines, neural networks, linear discriminate analysis and logistic regression are assessed via their receiver operator curves and pro tability ratios to gure out which model is more successful at predicting default.MT 201
    corecore