    Does segmentation always improve model performance in credit scoring?

    Credit scoring allows for the credit risk assessment of bank customers. A single scoring model (scorecard) can be developed for the entire customer population, e.g. using logistic regression. However, it is often expected that segmentation, i.e. dividing the population into several groups and building separate scorecards for them, will improve model performance. The most common statistical methods for segmentation are two-step approaches, in which logistic regression follows a segmentation tree such as Classification and Regression Trees (CART) or Chi-squared Automatic Interaction Detection (CHAID). In this research, the two-step approaches are applied, as well as a new, simultaneous method in which both the segmentation and the scorecards are optimised at the same time: Logistic Trees with Unbiased Selection (LOTUS). For reference, a single-scorecard model is used. These methods are applied to data provided by two major UK banks and one European credit bureau. The model performance measures are then compared to examine whether the segmentation methods improve performance. The results show that segmentation does not always improve model performance in credit scoring: for none of the analysed real-world datasets do the multi-scorecard models perform considerably better than the single-scorecard ones. Moreover, in this application, there is no difference in performance between the two-step and simultaneous approaches.
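
The two-step approach described above can be sketched in plain Python: a CART-style split on the default flag chooses the segments, and a separate logistic scorecard is then fitted in each one. This is a minimal illustration under stated assumptions, not the study's implementation. The split is a single Gini-minimising cut (a depth-1 tree), the scorecards are fitted by plain gradient descent without regularisation, and the function names and toy data are hypothetical. LOTUS, which chooses segments and scorecards jointly, is not shown.

```python
import math
from typing import Dict, List, Tuple

Row = List[float]
Model = Tuple[List[float], float]  # (weights, intercept) of one logistic scorecard


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X: List[Row], y: List[int], lr: float = 0.1, epochs: int = 2000) -> Model:
    """Fit one scorecard by batch gradient descent on the mean log-loss (no regularisation)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def gini(labels: List[int]) -> float:
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) ** 2


def best_split(X: List[Row], y: List[int]) -> Tuple[int, float]:
    """Step 1: depth-1 split minimising the weighted Gini impurity of the default flag."""
    best_j, best_t, best_score = 0, X[0][0], float("inf")
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for xi, yi in zip(X, y) if xi[j] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(X)
            if score < best_score:
                best_j, best_t, best_score = j, t, score
    return best_j, best_t


def fit_two_step(X: List[Row], y: List[int]) -> Tuple[int, float, Dict[bool, Model]]:
    """Step 2: fit a separate scorecard in each segment (key True means x[j] > t)."""
    j, t = best_split(X, y)
    models: Dict[bool, Model] = {}
    for side in (False, True):
        idx = [i for i, xi in enumerate(X) if (xi[j] > t) == side]
        models[side] = fit_logistic([X[i] for i in idx], [y[i] for i in idx])
    return j, t, models


def predict_pd(row: Row, j: int, t: float, models: Dict[bool, Model]) -> float:
    """Route an applicant to its segment and return that scorecard's probability of default."""
    w, b = models[row[j] > t]
    return sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b)


# Hypothetical toy data: feature 0 is a segment flag, feature 1 a numeric attribute.
X = [[0, 1.0], [0, 2.0], [0, 3.0], [0, 4.0], [1, 1.0], [1, 2.0], [1, 3.0], [1, 4.0]]
y = [0, 0, 0, 1, 1, 1, 1, 0]
j, t, models = fit_two_step(X, y)
print(j, t)  # → 0 0.5
```

The single-scorecard reference model is just `fit_logistic(X, y)` on the whole population. Comparing it with the segmented model on held-out data mirrors the comparison the abstract describes.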

    CVXR: An R Package for Disciplined Convex Optimization

    CVXR is an R package that provides an object-oriented modeling language for convex optimization, similar to CVX, CVXPY, YALMIP, and Convex.jl. It allows the user to formulate convex optimization problems in a natural mathematical syntax rather than the restrictive form required by most solvers. The user specifies an objective and a set of constraints by combining constants, variables, and parameters using a library of functions with known mathematical properties. CVXR then applies signed disciplined convex programming (DCP) to verify the problem's convexity. Once verified, the problem is converted into standard conic form using graph implementations and passed to a cone solver such as ECOS or SCS. We demonstrate CVXR's modeling framework with several applications. Comment: 34 pages, 9 figures
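
The DCP verification step can be illustrated with a toy curvature checker. This is a hypothetical Python sketch, not CVXR's API (CVXR itself is an R package). It tracks only curvature, so unlike the signed DCP described above it ignores the sign of expressions. The names `Expr`, `Atom`, `apply_atom` and the three example atoms are assumptions for illustration. The central rule is DCP composition: a convex function of an argument is certified convex if the argument is affine, or if the function is nondecreasing and the argument convex, or nonincreasing and the argument concave. Concave functions follow the mirror-image rule.

```python
from dataclasses import dataclass

CONSTANT, AFFINE, CONVEX, CONCAVE, UNKNOWN = "constant", "affine", "convex", "concave", "unknown"


@dataclass(frozen=True)
class Expr:
    curvature: str


@dataclass(frozen=True)
class Atom:
    curvature: str     # CONVEX or CONCAVE
    monotonicity: str  # "increasing", "decreasing" or "none"


EXP = Atom(CONVEX, "increasing")
LOG = Atom(CONCAVE, "increasing")
SQUARE = Atom(CONVEX, "none")  # without sign tracking, square is treated as non-monotone


def variable() -> Expr:
    return Expr(AFFINE)


def add(*terms: Expr) -> Expr:
    """Sum rule: convex + affine is convex, but convex + concave cannot be certified."""
    curvs = {e.curvature for e in terms} - {CONSTANT}
    if not curvs:
        return Expr(CONSTANT)
    if UNKNOWN in curvs or {CONVEX, CONCAVE} <= curvs:
        return Expr(UNKNOWN)
    if CONVEX in curvs:
        return Expr(CONVEX)
    if CONCAVE in curvs:
        return Expr(CONCAVE)
    return Expr(AFFINE)


def scale(c: float, e: Expr) -> Expr:
    """Multiplying by a negative constant swaps convex and concave."""
    if c >= 0 or e.curvature not in (CONVEX, CONCAVE):
        return e
    return Expr(CONCAVE if e.curvature == CONVEX else CONVEX)


def apply_atom(f: Atom, arg: Expr) -> Expr:
    """DCP composition rule for a single-argument atom."""
    g = arg.curvature
    if g == CONSTANT:
        return Expr(CONSTANT)
    if g == AFFINE:
        return Expr(f.curvature)
    opposite = CONCAVE if f.curvature == CONVEX else CONVEX
    ok = (f.monotonicity == "increasing" and g == f.curvature) or (
        f.monotonicity == "decreasing" and g == opposite
    )
    return Expr(f.curvature if ok else UNKNOWN)


def is_dcp_minimize(objective: Expr) -> bool:
    """A minimisation is accepted only if the objective is certified convex (or affine/constant)."""
    return objective.curvature in (CONSTANT, AFFINE, CONVEX)


x = variable()
print(apply_atom(EXP, apply_atom(SQUARE, x)).curvature)  # → convex
print(apply_atom(LOG, apply_atom(EXP, x)).curvature)     # → unknown, although log(exp(x)) = x
```

The second example shows why DCP is conservative. Convexity is certified from the syntax alone, so some expressions that are in fact convex or affine cannot be certified and must be rewritten before a DCP-based tool will accept them.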

    Credit scoring using the clustered support vector machine

    This work investigates the practice of credit scoring and introduces the use of the clustered support vector machine (CSVM) for credit scorecard development. This recently designed algorithm addresses some of the limitations, noted in the literature, of traditional nonlinear support vector machine (SVM) based methods for classification. Specifically, it is well known that as historical credit scoring datasets grow large, these nonlinear approaches, while highly accurate, become computationally expensive. Accordingly, this study compares the CSVM with other nonlinear SVM-based techniques and shows that the CSVM can achieve comparable levels of classification performance while remaining relatively cheap computationally.
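
The core idea of the CSVM can be sketched as follows. The sketch assumes a formulation in which the data are first partitioned by k-means and each cluster gets its own linear SVM, whose weight vector is pulled towards a shared global vector so that the clusters do not overfit independently. It is a simplified, hypothetical sketch, not the implementation used in the study. Each local weight vector is written as `w + v[l]`, and the objective `lam/2 * ||w||^2 + 1/2 * sum_l ||v[l]||^2 + C * sum_i hinge(y_i * (w + v[l_i]) . x_i)` is minimised by plain subgradient descent. The bias is folded into a constant feature, and k-means is initialised with the first k rows. All function names and parameter defaults are assumptions. Because every local model is linear, training cost grows roughly linearly with the number of rows, which is the kind of saving over kernel SVMs that the abstract refers to.

```python
from typing import List, Tuple

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def sq_dist(a: Vec, b: Vec) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(X: List[Vec], k: int, iters: int = 20) -> Tuple[List[Vec], List[int]]:
    """Plain Lloyd's k-means, initialised (as an assumption) with the first k rows."""
    centers = [list(X[i]) for i in range(k)]
    assign = [0] * len(X)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sq_dist(x, centers[c])) for x in X]
        for c in range(k):
            members = [x for x, a in zip(X, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assign


def train_csvm(X: List[Vec], y: List[int], k: int = 2, lam: float = 1.0, C: float = 1.0,
               lr: float = 0.01, epochs: int = 500) -> Tuple[List[Vec], Vec, List[Vec]]:
    """Subgradient descent on the clustered-SVM objective; labels must be -1 or +1."""
    centers, assign = kmeans(X, k)
    Xb = [list(x) + [1.0] for x in X]  # constant feature stands in for the bias
    d = len(Xb[0])
    w = [0.0] * d                      # global weight vector shared by all clusters
    v = [[0.0] * d for _ in range(k)]  # per-cluster deviations from w
    for _ in range(epochs):
        gw = [lam * wj for wj in w]    # gradient of lam/2 * ||w||^2
        gv = [list(vl) for vl in v]    # gradient of 1/2 * ||v_l||^2
        for xi, yi, l in zip(Xb, y, assign):
            if yi * dot([a + b for a, b in zip(w, v[l])], xi) < 1:  # hinge loss is active
                for j in range(d):
                    gw[j] -= C * yi * xi[j]
                    gv[l][j] -= C * yi * xi[j]
        w = [wj - lr * g for wj, g in zip(w, gw)]
        v = [[vj - lr * g for vj, g in zip(vl, gl)] for vl, gl in zip(v, gv)]
    return centers, w, v


def predict(x: Vec, centers: List[Vec], w: Vec, v: List[Vec]) -> int:
    """Send x to its nearest cluster and apply that cluster's local linear classifier."""
    l = min(range(len(centers)), key=lambda c: sq_dist(x, centers[c]))
    score = dot([a + b for a, b in zip(w, v[l])], list(x) + [1.0])
    return 1 if score >= 0 else -1
```

A typical call is `centers, w, v = train_csvm(X, y, k=2)` followed by `predict(x, centers, w, v)` for each new applicant.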

    A New Credit Scoring Model For Vehicle Leasing Company

    Small and medium-sized enterprises are among the businesses affected by the spread of the coronavirus. The pandemic in Indonesia has caused great hardship for these companies. To prevent losses during the current pandemic, PT XYZ decided to build a credit scoring model to predict the risk of its prospective customers. The model will consist of two types. The first is an expert-system assessment, or scorecard. The data obtained from the expert system will then be fed into machine learning using statistical methods to obtain a credit scoring model. The CRISP-DM framework will be used to guide the development process and ensure reliable model output.
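
An expert-system scorecard of the kind described in the first stage is essentially a table of points that domain experts assign to attribute bands, summed and compared with a cutoff. The sketch below is hypothetical: the attributes, bands, points and the cutoff of 60 are invented for illustration and are not PT XYZ's scorecard. The scores or decisions such a table produces could then form part of the data passed to the statistical stage.

```python
from typing import Dict, List, Tuple

# Hypothetical expert-defined bands: (inclusive lower bound, points), sorted by lower bound.
SCORECARD: Dict[str, List[Tuple[float, int]]] = {
    "years_in_business": [(0, 5), (2, 15), (5, 30)],
    "down_payment_pct": [(0, 0), (10, 10), (20, 25), (30, 40)],
    "late_payments_12m": [(0, 30), (1, 10), (3, 0)],  # more late payments earn fewer points
}


def points_for(value: float, bands: List[Tuple[float, int]]) -> int:
    """Return the points of the highest band whose lower bound does not exceed the value."""
    points = bands[0][1]  # values below the first bound fall into the first band
    for lower, band_points in bands:
        if value >= lower:
            points = band_points
    return points


def expert_score(applicant: Dict[str, float]) -> int:
    return sum(points_for(applicant[name], bands) for name, bands in SCORECARD.items())


def decision(score: int, cutoff: int = 60) -> str:
    return "approve" if score >= cutoff else "refer"


applicant = {"years_in_business": 3, "down_payment_pct": 20, "late_payments_12m": 0}
score = expert_score(applicant)
print(score, decision(score))  # → 70 approve
```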

    Using Machine Learning Techniques to Predict a Risk Score for New Members of a Chit Fund Group

    Predicting the risk scores of new and potential customers is common practice across the financial industry. By predicting risk scores for its customers, a chit fund company can improve its knowledge and understanding of those customers without relying on human knowledge. Data is collected on each customer before they take out credit and while they contribute to a chit fund. Having collected the necessary data, the company can then decide whether modelling customer risk would benefit it. As historical data is available, this thesis focuses on one aspect of risk score prediction: supervised machine learning. Supervised machine learning techniques use historic data to ‘learn a model of the relationship between a set of descriptive features and a target feature’ (Kelleher, Mac Namee, & D’Arcy, 2015). There are many supervised machine learning techniques; support vector machines (SVM), logistic regression and decision trees are the focal point of this thesis. The main objective of this project is to predict a risk score for new or potential subscribers of a chit fund company. The models generated would be suitable for use both before a customer joins a chit fund group and while the customer takes part in the group, measuring risk before the customer becomes a subscriber as well as behavioural risk while with the company. The objective is to extend existing research by predicting a score from zero to one that represents the probability of default. Default, for the purpose of this project, is defined as being more than 90 days late with a payment. Data from real chit fund subscribers was used to train and test the models built for the project. A factor reduction technique was used to identify key variables, and multiple models were tested to determine which gives the best results. The second objective of this project is to examine the subscriber network.
This section of the project checks for links between subscribers and investigates a possible link between subscribers and their chance of default. Variables such as address and nominee are the focus of this section. The most successful supervised machine learning model was the random forest, with a precision of 59% and a recall of 92%. This model also had the highest accuracy of all the models in the experiment, at 85%. However, accuracy is not the most trustworthy evaluation measure for this project, as the dataset is unbalanced. The model combined 300 decision trees; using the classification method, the class predicted by the majority of trees was selected as the final prediction. This achieved high accuracy on the dataset from the chit fund company, Kyepot. Social network analysis found no unusual relationship between subscribers who went into default with regard to the area in which they live or their nominees. Supervised machine learning techniques have been shown to be useful tools in the financial industry, and this project suggests that they may also be useful for chit fund companies. Of the four techniques evaluated, the random forest appears to be the most useful for this chit fund company.
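
Two ideas from these results can be sketched in a few lines: combining trees by majority vote, and why accuracy can mislead on an unbalanced dataset while precision and recall do not. The function names and toy numbers are hypothetical and are not the project's data. With an even number of trees (such as 300) a tie is possible, and this sketch breaks ties in favour of the class seen first.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def majority_vote(tree_predictions: Sequence[Sequence[int]]) -> List[int]:
    """One row per tree; return the most common class per customer (ties go to the class seen first)."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*tree_predictions)]


def precision_recall_accuracy(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float, float]:
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, (tp + tn) / len(y_true)


# Three toy trees voting on four customers (1 = default).
print(majority_vote([[1, 0, 0, 1], [1, 1, 0, 0], [0, 0, 0, 1]]))  # → [1, 0, 0, 1]

# Unbalanced toy set: predicting "no default" for everyone scores 90% accuracy but 0% recall.
print(precision_recall_accuracy([0] * 9 + [1], [0] * 10))  # → (0.0, 0.0, 0.9)
```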

    GraphCast: Learning skillful medium-range global weather forecasting

    We introduce a machine-learning (ML)-based weather simulator, called "GraphCast", which outperforms the most accurate deterministic operational medium-range weather forecasting system in the world, as well as all previous ML baselines. GraphCast is an autoregressive model, based on graph neural networks and a novel high-resolution multi-scale mesh representation, which we trained on historical weather data from the European Centre for Medium-Range Weather Forecasts (ECMWF)'s ERA5 reanalysis archive. It can make 10-day forecasts, at 6-hour time intervals, of five surface variables and six atmospheric variables, each at 37 vertical pressure levels, on a 0.25-degree latitude-longitude grid, which corresponds to roughly 25 x 25 kilometer resolution at the equator. Our results show GraphCast is more accurate than ECMWF's deterministic operational forecasting system, HRES, on 90.0% of the 2760 variable and lead time combinations we evaluated. GraphCast also outperforms the most accurate previous ML-based weather forecasting model on 99.2% of the 252 targets it reported. GraphCast can generate a 10-day forecast (35 gigabytes of data) in under 60 seconds on Cloud TPU v4 hardware. Unlike traditional forecasting methods, ML-based forecasting scales well with data: training on bigger, higher-quality, and more recent data can improve forecast skill. Together these results represent a key step forward in complementing and improving weather modeling with ML, open new opportunities for fast, accurate forecasting, and help realize the promise of ML-based simulation in the physical sciences. Comment: Main text: 21 pages, 8 figures, 1 table. Appendix: 15 pages, 5 figures, 2 tables
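
Two points in this abstract can be made concrete. First, "autoregressive" means the model predicts one 6-hour step at a time and feeds each prediction back in as the next input, so a 10-day forecast takes 240 / 6 = 40 steps. Second, the stated variables and grid fix the size of each predicted state. The sketch below is hypothetical. `step` stands in for the trained network, and feeding back only the latest state is a simplification. The grid counts also assume a regular 0.25-degree latitude-longitude grid that includes both poles (721 x 1440 points), and the byte count assumes 4-byte floats.

```python
from typing import Callable, List, TypeVar

S = TypeVar("S")


def rollout(step: Callable[[S], S], initial: S, lead_time_hours: int = 240,
            step_hours: int = 6) -> List[S]:
    """Apply a one-step model repeatedly, feeding each prediction back in as the next input."""
    if lead_time_hours % step_hours:
        raise ValueError("lead time must be a whole number of steps")
    states: List[S] = []
    state = initial
    for _ in range(lead_time_hours // step_hours):
        state = step(state)
        states.append(state)
    return states


# Size of one predicted state, from the figures in the abstract.
SURFACE_VARS, ATMOS_VARS, PRESSURE_LEVELS = 5, 6, 37
values_per_point = SURFACE_VARS + ATMOS_VARS * PRESSURE_LEVELS  # 5 + 6 * 37 = 227
n_lat, n_lon = int(180 / 0.25) + 1, int(360 / 0.25)               # 721 x 1440 (assumed grid)
values_per_state = values_per_point * n_lat * n_lon               # 235,680,480
forecast_bytes = 40 * values_per_state * 4                        # assumes float32 storage

print(len(rollout(lambda s: s + 1, 0)), values_per_point, n_lat * n_lon)  # → 40 227 1038240
```

Under these assumptions, 40 states come to about 37.7 GB, or roughly 35.1 GiB. That is in line with the 35 gigabytes quoted above, though this is only a back-of-envelope check and not the authors' accounting.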

    Credit Scoring Using Machine Learning

    For financial institutions and the economy at large, the role of credit scoring in lending decisions cannot be overemphasised. An accurate and well-performing credit scorecard allows lenders to control their risk exposure through the selective allocation of credit based on the statistical analysis of historical customer data. This thesis identifies and investigates a number of specific challenges that occur during the development of credit scorecards. Four main contributions are made in this thesis. First, we examine the performance of a number of supervised classification techniques on a collection of imbalanced credit scoring datasets. Class imbalance occurs when there are significantly fewer examples in one or more classes in a dataset than in the remaining classes. We demonstrate that oversampling the minority class leads to no overall improvement for the best-performing classifiers. In contrast, we find that adjusting the threshold on the classifier output yields, in many cases, an improvement in classification performance. Our second contribution investigates a particularly severe form of class imbalance, which in credit scoring is referred to as the low-default portfolio problem. To address this issue, we compare the performance of a number of semi-supervised classification algorithms with that of logistic regression. Based on a detailed comparison of classifier performance, we conclude that both approaches merit consideration when dealing with low-default portfolios. Third, we quantify the differences in classifier performance arising from various implementations of a real-world behavioural scoring dataset. Due to commercial sensitivities surrounding the use of behavioural scoring data, very few empirical studies that directly address this topic have been published.
This thesis describes a quantitative comparison of a range of dataset parameters that affect classification performance, including: (i) varying durations of historical customer behaviour used for model training; (ii) different lengths of time over which a borrower’s class label is defined; and (iii) alternative approaches to defining a customer’s default status in behavioural scoring. Finally, this thesis demonstrates how artificial data may be used to overcome the difficulties associated with obtaining and using real-world data. The limitations of artificial data, in terms of its usefulness in evaluating classification performance, are also highlighted. In this work, we are interested in generating artificial data for credit scoring in the absence of any available real-world data.
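
The two imbalance remedies compared in the first contribution can be sketched side by side. Random oversampling duplicates minority-class rows before training. Threshold adjustment leaves the training data alone and moves the cut-off applied to the classifier's output. The sketch is hypothetical. In particular, it assumes balanced accuracy as the criterion for choosing the threshold on validation data, which the abstract does not specify, and the function names and toy scores are invented.

```python
import random
from typing import List, Sequence, Tuple


def random_oversample(rows: List, labels: List[int], seed: int = 0) -> Tuple[List, List[int]]:
    """Duplicate randomly chosen minority-class rows until both classes are the same size."""
    rng = random.Random(seed)
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    minority, majority, label = (pos, neg, 1) if len(pos) < len(neg) else (neg, pos, 0)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return rows + extra, labels + [label] * len(extra)


def balanced_accuracy(scores: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """Mean of the true-positive and true-negative rates when predicting 1 for score >= threshold."""
    tp = sum(s >= threshold and y == 1 for s, y in zip(scores, labels))
    tn = sum(s < threshold and y == 0 for s, y in zip(scores, labels))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


def best_threshold(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Try every observed score as a cut-off and keep the one with the highest balanced accuracy."""
    best_t, best_ba = 0.5, -1.0
    for t in sorted(set(scores)):
        ba = balanced_accuracy(scores, labels, t)
        if ba > best_ba:
            best_t, best_ba = t, ba
    return best_t, best_ba


# Toy validation scores: the default cut-off of 0.5 flags no defaulters at all.
scores = [0.05, 0.1, 0.2, 0.3, 0.35, 0.4, 0.45]
labels = [0, 0, 0, 0, 0, 1, 1]
print(balanced_accuracy(scores, labels, 0.5))  # → 0.5
print(best_threshold(scores, labels))          # → (0.4, 1.0)
```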

    ASSESSMENTS OF CREDITWORTHINESS OF CRAFTS IN CROATIA

    The main purpose of the paper is to analyse financial reporting practice in the craft sector and to assess the quality of the available financial and qualitative data. For this purpose, credit scoring was used as the methodology. Evaluating this information produces a model that enables assessment of the crafts’ creditworthiness. The empirical research confirms the importance of qualitative data. Because most crafts are oriented towards local markets, the model has the potential to be applied in local-level management in financial institutions and by the crafts’ suppliers. Overall, this promotes better inclusion of the crafts within their business environment.