
    Advanced survival modelling for consumer credit risk assessment: addressing recurrent events, multiple outcomes and frailty

    A thesis submitted in partial fulfillment of the requirements for the degree of Doctor in Information Management, specialization in Statistics and Econometrics.

    This thesis addressed the application of advanced survival models in consumer credit risk assessment, in particular recurrent delinquency (or default) and recovery (cure) events, multiple risk events, and frailty. Each chapter (2 to 5) addressed a separate problem, and several key conclusions were reached.

    Chapter 2 addressed the neglected area of modelling recovery from delinquency to normal performance on retail consumer loans, taking into account the recurrent nature of delinquency and including time-dependent macroeconomic variables. Using data from a lending company in Zimbabwe, we provided a comprehensive analysis of recovery patterns using the extended Cox model. The findings showed clearly that behavioural variables were the most important in understanding the recovery patterns of obligors, confirming and underscoring the importance of behavioural models for understanding recovery and preventing credit loss. The findings also revealed that falling real gross domestic product, representing a deteriorating economic situation, significantly explained the diminishing rate of recovery from delinquency to normal performance among consumers. The study pointed to the urgent need for policy measures aimed at promoting economic growth, for the stabilisation of consumer welfare and the financial system at large.

    Chapter 3 extends the work in chapter 2 and notes that, even though multiple failure-time data are ubiquitous in finance and economics, especially in the credit risk domain, naive statistical techniques that ignore subsequent events are commonly used to analyse such data. Applying standard statistical methods without addressing the recurrence of events produces biased and inefficient estimates, and hence erroneous predictions. We explore various ways of modelling and forecasting recurrent delinquency and recovery events on consumer loans. Using consumer loan data from a severely distressed economic environment, we illustrate and empirically compare extended Cox models for ordered recurrent recovery events. We highlight that accounting for multiple events provides detailed information and thus a nuanced understanding of the recovery prognosis of delinquents. For ordered indistinguishable recurrent recovery events, we recommend the Andersen and Gill (1982) model, since it fits these assumptions and performs well in predicting recovery.

    Chapter 4 extends chapters 2 and 3 and highlights that rigorous credit risk analysis is of significance not only to lenders and banks but also to sound regulatory and economic policy making. Increasing loan impairment or delinquency, defaults and mortgage foreclosures signal a sick economy and generate considerable financial stability concerns. For lenders and banks, accurate estimation of credit risk parameters remains essential for pricing, profit testing, capital provisioning and managing delinquents. Traditional credit scoring models such as logit regression only provide estimates of the lifetime probability of default for a loan and cannot identify cures or other movements. These methods cannot characterise the progression of borrowers over time, nor utilise all the available data to understand the recurrence of risk events and the possible occurrence of multiple loan outcomes. We therefore propose a system-wide multi-state framework to jointly model state occupations and the transitions between normal performance (current), delinquency, prepayment, repurchase, short sale and foreclosure on mortgage loans. The probability of loans transitioning to and from the various states is estimated in a discrete-time multi-state Markov model with seven allowable states and sixteen possible transitions. Additionally, we investigate the relationship between these transition probabilities and loan-level covariates. We empirically test the performance of the model using US single-family mortgage loans originated during the first quarter of 2009 and followed on their monthly repayment performance until the third quarter of 2016. Our results show that the main factors affecting the transition into the various loan outcomes are affordability, as measured by the debt-to-income ratio; equity, as marked by the loan-to-value ratio; interest rates; and the property type.

    In chapter 5, we note that consumer credit has become increasingly available in Zimbabwe, yet credit information sharing systems are not as advanced. Using frailty survival models on credit bureau data from Zimbabwe, the study investigates the possible underestimation of credit losses under the assumption of independence of default event times. The study found that adding a frailty term significantly improved the models, indicating the presence of unobserved heterogeneity. The major policy recommendation is for the regulator to institute appropriate policy frameworks to allow robust and complete credit information sharing and reporting, as doing so will significantly improve the functioning of the credit market.
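The discrete-time multi-state Markov approach described above can be sketched very simply: each transition probability is estimated as the count of observed transitions out of a state divided by the total number of departures from that state. The sketch below uses three hypothetical states and toy monthly histories rather than the thesis's seven states and mortgage data.

```python
from collections import Counter

# Hypothetical loan states (the thesis uses seven; three shown for brevity)
STATES = ["current", "delinquent", "foreclosure"]

def estimate_transition_matrix(histories):
    """Estimate P[i][j] = Pr(next state is j | current state is i)
    from observed monthly state sequences -- the maximum-likelihood
    estimate for a discrete-time Markov chain."""
    counts = Counter()
    for seq in histories:
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] += 1
    matrix = {}
    for i in STATES:
        row_total = sum(counts[(i, j)] for j in STATES)
        matrix[i] = {
            j: counts[(i, j)] / row_total if row_total else 0.0
            for j in STATES
        }
    return matrix

# Toy monthly histories for three loans
histories = [
    ["current", "current", "delinquent", "current"],
    ["current", "delinquent", "delinquent", "foreclosure"],
    ["current", "current", "current", "delinquent"],
]
P = estimate_transition_matrix(histories)
```

In the full model, each transition probability would additionally be regressed on loan-level covariates (debt-to-income, loan-to-value, interest rate, property type) rather than pooled as here.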

    A Framework for Credit Risk Prediction Using the Optimized-FKSVR Machine Learning Classifier

    Transparency is influenced by several crucial factors, such as credit risk (CR) prediction, model reliability, efficient loan processing, etc. The emergence of machine learning (ML) techniques provides a promising solution to address these challenges. However, it is the responsibility of banking or non-banking organizations to control how they incorporate this innovative methodology to mitigate human bias in loan decision-making. This research article presents the Optimized-Feature-based Kernel Support Vector Regression (O-FKSVR) model, an ML-based CR analysis model for digital banking. The proposal compares several ML methods to identify a precise model for CR assessment using real credit database information. The goal is to introduce a classification model that uses a hybrid of Stochastic Gradient Descent (SGD) and firefly optimization (FFO) methods with Support Vector Regression (SVR) to predict credit risk in the form of probability of default, loss given default, and exposure at default. The proposed O-FKSVR model extracts features and predicts outcomes based on data gathered from online credit analysis, increasing accuracy and resolving problems observed in existing approaches. The experimental study is conducted in Python, and the results demonstrate improvements in accuracy and precision and reduced error rates compared to previous ML methods. The proposed O-FKSVR model achieved a maximum accuracy of 0.955, a precision of 0.96, and a recall of 0.952, with an error rate of 4.4% when compared with existing models such as SVR, DT, RF, and AdaBoost.
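The firefly optimization component of such a hybrid can be sketched in a few lines: each candidate solution ("firefly") moves toward every brighter one with an attractiveness that decays with squared distance, plus a small random walk. This is a minimal stand-alone sketch, not the paper's O-FKSVR implementation; the objective below is a toy quadratic standing in for an SVR validation-error surface, and all parameter names are illustrative.

```python
import math
import random

def firefly_minimize(objective, dim, n_fireflies=15, n_iters=60,
                     beta0=1.0, gamma=1.0, alpha=0.2, bounds=(-5.0, 5.0)):
    """Minimal firefly optimization: fireflies are attracted to
    brighter (lower-objective) fireflies with strength
    beta0 * exp(-gamma * r^2), plus a decaying random perturbation."""
    lo, hi = bounds
    rng = random.Random(42)
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_fireflies)]
    for _ in range(n_iters):
        vals = [objective(x) for x in xs]
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if vals[j] < vals[i]:  # firefly j is brighter
                    r2 = sum((a - b) ** 2 for a, b in zip(xs[i], xs[j]))
                    beta = beta0 * math.exp(-gamma * r2)
                    xs[i] = [
                        min(hi, max(lo,
                            a + beta * (b - a) + alpha * (rng.random() - 0.5)))
                        for a, b in zip(xs[i], xs[j])
                    ]
        alpha *= 0.97  # cool the random step over iterations
    best = min(xs, key=objective)
    return best, objective(best)

# Toy stand-in for a hyperparameter search: minimum at (1, 2)
obj = lambda x: (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2
best_x, best_val = firefly_minimize(obj, dim=2)
```

In the paper's setting, the objective would instead evaluate SVR hyperparameters (e.g. kernel width, regularization) on held-out credit data.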

    IMPECCABLE: Integrated Modeling PipelinE for COVID Cure by Assessing Better LEads

    The drug discovery process currently employed in the pharmaceutical industry typically requires about 10 years and $2–3 billion to deliver one new drug. This is both too expensive and too slow, especially in emergencies like the COVID-19 pandemic. In silico methodologies need to be improved both to select better lead compounds, so as to improve the efficiency of later stages in the drug discovery protocol, and to identify those lead compounds more quickly. No known methodological approach can deliver this combination of higher quality and speed. Here, we describe an Integrated Modeling PipEline for COVID Cure by Assessing Better LEads (IMPECCABLE) that employs multiple methodological innovations to overcome this fundamental limitation. We also describe the computational framework that we have developed to support these innovations at scale, and characterize the performance of this framework in terms of throughput, peak performance, and scientific results. We show that individual workflow components deliver 100× to 1000× improvements over traditional methods, and that the integration of methods, supported by scalable infrastructure, speeds up drug discovery by orders of magnitude. IMPECCABLE has screened ∼10^11 ligands and has been used to discover a promising drug candidate. These capabilities have been used by the US DOE National Virtual Biotechnology Laboratory and the EU Centre of Excellence in Computational Biomedicine.

    On the Reliability of Machine Learning Models for Survival Analysis When Cure Is a Possibility

    [Abstract]: In classical survival analysis, it is assumed that all individuals will experience the event of interest. However, if there is a proportion of subjects who will never experience the event, then a standard survival approach is not appropriate, and cure models should be considered instead. This paper deals with the problem of adapting a machine learning approach for classical survival analysis to a situation where cure (i.e., not suffering the event) is a possibility. Specifically, a brief review of cure models and recent machine learning methodologies is presented, and an adaptation of machine learning approaches to account for cured individuals is introduced. In order to validate the proposed methods, we present an extensive simulation study in which we compare the performance of the adapted machine learning algorithms with existing cure models. The results show the good behavior of the semiparametric or the nonparametric approaches, depending on the simulated scenario. The practical utility of the methodology is showcased through two real-world dataset illustrations. In the first one, the results show the gain of using the nonparametric mixture cure model approach. In the second, the results show the poor performance of some machine learning methods for small sample sizes.

    This project was funded by the Xunta de Galicia (Axencia Galega de Innovación) COVID-19 research projects presented in ISCIII IN845D 2020/26, Operational Program FEDER Galicia 2014–2020; by the Centro de Investigación de Galicia “CITIC”, funded by Xunta de Galicia and the European Union European Regional Development Fund (ERDF)-Galicia 2014–2020 Program, grant ED431G 2019/01; and by the Spanish Ministerio de Economía y Competitividad (research projects PID2019-109238GB-C22 and PID2021-128045OA-I00). ALC was sponsored by the BEATRIZ GALINDO JUNIOR Spanish Grant from MICINN (Ministerio de Ciencia e Innovación) with code BGP18/00154, and was partially supported by MICINN Grant PID2020-113578RB-I00 and by Xunta de Galicia (Grupos de Referencia Competitiva ED431C-2020-14). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.
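The mixture cure idea underlying the paper can be stated in one line: the population survival function is S(t) = π + (1 − π)·S_u(t), where π is the cured fraction and S_u the survival of the uncured, so survival plateaus at π instead of decaying to zero. A minimal numeric sketch, with an exponential latency distribution chosen purely for illustration:

```python
import math

def mixture_cure_survival(t, cure_prob, rate):
    """Population survival under a mixture cure model:
    S(t) = pi + (1 - pi) * S_u(t), with pi the cured fraction and
    S_u(t) = exp(-rate * t) the (illustrative, exponential)
    survival of the uncured subjects."""
    return cure_prob + (1.0 - cure_prob) * math.exp(-rate * t)

# With 30% cured subjects, survival starts at 1 and plateaus at 0.3
s_early = mixture_cure_survival(0.0, cure_prob=0.3, rate=0.5)
s_late = mixture_cure_survival(50.0, cure_prob=0.3, rate=0.5)
```

This plateau is exactly why a standard survival approach, which forces S(t) → 0, misfits data with cured individuals.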

    Conformal Prediction: a Unified Review of Theory and New Challenges

    In this work we provide a review of basic ideas and novel developments in Conformal Prediction -- an innovative distribution-free, non-parametric forecasting method, based on minimal assumptions -- that is able to yield, in a very straightforward way, prediction sets that are valid in a statistical sense even in the finite-sample case. The in-depth discussion provided in the paper covers the theoretical underpinnings of Conformal Prediction, and then proceeds to list the more advanced developments and adaptations of the original idea.

    Comment: arXiv admin note: text overlap with arXiv:0706.3188, arXiv:1604.04173, arXiv:1709.06233, arXiv:1203.5422 by other authors
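The finite-sample validity mentioned above is easiest to see in the split-conformal variant: fit any model, compute absolute residuals on a held-out calibration set, and take a finite-sample-corrected quantile of those scores as the interval half-width. A minimal sketch with toy residuals (not drawn from the review itself):

```python
import math

def split_conformal_halfwidth(cal_residuals, alpha=0.1):
    """Split conformal prediction: given absolute residuals |y - f(x)|
    of a fitted model on a calibration set of size n, return the
    half-width q such that [f(x) - q, f(x) + q] covers a new response
    with probability >= 1 - alpha, assuming only exchangeability.
    Uses the corrected quantile rank ceil((n + 1) * (1 - alpha))."""
    n = len(cal_residuals)
    scores = sorted(cal_residuals)
    k = min(math.ceil((n + 1) * (1 - alpha)), n)
    return scores[k - 1]

# Toy calibration residuals for 10 held-out points
residuals = [0.1, 0.3, 0.2, 0.5, 0.4, 0.25, 0.15, 0.35, 0.45, 0.05]
q = split_conformal_halfwidth(residuals, alpha=0.2)
```

The guarantee is marginal and distribution-free: no assumption on the model or the noise, only that calibration and test points are exchangeable.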

    Untangling hotel industry’s inefficiency: An SFA approach applied to a renowned Portuguese hotel chain

    The present paper explores the technical efficiency of four hotels from the Teixeira Duarte Group, a renowned Portuguese hotel chain. An efficiency ranking of these four hotel units, located in Portugal, is established using Stochastic Frontier Analysis. This methodology makes it possible to discriminate between measurement error and systematic inefficiency in the estimation process, enabling investigation of the main causes of inefficiency. Several suggestions concerning efficiency improvement are put forward for each hotel studied.
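The separation of noise from inefficiency in Stochastic Frontier Analysis comes from the composed error ε = v − u, with symmetric noise v ~ N(0, σv²) and one-sided inefficiency u ~ |N(0, σu²)|. Its density (the normal–half-normal case of Aigner, Lovell and Schmidt, 1977) is f(ε) = (2/σ)·φ(ε/σ)·Φ(−ελ/σ) with σ² = σv² + σu² and λ = σu/σv; a numeric sketch, not tied to the paper's data:

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def sfa_density(eps, sigma_v, sigma_u):
    """Density of the SFA composed error eps = v - u (normal noise
    minus half-normal inefficiency): f(eps) = (2/sigma) * phi(eps/sigma)
    * Phi(-eps * lam / sigma). Skewed left, because inefficiency only
    pulls output below the frontier."""
    sigma = math.sqrt(sigma_v ** 2 + sigma_u ** 2)
    lam = sigma_u / sigma_v
    return (2.0 / sigma) * norm_pdf(eps / sigma) * norm_cdf(-eps * lam / sigma)

# Negative residuals (below the frontier) are more likely than positive ones
d_neg = sfa_density(-1.0, sigma_v=1.0, sigma_u=1.0)
d_pos = sfa_density(+1.0, sigma_v=1.0, sigma_u=1.0)
```

Maximizing the sum of log f(ε_i) over the frontier parameters and (σv, σu) is what lets SFA attribute part of each hotel's shortfall to measurement error and part to systematic inefficiency.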

    Feature selection and personalized modeling on medical adverse outcome prediction

    This thesis is about medical adverse outcome prediction and is composed of three parts: feature selection, time-to-event prediction and personalized modeling. For feature selection, we proposed a three-stage method, an ensemble of filter, embedded and wrapper selection techniques, combined so as to select a stable and predictive set of features while reducing the computational burden. Datasets on two adverse outcome prediction problems, 30-day hip fracture readmission and diabetic retinopathy prognosis, were derived from electronic health records and used to demonstrate the effectiveness of the proposed method. With the selected features, we investigated the application of classical survival analysis models, namely accelerated failure time models, Cox proportional hazards regression models and mixture cure models, to adverse outcome prediction. Unlike binary classifiers, survival analysis methods consider both the status and the time-to-event information, and provide more flexibility when we are interested in the occurrence of an adverse outcome in different time windows. Lastly, we introduced the use of personalized modeling (PM) to predict adverse outcomes based on the patients most similar to each query patient. Unlike the commonly used global modeling approach, PM builds the prediction model on a smaller but more similar patient cohort, leading to a more individualized prediction and a customized risk factor profile. Both static and metric-learning distance measures are used to identify the similar patient cohort. We show that PM together with feature selection achieves better prediction performance by using only similar patients, compared with using data from all available patients in a one-size-fits-all model.
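The personalized modeling step can be sketched as: for each query patient, select the k most similar patients under a distance measure and fit the model on that cohort alone. The sketch below uses a static Euclidean distance and a cohort mean as a stand-in for a fitted local model; patients and features are entirely hypothetical.

```python
import math

def k_nearest_cohort(query, patients, k=3):
    """Select the k patients most similar to the query patient by
    Euclidean distance over (already scaled) feature vectors."""
    return sorted(patients, key=lambda p: math.dist(query, p["features"]))[:k]

def cohort_risk(query, patients, k=3):
    """Personalized risk estimate: mean outcome among the k most
    similar patients (a stand-in for fitting a model on the cohort).
    A metric-learning approach would replace math.dist with a
    learned distance."""
    cohort = k_nearest_cohort(query, patients, k)
    return sum(p["outcome"] for p in cohort) / k

# Hypothetical patients: scaled feature vectors and a binary adverse outcome
patients = [
    {"features": (0.10, 0.20), "outcome": 0},
    {"features": (0.20, 0.10), "outcome": 0},
    {"features": (0.90, 0.80), "outcome": 1},
    {"features": (0.80, 0.90), "outcome": 1},
    {"features": (0.15, 0.25), "outcome": 0},
]
high_risk = cohort_risk((0.85, 0.85), patients, k=2)
low_risk = cohort_risk((0.10, 0.20), patients, k=2)
```

Because each cohort is small, the feature selection stage matters doubly here: fewer features make the distance measure, and hence the cohort, more meaningful.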

    Transformation of the Forecast Assessment of Expected Credit Losses in Monitoring and Assessment of Credit Risk in Commercial Banks

    The article presents the results of a systematization of issues arising from the transformation of banks' forecast assessment of expected credit losses during the monitoring and evaluation of credit risk in commercial banks. Based on data obtained on the introduction of IFRS 9 "Financial Instruments" into the banking sector, it is concluded that in banking practice there is uncertainty regarding the long-term impact of credit risk, and that significant difficulties arise in using the large amount of additional information required, which complicates the calculation of banks' future credit losses. It is noted that the current use of the model of predictive assessment of customers' expected credit losses in the monitoring and evaluation of credit risk in a bank should take into account the chosen collective or individual basis of assessment. The article presents a comprehensive approach to the use of the expected-loss impairment model in banking as a basic tool for modeling expected credit losses in order to form provisions for impairment with the allocation. The modification of this model will depend on the specifics of the bank's credit activities and portfolio, the types of its financial instruments, the sources of available information, and the IT systems used. Validation of this model will reduce expected credit losses and the amount of estimated reserves, as well as improve the efficiency of the bank as a whole.
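The core of the expected-loss impairment model reduces to ECL = PD × LGD × EAD, where the probability of default used switches from a 12-month horizon to a lifetime horizon when a loan's credit risk increases significantly. A deliberately simplified sketch (discounting and multi-period marginal PDs omitted; all figures hypothetical):

```python
def expected_credit_loss(pd_12m, pd_lifetime, lgd, ead, stage):
    """Simplified IFRS 9-style expected credit loss:
    ECL = PD * LGD * EAD, using the 12-month PD in stage 1 and the
    lifetime PD in stages 2 and 3 (significant increase in credit
    risk, or credit-impaired). Discounting to the reporting date is
    omitted for brevity."""
    if stage not in (1, 2, 3):
        raise ValueError("IFRS 9 defines stages 1, 2 and 3")
    pd_used = pd_12m if stage == 1 else pd_lifetime
    return pd_used * lgd * ead

# Hypothetical loan: moving from stage 1 to stage 2 raises the provision
ecl_stage1 = expected_credit_loss(
    pd_12m=0.02, pd_lifetime=0.10, lgd=0.45, ead=100_000, stage=1)
ecl_stage2 = expected_credit_loss(
    pd_12m=0.02, pd_lifetime=0.10, lgd=0.45, ead=100_000, stage=2)
```

The jump between the two figures is precisely the provisioning cliff that makes stage allocation, and hence model validation, so consequential for the bank's estimated reserves.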