83 research outputs found

    Mathematical programming models for classification problems with applications to credit scoring

    Mathematical programming (MP) can be used to develop classification models for the two-group classification problem. An MP model can be used to generate a discriminant function that separates the observations in a training sample of known group membership into the specified groups optimally in terms of a group separation criterion. The simplest models for MP discriminant analysis are linear programming models in which the group separation measure is generally based on the deviations of misclassified observations from the discriminant function. MP discriminant analysis models have been tested extensively over the last 30 years in developing classifiers for the two-group classification problem. However, in the comparative studies that have included MP models for classifier development, the MP discriminant analysis models either lack appropriate normalisation constraints or do not use suitable data transformations. In addition, these studies have generally been based on relatively small datasets. This thesis investigates the development of MP discriminant analysis models that incorporate appropriate normalisation constraints and data transformations. These MP models are tested on binary classification problems, with an emphasis on credit scoring problems, particularly application scoring, i.e. a two-group classification problem concerned with distinguishing between good and bad applicants for credit on the basis of information from application forms and other relevant data. The performance of these MP models is compared with that of statistical techniques and machine learning methods, and it is shown that MP discriminant analysis models can be useful tools for developing classifiers. Another topic covered in this thesis is feature selection. To make classification models easier to understand, it is desirable to develop parsimonious models with a limited number of features, ideally selected on the basis of their impact on classification accuracy. Although MP discriminant analysis models can be extended for feature selection based on classification accuracy, there are computational difficulties in applying these models to large datasets. A new MP heuristic for selecting features is proposed, based on a feature selection MP discriminant analysis model in which maximisation of classification accuracy is the objective. The results of the heuristic are promising in comparison with other feature selection methods. Classifiers should ideally be developed from datasets with approximately the same number of observations in each class, but in practice classifiers must often be developed from imbalanced datasets. New MP formulations are proposed to overcome the difficulties associated with generating discriminant functions from imbalanced datasets. These formulations are tested using datasets from financial institutions, and the performance of the MP-generated classifiers is compared with that of classifiers generated by other methods. Finally, the ordinal classification problem is considered: MP methods for the ordinal classification problem are outlined and a new MP formulation is tested on a small dataset.
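    The simplest of these linear programming models, and a useful reference point for the abstract above, is the minimise-sum-of-deviations (MSD) formulation. The following is a minimal Python sketch (using scipy) with a Glover-style normalisation constraint that rules out the trivial solution; the function name, the particular normalisation and the sign conventions are illustrative assumptions rather than the thesis's exact formulation.

```python
# Sketch of a minimise-sum-of-deviations (MSD) linear program for
# two-group discriminant analysis. The Glover-style normalisation,
# names and sign conventions are illustrative assumptions.
import numpy as np
from scipy.optimize import linprog

def msd_discriminant(X_good, X_bad):
    """Fit weights w and cutoff c so that good cases score at least c
    and bad cases at most c, minimising the total deviation d_i of
    misclassified observations from the discriminant function."""
    p = X_good.shape[1]
    n_g, n_b = len(X_good), len(X_bad)
    n = n_g + n_b
    # Decision variables: [w (p), c (1), d (n)]; minimise sum of d.
    obj = np.concatenate([np.zeros(p + 1), np.ones(n)])
    # Good: w.x_i >= c - d_i  ->  -w.x_i + c - d_i <= 0
    A_good = np.hstack([-X_good, np.ones((n_g, 1))])
    # Bad:  w.x_i <= c + d_i  ->   w.x_i - c - d_i <= 0
    A_bad = np.hstack([X_bad, -np.ones((n_b, 1))])
    A_ub = np.hstack([np.vstack([A_good, A_bad]), -np.eye(n)])
    b_ub = np.zeros(n)
    # Normalisation: w.(mean_good - mean_bad) = 1 fixes the scale of w
    # and rules out the trivial solution w = 0, c = 0.
    A_eq = np.concatenate([X_good.mean(0) - X_bad.mean(0), [0.0], np.zeros(n)])
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub,
                  A_eq=A_eq.reshape(1, -1), b_eq=[1.0],
                  bounds=[(None, None)] * (p + 1) + [(0, None)] * n)
    w, c = res.x[:p], res.x[p]
    return w, c  # classify x as "good" when x @ w >= c
```

    Without the normalisation constraint the LP is minimised by w = 0, c = 0 with zero deviations, which is one reason the abstract stresses appropriate normalisation in comparative studies.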

    Recent methods from statistics and machine learning for credit scoring

    Credit scoring models are fundamental tools for financial institutions such as retail and consumer credit banks. Their purpose is to estimate the likelihood that credit applicants will default, in order to decide whether to grant them credit. The area under the receiver operating characteristic (ROC) curve (AUC) is one of the most commonly used measures of predictive performance in credit scoring. The aim of this thesis is to benchmark different methods for building scoring models so as to maximize the AUC. While this measure is used to evaluate the predictive accuracy of the presented algorithms, the AUC is also introduced as a direct optimization criterion. Logistic regression is the most widely used method for creating credit scorecards and classifying applicants into risk classes; since this development process, based on the logit model, is standard practice in retail banking, its predictive accuracy serves as the benchmark throughout this thesis. The AUC approach is a central contribution of this work: instead of maximum likelihood estimation, the AUC itself is taken as the objective function and optimized directly. The coefficients are estimated by computing the AUC with the Wilcoxon-Mann-Whitney statistic and optimizing it with the Nelder-Mead algorithm. AUC optimization is a distribution-free approach; it is analyzed in a simulation study to examine the theoretical considerations, and it is shown that the approach still works even if the underlying distribution is not logistic. In addition to the AUC approach and classical, well-known methods such as generalized additive models, new methods from statistics and machine learning are evaluated for the credit scoring case. Conditional inference trees, model-based recursive partitioning methods and random forests are presented as recursive partitioning algorithms. Boosting algorithms are also explored, additionally using the AUC as a loss function. The empirical evaluation is based on data from a German bank; 26 attributes from the application scoring are included in the analysis. Besides the AUC, different performance measures are used to evaluate the predictive performance of scoring models. While classification trees cannot improve predictive accuracy in this credit scoring case, the AUC approach and particular boosting methods outperform the robust classical scoring models in terms of the AUC measure.
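    The estimation procedure described above can be sketched in a few lines of Python: score applicants with a linear function, compute the AUC as the Wilcoxon-Mann-Whitney statistic, and search for the coefficients with the derivative-free Nelder-Mead algorithm (a natural choice, since the empirical AUC is a step function of the coefficients). The names and optimizer settings below are illustrative assumptions, not the thesis's implementation.

```python
# Sketch of direct AUC optimization: Wilcoxon-Mann-Whitney AUC as the
# objective, Nelder-Mead as the search. Names are illustrative.
import numpy as np
from scipy.optimize import minimize

def wmw_auc(scores, y):
    """Empirical AUC: the share of (positive, negative) pairs the score
    orders correctly, with ties counted as one half."""
    pos, neg = scores[y == 1], scores[y == 0]
    diff = pos[:, None] - neg[None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

def fit_auc_scorecard(X, y, beta0):
    """Maximize the empirical AUC directly. The AUC is invariant to
    positive rescaling of the score, so no intercept is needed and the
    coefficients are identified only up to a positive multiple."""
    res = minimize(lambda beta: -wmw_auc(X @ beta, y), beta0,
                   method="Nelder-Mead",
                   options={"maxiter": 5000, "xatol": 1e-6, "fatol": 1e-8})
    return res.x
```

    In practice one would warm-start beta0 from the logistic regression coefficients, which is consistent with the thesis's use of the logit model as the benchmark.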

    Supply of bank lending to small businesses


    Mortgage credit scoring


    Development of a Framework for Managing the Industry 4.0 Equipment Procurement Process for the Irish Life Sciences Sector

    Industry 4.0 (I4.0) brings unprecedented opportunities for manufacturing corporations poised to implement digital business models: DigitALIZAtion. Industry standards have been developed for the core technologies of I4.0 digital supply chains, and manufacturing equipment must now be procured to integrate seamlessly at any point in these novel supply chains. The aim of this study is to determine whether an I4.0 Equipment Procurement Process (I4.0-EPP) can be developed that reduces the risk of equipment integration issues. It asks: can the form of the equipment be specified so that it fits correctly into the I4.0 digital supply chain and facilitates the desired I4.0 digital business function? An agile development methodology was used to design the I4.0-EPP techniques and tools for use by technical and business users. Significant knowledge gaps were identified during User Acceptance Testing (UAT) by technical practitioners across four equipment procurement case studies. Several iterations of UAT by MEng students highlighted the need for Requirements Guides and specialised workbooks; these additional tools raised the understandability of the technical topics to an acceptable level and delivered very accurate results across a wide spectrum of users. This research demonstrates that techniques and tools can be developed for an I4.0-EPP that are accurate, feasible and viable but, as with Six Sigma, will only become desirable when mandated by corporate business leaders. Future research should focus on implementing the ALIZA Matrix with corporate practitioners in the business domain, bringing the ALIZA techniques and tools developed during this study to the attention of corporate business leaders with the authority to sponsor them.

    Essays on the Effects of Institutional Changes

    Formal and informal institutions are important determinants of behavior and economic outcomes. In this dissertation, I study the causal effects of changes to one formal and one informal institution on individuals' behavior. The first chapter uses an experiment in Bangladesh to show how providing information on delays in the provision of a public service to government bureaucrats and their supervisors affects these bureaucrats' behavior and the outcomes for applicants for the public service. The second chapter (co-authored with Ro'ee Levy) shows how the MeToo movement increased the reporting of sexual crimes to the police by changing the norms, information, or both, about sexual misconduct. Chapter 1: “Service Delivery, Corruption, and Information Flows in Bureaucracies: Evidence from the Bangladesh Civil Service” Government bureaucracies in low- and middle-income countries often suffer from corruption and slow public service delivery. Can an information system, providing information about delays to the responsible bureaucrats and their supervisors, reduce delays? Paying bribes for faster service delivery is a common form of corruption, but does improving average processing times reduce bribes? To answer these questions, I conduct a large-scale field experiment over 16 months with the Bangladesh Civil Service. I send monthly scorecards measuring delays in service delivery to government officials and their supervisors. The scorecards increase services delivered on time by 11% but do not reduce bribes. Instead, the scorecards increase bribes for high-performing bureaucrats. These results are inconsistent with existing theories suggesting that speeding up service delivery reduces bribes. I propose a model where bureaucrats' shame or reputational concerns constrain corruption. When bureaucrats' reputation improves through positive performance feedback, this constraint is relaxed, and bribes increase. Overall, my study shows that improving information within bureaucracies can change bureaucrats' behavior, even without explicit incentives. However, positive performance feedback can have negative spillovers on bureaucrats' performance across different behaviors. Chapter 2: “The Effects of Social Movements: Evidence from #MeToo” (Joint with Ro'ee Levy) Social movements are associated with large societal changes, but evidence on their causal effects is limited. We study the effect of the MeToo movement on a high-stakes decision: reporting a sexual crime to the police. We construct a new dataset of sexual and non-sexual crimes reported in 30 OECD countries, covering 88% of the OECD population. We analyze the effect of the MeToo movement by employing a triple-difference strategy over time, across countries, and between crime types. The movement increased reporting of sexual crimes by 10% during its first six months. The effect is persistent and lasts at least 15 months. Because we find a strong effect on reporting before any major changes to laws or policy took place, we attribute the effect to a change in social norms or information. Using more detailed US data, we show that the movement also increased arrests for sexual crimes in the long run. In contrast to a common criticism of the movement, we do not find evidence for large differences in the effect across racial and socioeconomic groups. Our results suggest that social movements can rapidly change high-stakes personal decisions.
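    The triple-difference strategy in Chapter 2 can be illustrated with a short sketch on synthetic data (the paper's dataset is not reproduced here): reporting is compared over time, across countries with different assumed movement exposure, and between sexual and non-sexual crimes, with all two-way interactions absorbed so that only the triple interaction identifies the effect. All variable names and the exposure measure are illustrative assumptions.

```python
# Triple-difference sketch on synthetic monthly reports per country and
# crime type. The exposure measure and effect size are assumptions made
# for illustration, not the paper's data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
rows = []
for country in range(8):
    exposure = rng.uniform(0.0, 1.0)       # assumed country-level exposure
    for month in range(24):
        post = int(month >= 12)            # movement begins at month 12
        for sexual in (0, 1):
            lift = 0.10 * exposure * post * sexual  # true effect on reporting
            rows.append(dict(country=country, month=month, post=post,
                             sexual=sexual, exposure=exposure,
                             log_reports=lift + rng.normal(scale=0.05)))
df = pd.DataFrame(rows)

# All two-way interactions absorb the lower-order effects; only the
# triple interaction identifies the movement's effect on sexual-crime
# reporting. In practice one would cluster standard errors by country.
model = smf.ols(
    "log_reports ~ (C(country) + C(month) + sexual)**2 + exposure:post:sexual",
    data=df,
).fit()
print(model.params["exposure:post:sexual"])  # recovers roughly 0.10
```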

    Censored regression techniques for credit scoring

    This thesis investigates the use of newly developed survival analysis tools for credit scoring. Credit scoring techniques are currently used by financial institutions to estimate the probability of a customer defaulting on a loan by a predetermined time in the future. While a number of classification techniques are currently used, banks are now becoming more concerned with estimating the lifetime of the loan rather than just the probability of default. Difficulties arise when using standard statistical techniques due to the presence of censoring in the data. Survival analysis, originating from medical and engineering fields, is an area of statistics that typically deals with censored lifetime data. The theoretical developments in this thesis revolve around linear regression for censored data, in particular the Buckley-James method. The Buckley-James method is analogous to linear regression and gives estimates of the mean expected lifetime given a set of explanatory variables. The first development is a measure of fit for censored regression, similar to the classical r-squared of linear regression. Next, the variable-reduction technique of stepwise selection is extended to the Buckley-James method. For the last development, the Buckley-James algorithm is altered to incorporate non-linear regression methods such as neural networks and Multivariate Adaptive Regression Splines (MARS). MARS shows promise in terms of predictive power and interpretability in both simulation and empirical studies. The practical section of the thesis involves using the new techniques to predict the time to default and time to repayment of unsecured personal loans from a database obtained from a major Australian bank. The analyses are unique, being the first published work on applying Buckley-James and related methods to a large-scale financial database.
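    To make the Buckley-James iteration concrete, the following is a minimal sketch for right-censored log lifetimes: censored responses are replaced by their conditional expectations under a Kaplan-Meier estimate of the residual distribution, and the least-squares fit is repeated until the coefficients stabilise. The names, tie handling and stopping rule are illustrative assumptions; the iteration is known to oscillate on some datasets, which is why a fixed iteration cap is used.

```python
# Sketch of the Buckley-James method for right-censored regression.
# Names and convergence details are illustrative assumptions.
import numpy as np

def kaplan_meier(t, event):
    """Kaplan-Meier survival estimate evaluated at each sorted value."""
    order = np.argsort(t)
    t, event = t[order], event[order]
    at_risk = len(t) - np.arange(len(t))
    surv = np.cumprod(1.0 - event / at_risk)
    return t, surv

def buckley_james(X, y, event, n_iter=50, tol=1e-6):
    """X: covariates, y: observed (possibly censored) log lifetime,
    event: 1 if y is an actual failure, 0 if right-censored."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(Xd, y, rcond=None)[0]   # naive starting fit
    for _ in range(n_iter):
        resid = y - Xd @ beta
        ts, surv = kaplan_meier(resid, event)
        # Jumps of the KM curve give a discrete residual distribution.
        mass = np.concatenate([[1.0], surv[:-1]]) - surv
        y_star = y.copy()
        for i in np.where(event == 0)[0]:
            tail = ts > resid[i]
            denom = mass[tail].sum()
            if denom > 0:  # conditional mean residual beyond censoring point
                y_star[i] = Xd[i] @ beta + (mass[tail] * ts[tail]).sum() / denom
        new_beta = np.linalg.lstsq(Xd, y_star, rcond=None)[0]
        if np.max(np.abs(new_beta - beta)) < tol:
            break
        beta = new_beta
    return beta
```

    The thesis's later developments slot into this loop: the final least-squares step can be replaced by a non-linear learner such as MARS or a neural network, which is the alteration described above.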