
    Support Vector Machines for Credit Scoring and discovery of significant features

    The assessment of risk of default on credit is important for financial institutions. Logistic regression and discriminant analysis are the techniques traditionally used in credit scoring to determine the likelihood of default based on consumer application and credit reference agency data. We test support vector machines against these traditional methods on a large credit card database. We find that they are competitive and can be used as the basis of a feature selection method to discover the features that are most significant in determining risk of default.
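
    As a rough illustration of the approach described above, the sketch below fits a logistic regression and a linear SVM to synthetic credit-style data, compares them by AUC, and ranks features by recursive elimination on the linear SVM weights. The synthetic data, the scikit-learn pipeline, and the RFE-based selection are assumptions for illustration, not the paper's exact procedure.

```python
# Minimal sketch: SVM vs. logistic regression on credit-style data, plus
# SVM-based feature ranking. The data and the RFE step are hypothetical.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.feature_selection import RFE
from sklearn.metrics import roc_auc_score

# X: applicant/bureau features, y: 1 = default, 0 = repaid (toy data)
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 20))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(size=5000) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for name, clf in [
    ("logistic regression", LogisticRegression(max_iter=1000)),
    ("linear SVM", LinearSVC(C=1.0)),
]:
    model = make_pipeline(StandardScaler(), clf).fit(X_tr, y_tr)
    score = model.decision_function(X_te)
    print(name, "AUC:", round(roc_auc_score(y_te, score), 3))

# Feature selection: recursively drop the features with the smallest
# linear-SVM weights; the survivors are the most significant predictors.
rfe = RFE(LinearSVC(C=1.0), n_features_to_select=5).fit(
    StandardScaler().fit_transform(X_tr), y_tr
)
print("selected feature indices:", np.flatnonzero(rfe.support_))
```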

    Application of support vector machines on the basis of the first Hungarian bankruptcy model

    In our study we apply a data mining procedure known as the support vector machine (SVM) to the database of the first Hungarian bankruptcy model. The models constructed are then contrasted with the results of earlier bankruptcy models using classification accuracy and the area under the ROC curve. In applying the SVM technique, in addition to conventional kernel functions, we also examine the possibilities of the ANOVA kernel function and take a detailed look at the data preparation tasks recommended when using the SVM method (handling of outliers). The results of the assembled models suggest that a significant improvement in classification accuracy can be achieved on the database of the first Hungarian bankruptcy model when using the SVM method as opposed to neural networks.
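
    The ANOVA kernel mentioned above can be plugged into scikit-learn's SVC as a callable kernel. The sketch below is a minimal illustration, assuming the common form K(x, z) = (Σ_k exp(−σ(x_k − z_k)²))^d; the σ and d values, the toy data, and the AUC comparison are illustrative, not the study's setup.

```python
# Minimal sketch of an SVM with an ANOVA kernel, compared against a
# conventional RBF kernel. Hyperparameters and data are assumptions.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

def anova_kernel(X, Y, sigma=1.0, d=2):
    """ANOVA kernel: K(x, z) = (sum_k exp(-sigma * (x_k - z_k)**2))**d."""
    diff = X[:, None, :] - Y[None, :, :]          # (n_x, n_y, n_features)
    return np.exp(-sigma * diff ** 2).sum(axis=2) ** d

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

for name, clf in [
    ("RBF kernel", SVC(kernel="rbf", gamma="scale")),
    ("ANOVA kernel", SVC(kernel=anova_kernel)),  # callable Gram matrix
]:
    auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean AUC = {auc:.3f}")
```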

    Credit scoring: comparison of non‐parametric techniques against logistic regression

    Dissertation presented as a partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence.

    Over the past decades, financial institutions have given increasing importance to credit risk management as a critical tool for controlling their profitability. More than ever, it has become crucial for these institutions to discriminate well between good and bad clients, so that they accept only the credit applications that are unlikely to default. To calculate the probability of default of a particular client, most financial institutions use credit scoring models based on parametric techniques. Logistic regression is the current industry-standard technique in credit scoring models, and it is one of the techniques under study in this dissertation. Although it is regarded as a robust and intuitive technique, it is not free from criticism of the model assumptions it makes, which can compromise its predictions. This dissertation evaluates the gains in performance from using more modern non-parametric techniques instead of logistic regression, performing a model comparison over four different real-life credit datasets. Specifically, the techniques compared against logistic regression in this study consist of two single classifiers (decision tree and SVM with RBF kernel) and two ensemble methods (random forest and stacking with cross-validation). The literature review shows that heterogeneous ensemble approaches have a weaker presence in credit scoring studies, which is why stacking with cross-validation was considered here. The results demonstrate that logistic regression outperforms the decision tree classifier, performs similarly to the SVM, and slightly underperforms both ensemble approaches to a similar extent.
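
    As a minimal sketch of the comparison described above, the code below evaluates the same five families of classifiers on a synthetic imbalanced dataset, with scikit-learn's StackingClassifier (which builds its meta-features from internal cross-validation) standing in for "stacking with cross-validation". The hyperparameters and data are assumptions, not the dissertation's setup.

```python
# Minimal sketch: the five compared classifiers scored by cross-validated AUC.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, StackingClassifier

# imbalanced toy data standing in for a credit dataset
X, y = make_classification(n_samples=2000, n_features=15,
                           weights=[0.8], random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(max_depth=5),
    "SVM (RBF)": make_pipeline(StandardScaler(), SVC(probability=True)),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "stacking (CV)": StackingClassifier(
        estimators=[
            ("dt", DecisionTreeClassifier(max_depth=5)),
            ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
            ("svm", make_pipeline(StandardScaler(), SVC(probability=True))),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,  # out-of-fold predictions feed the meta-learner
    ),
}

for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean AUC = {auc:.3f}")
```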

    Penalizing Unfairness in Binary Classification

    We present a new approach for mitigating unfairness in learned classifiers. In particular, we focus on binary classification tasks over individuals from two populations, where, as our criterion for fairness, we wish to achieve similar false positive rates and similar false negative rates in both populations. As a proof of concept, we implement our approach and empirically evaluate its ability to achieve both fairness and accuracy, using datasets from the fields of criminal risk assessment, credit, lending, and college admissions.
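
    One common way to realize this idea, sketched below under stated assumptions, is to add a differentiable penalty to the logistic loss that pushes the two groups' relaxed false positive and false negative rates together. The sigmoid relaxation, the penalty weight, the optimizer, and the toy data are illustrative choices, not necessarily the paper's exact formulation.

```python
# Minimal sketch: logistic loss plus a penalty on squared differences of
# the groups' relaxed FPR and FNR. All specifics here are assumptions.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, d = 2000, 5
X = rng.normal(size=(n, d))
g = rng.integers(0, 2, size=n)                    # group membership (0/1)
y = (X[:, 0] + 0.8 * g + rng.normal(size=n) > 0).astype(int)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_rates(p, y, mask):
    fpr = p[mask & (y == 0)].mean()               # relaxed false positive rate
    fnr = (1 - p)[mask & (y == 1)].mean()         # relaxed false negative rate
    return fpr, fnr

def objective(w, lam=5.0):
    p = sigmoid(X @ w)
    logloss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    fpr0, fnr0 = soft_rates(p, y, g == 0)
    fpr1, fnr1 = soft_rates(p, y, g == 1)
    return logloss + lam * ((fpr0 - fpr1) ** 2 + (fnr0 - fnr1) ** 2)

w = minimize(objective, np.zeros(d), method="Nelder-Mead").x

# hard rates per group at the 0.5 threshold
pred = (sigmoid(X @ w) > 0.5).astype(int)
for grp in (0, 1):
    m = g == grp
    print(f"group {grp}: FPR={pred[m & (y == 0)].mean():.2f}, "
          f"FNR={(1 - pred)[m & (y == 1)].mean():.2f}")
```

    Raising the hypothetical penalty weight `lam` trades accuracy for smaller gaps in the two error rates, which is the fairness/accuracy tension the abstract evaluates empirically.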

    Histogram-based models on non-thin section chest CT predict invasiveness of primary lung adenocarcinoma subsolid nodules.

    109 pathologically proven subsolid nodules (SSN) were segmented by two readers on non-thin-section chest CT with lung nodule analysis software, followed by extraction of CT attenuation histogram and geometric features. Functional data analysis of the histograms provided data-driven features (FPC1, FPC2, FPC3) used in further model building. Nodules were classified as pre-invasive (P1: atypical adenomatous hyperplasia and adenocarcinoma in situ), minimally invasive (P2), and invasive adenocarcinomas (P3). P1 and P2 were grouped together (T1) versus P3 (T2). Various combinations of features were compared in predictive models for binary nodule classification (T1/T2), using multiple logistic regression and non-linear classifiers. Area under the ROC curve (AUC) was used as the diagnostic performance criterion. Inter-reader variability was assessed using Cohen's kappa and the intra-class correlation coefficient (ICC). Three models predicting invasiveness of SSN were selected based on AUC. The first model included the 87.5th percentile of CT lesion attenuation (Q.875), the interquartile range (IQR), volume, and the maximum/minimum diameter ratio (AUC: 0.89, 95% CI: [0.75, 1]). The second model included FPC1, volume, and diameter ratio (AUC: 0.91, 95% CI: [0.77, 1]). The third model included FPC1, FPC2, and volume (AUC: 0.89, 95% CI: [0.73, 1]). Inter-reader variability was excellent (kappa: 0.95, ICC: 0.98). Parsimonious models using histogram and geometric features differentiated invasive from minimally invasive/pre-invasive SSN with good predictive performance on non-thin-section CT.
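
    As a minimal sketch of the first model's feature set, the code below derives the histogram statistics (87.5th percentile, IQR) and geometric features from synthetic per-nodule attenuation values and scores a logistic regression by cross-validated AUC. The voxel values, their distributions, and the labels are assumptions standing in for the segmented CT measurements.

```python
# Minimal sketch: histogram + geometric features -> logistic regression -> AUC.
# All nodule data below are synthetic stand-ins for the study's CT features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_nodules = 109
invasive = rng.integers(0, 2, size=n_nodules)     # T2 = 1, T1 = 0 (toy labels)

rows = []
for is_inv in invasive:
    # hypothetical voxel attenuations (HU); invasive nodules skew denser
    hu = rng.normal(-600 + 150 * is_inv, 100, size=500)
    q875 = np.percentile(hu, 87.5)                # Q.875 attenuation
    iqr = np.percentile(hu, 75) - np.percentile(hu, 25)
    volume = rng.lognormal(6 + 0.5 * is_inv, 0.4)
    diam_ratio = rng.uniform(1.0, 2.5)            # max/min diameter ratio
    rows.append([q875, iqr, volume, diam_ratio])

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
auc = cross_val_score(model, np.array(rows), invasive,
                      cv=5, scoring="roc_auc").mean()
print(f"cross-validated AUC: {auc:.2f}")
```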