247 research outputs found

    Investigating the Performance of Smote for Class Imbalanced Learning: A Case Study of Credit Scoring Datasets

    Classification of datasets is one of the major issues encountered by the data mining community. The problem is heightened when real-world datasets are also imbalanced. A dataset is imbalanced when the observations belonging to the rare class are greatly outnumbered by the observations of another class. The class with the greater number of observations is called the majority or negative class, while the class with few observations is referred to as the minority or positive class. The literature presents a number of resampling techniques that address the problem of class imbalance. One of the most important strategies is to resample the dataset to balance the class counts, either by over-sampling the minority class or by under-sampling the majority class. This paper investigates and analyzes the performance of the most widely used oversampling procedure, the Synthetic Minority Oversampling Technique (SMOTE), at different oversampling thresholds, using four classifiers on three credit scoring datasets.
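
As a rough illustration of the oversampling idea in this abstract, the following is a minimal pure-Python sketch of SMOTE-style interpolation. It is not the paper's implementation: the function name `smote`, the `percent` parameter (restricted here to multiples of 100 for simplicity), the Euclidean distance, and the toy data are all assumptions made for the example.

```python
import math
import random


def smote(minority, percent=200, k=5, seed=0):
    """Create synthetic minority samples by interpolating towards neighbours.

    minority: list of equal-length numeric tuples (the minority class only).
    percent:  amount of oversampling; 200 means two synthetic points per
              original point. Only multiples of 100 are handled in this sketch.
    k:        number of nearest minority neighbours to choose from.
    """
    if percent < 100 or percent % 100 != 0:
        raise ValueError("this sketch only handles multiples of 100")
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")

    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    per_sample = percent // 100
    synthetic = []

    for i, base in enumerate(minority):
        # k nearest neighbours of `base` among the other minority points
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        for _ in range(per_sample):
            neighbour = rng.choice(neighbours)
            gap = rng.random()  # position along the segment base -> neighbour
            synthetic.append(
                tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
            )
    return synthetic


# Toy minority class (hypothetical data): four points on the unit square.
toy_minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(toy_minority, percent=200, k=2)
print(len(new_points), "synthetic samples generated")
```

Varying `percent` corresponds loosely to the different oversampling thresholds the abstract mentions; the synthetic points would then be appended to the training set before fitting a classifier.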

    Three-stage ensemble model : reinforce predictive capacity without compromising interpretability

    Thesis proposal presented as partial requirement for obtaining the Master’s degree in Statistics and Information Management, with specialization in Risk Analysis and Management. Over the last decade, several banks have developed models to quantify credit risk. In addition to monitoring the credit portfolio, these models also help in deciding whether to accept new contracts, assessing customer profitability, and defining pricing strategy. The objective of this paper is to improve the approach to credit risk modeling, namely in scoring models that predict default events. To this end, we propose a three-stage ensemble model that combines the interpretability of the Scorecard with the predictive power of machine learning algorithms. The results show that the ROC index improves by 0.5%-0.7% and accuracy by 0%-1%, taking the Scorecard as the baseline.

    Support Vector Machines for Credit Scoring and discovery of significant features

    The assessment of risk of default on credit is important for financial institutions. Logistic regression and discriminant analysis are techniques traditionally used in credit scoring for determining likelihood to default based on consumer application and credit reference agency data. We test support vector machines against these traditional methods on a large credit card database. We find that they are competitive and can be used as the basis of a feature selection method to discover those features that are most significant in determining risk of default.

    A Non-Parametric-Based Computationally Efficient Approach for Credit Scoring

    This research addresses credit scoring in risk management and presents a novel credit scoring method for default prediction. The study uses the Kruskal-Wallis non-parametric statistic to form a computationally efficient credit-scoring model based on an artificial neural network, and examines the impact on modelling performance. The findings show that the new credit scoring methodology yields a reasonable coefficient of determination and a low false negative rate. It is computationally less expensive while achieving high accuracy (AUC=0.99). Given the recent perspective of continued credit/behavior scoring, our study suggests using this credit score with non-traditional data sources, such as mobile phone data, to study and reveal changes in clients’ behavior over time. This is the first study to develop a non-parametric credit scoring approach that can reselect effective features for continued credit evaluation and weight them by their level of contribution, with good diagnostic ability.
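
The abstract does not say exactly how the Kruskal-Wallis statistic feeds the model, so the following is only a sketch under one plausible reading: each feature is scored by its Kruskal-Wallis H statistic across the default/non-default classes, and features are ranked by that score. The function names `kruskal_wallis_h` and `rank_features` and the toy data are hypothetical, and the tie correction is omitted for brevity.

```python
from collections import defaultdict


def kruskal_wallis_h(values, labels):
    """Kruskal-Wallis H statistic for one feature grouped by class label.

    Uses mid-ranks for tied values; the tie correction factor is not applied.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n

    pos = 0
    while pos < n:
        end = pos
        # extend the run of tied values
        while end + 1 < n and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mid_rank = (pos + end) / 2 + 1  # mean of 1-based ranks pos+1 .. end+1
        for idx in order[pos:end + 1]:
            ranks[idx] = mid_rank
        pos = end + 1

    rank_sums = defaultdict(float)
    counts = defaultdict(int)
    for rank, label in zip(ranks, labels):
        rank_sums[label] += rank
        counts[label] += 1

    weighted = sum(rank_sums[g] ** 2 / counts[g] for g in counts)
    return 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)


def rank_features(rows, labels):
    """Return (feature indices sorted by decreasing H, list of H scores)."""
    n_features = len(rows[0])
    scores = [
        kruskal_wallis_h([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    ranking = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranking, scores


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
toy_rows = [(1, 1), (2, 6), (3, 3), (4, 4), (5, 5), (6, 2)]
toy_labels = [0, 0, 0, 1, 1, 1]
ranking, scores = rank_features(toy_rows, toy_labels)
print(ranking, [round(s, 3) for s in scores])
```

Because the statistic depends only on ranks, the scoring is cheap and makes no distributional assumption, which is in the spirit of the computational efficiency the abstract claims. The highest-ranked features would then be passed to the neural network.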

    A non-parametric-based computationally efficient approach for credit scoring

    Ashofteh, A., & Bravo, J. M. (2019). A non-parametric-based computationally efficient approach for credit scoring. In Atas da Conferencia da Associacao Portuguesa de Sistemas de Informacao 2019: 19ª Conferencia da Associacao Portuguesa de Sistemas de Informacao, CAPSI 2019 - 19th Conference of the Portuguese Association for Information Systems, CAPSI 2019; Lisboa; Portugal; 11 October 2019 through 12 October 2019 (pp. 19). (Atas da Conferencia da Associacao Portuguesa de Sistemas de Informacao).

    Examining the Trend of Literature on Classification Modelling: A Bibliometric Approach

    This paper analyses and reports on published works related to classification or discriminant modelling. It adopts a bibliometric analysis based on data obtained from the Scopus online database on 27 July 2019. The keyword search yielded 2775 valid documents for further analysis. For data visualisation, we employed VOSviewer. The paper reports the results using standard bibliometric indicators, particularly the growth rate of publications, research productivity, and analysis of authors and citations. The outcomes reveal that the growth rate of the classification literature has increased over the years since 1968. Journals (n=1439; 51.86%) and conference proceedings (n=1034; 37.26%) were the top publication types, together accounting for 2473 (89.12%) documents. Meanwhile, 2578 (92.9%) documents are multi-authored, with an average collaboration index of 3.34 authors per article. The most common numbers of authors per document are two (n=758; 27.32%), three (n=752; 27.10%) and four (n=560; 20.18%). By country, China ranks first in productivity with 1146 (41.30%) published documents. With respect to citation frequency, the article by Bauer and Kohavi (1999) is the most cited, with 1414 total citations and an average of 70.7 citations per year. Overall, the increasing number of works on classification indicates a growing awareness of its importance and specific requirements in this research field.
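
The percentages quoted in this abstract can be cross-checked against the stated total of 2775 documents. The short sketch below recomputes each share. The names `REPORTED`, `share` and `mismatches` are illustrative, and the 20-year citation window (1999 to 2019) is an assumption inferred from the publication and retrieval years, not something the abstract states.

```python
TOTAL_DOCUMENTS = 2775  # valid Scopus documents reported in the abstract

# (count, reported percentage) pairs as quoted in the abstract
REPORTED = {
    "journals": (1439, 51.86),
    "conference proceedings": (1034, 37.26),
    "journals + proceedings": (2473, 89.12),
    "multi-authored": (2578, 92.9),
    "two authors": (758, 27.32),
    "three authors": (752, 27.10),
    "four authors": (560, 20.18),
    "China": (1146, 41.30),
}


def share(count, total=TOTAL_DOCUMENTS):
    """Percentage of the total, rounded to two decimals."""
    return round(100.0 * count / total, 2)


def mismatches(reported=REPORTED, tolerance=0.005):
    """Names of entries whose recomputed share differs from the reported one."""
    return [
        name
        for name, (count, pct) in reported.items()
        if abs(share(count) - pct) > tolerance
    ]


def citations_per_year(total_citations, years):
    """Average citations per year over an assumed window."""
    return total_citations / years


print("inconsistent entries:", mismatches())
# Assumed window: 2019 (retrieval year) minus 1999 (publication year).
print("citations/year:", citations_per_year(1414, 2019 - 1999))
```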