8,715 research outputs found

    Improved credit scoring model using XGBoost with Bayesian hyper-parameter optimization

    Several credit-scoring models have been developed using ensemble classifiers to improve the accuracy of assessment. However, among the ensemble models, little attention has been paid to tuning the hyper-parameters of the base learners, although this is crucial to constructing ensemble models. This study proposes an improved credit scoring model based on the extreme gradient boosting (XGB) classifier using Bayesian hyper-parameter optimization (XGB-BO). The model comprises two steps. First, data pre-processing handles missing values and scales the data. Second, Bayesian hyper-parameter optimization tunes the hyper-parameters of the XGB classifier, which are then used to train the model. The model is evaluated on four widely used public datasets, i.e., the German, Australian, LendingClub, and Polish datasets. Several state-of-the-art classification algorithms are implemented for predictive comparison with the proposed method. The proposed model showed promising results, with an improvement in accuracy of 4.10%, 3.03%, and 2.76% on the German, LendingClub, and Australian datasets, respectively. According to the evaluation results, it outperformed commonly used techniques, e.g., decision tree, support vector machine, neural network, logistic regression, random forest, and bagging. The experimental results confirmed that the XGB-BO model is suitable for assessing the creditworthiness of applicants.
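The pre-processing step named in the abstract above (handling missing values, then scaling) can be sketched as follows. The abstract does not say which imputation or scaling method the authors used, so mean imputation and min-max scaling are assumptions here, and the function names `impute_mean`, `min_max_scale`, and `preprocess` are hypothetical.

```python
def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed values.
    Mean imputation is an assumption; the abstract does not name a method."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def min_max_scale(column):
    """Rescale a column to [0, 1]. Min-max scaling is also an assumption."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def preprocess(rows):
    """Apply imputation and scaling to every feature column of a row-major table."""
    columns = [list(col) for col in zip(*rows)]
    cleaned = [min_max_scale(impute_mean(col)) for col in columns]
    return [list(row) for row in zip(*cleaned)]


# Toy applicant table: two numeric features, one missing value.
applicants = [
    [1.0, 10.0],
    [None, 30.0],
    [3.0, 20.0],
]
scaled = preprocess(applicants)
print(scaled)
```

In a real workflow the imputation mean and the scaling bounds would be estimated on the training split only and reused on the test split, so that no information leaks from the evaluation data.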

    Three-stage ensemble model: reinforce predictive capacity without compromising interpretability

    Thesis proposal presented as partial requirement for obtaining the Master’s degree in Statistics and Information Management, with specialization in Risk Analysis and Management. Over the last decade, several banks have developed models to quantify credit risk. In addition to monitoring the credit portfolio, these models also help decide on the acceptance of new contracts, assess customer profitability, and define pricing strategy. The objective of this paper is to improve the approach to credit risk modeling, namely in scoring models that predict default events. To this end, we propose a three-stage ensemble model that combines the interpretability of the Scorecard with the predictive power of machine learning algorithms. The results show that, with the Scorecard as the baseline, the ROC index improves by 0.5%-0.7% and accuracy by 0%-1%.

    A Hybrid Technological Innovation Text Mining, Ensemble Learning and Risk Scorecard Approach for Enterprise Credit Risk Assessment

    Enterprise credit risk assessment models typically use financial information as a predictor variable, relying on backward-looking historical information rather than forward-looking information. We propose a novel hybrid assessment of credit risk that uses technological innovation information as a predictor variable. Text mining techniques are used to extract this information for each enterprise. A combination of random forest and extreme gradient boosting is used for indicator screening, and a risk scorecard based on logistic regression is then used for credit risk scoring. Our results show that technological innovation indicators obtained through text mining provide valuable information for credit risk assessment, and that combining ensemble learning (random forest and extreme gradient boosting) with logistic regression outperforms other traditional methods. The best result was an area under the receiver operating characteristic curve of 0.9129. In addition, our approach provides meaningful scoring rules for credit risk assessment of technology innovation enterprises.
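To illustrate how a logistic-regression scorecard of the kind mentioned above turns a predicted default probability into points, here is a minimal sketch of the conventional "points to double the odds" (PDO) scaling. The abstract does not give the authors' scaling, so the constants (600 points at odds of 50:1, 20 points to double the odds), the example coefficients, and all function names are hypothetical.

```python
import math


def default_probability(intercept, coefficients, features):
    """Logistic regression: P(default) = sigmoid(intercept + sum(coef * x))."""
    z = intercept + sum(c * x for c, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))


def scaling_constants(pdo=20.0, base_score=600.0, base_odds=50.0):
    """Return (factor, offset) so that base_odds maps to base_score and
    every doubling of the good:bad odds adds pdo points.
    All three defaults are illustrative assumptions."""
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return factor, offset


def score_from_probability(p_default, pdo=20.0, base_score=600.0, base_odds=50.0):
    """Map a default probability to scorecard points via the good:bad odds."""
    factor, offset = scaling_constants(pdo, base_score, base_odds)
    odds_good = (1.0 - p_default) / p_default
    return offset + factor * math.log(odds_good)


# Hypothetical model with two screened indicators.
p = default_probability(-3.0, [0.8, -0.5], [1.0, 2.0])
print(round(p, 4), round(score_from_probability(p), 1))
```

Because the score is linear in the log-odds, each indicator's contribution can be read off as a fixed number of points, which is what gives a scorecard its interpretable scoring rules.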

    Bankruptcy prediction model using cost-sensitive extreme gradient boosting in the context of imbalanced datasets

    Bankruptcy prediction models suffer from a class imbalance problem that limits their performance. Most prior research addressed the problem by applying resampling methods such as the synthetic minority oversampling technique (SMOTE). However, resampling methods lead to other issues, e.g., noisier data and longer training time. To improve the bankruptcy prediction model, we propose cost-sensitive extreme gradient boosting (CS-XGB) to address the class imbalance problem without requiring any resampling method. The proposed method’s effectiveness is evaluated on six real-world datasets, i.e., the LendingClub dataset and five Polish companies’ bankruptcy datasets. This research compares the performance of CS-XGB with other ensemble methods, including SMOTE-XGB, which applies SMOTE to the training set before the learning process. The experimental results show that i) on LendingClub, CS-XGB improves on the performance of XGBoost and SMOTE-XGB by more than 50% and 33% in bankruptcy detection rate (BDR) and geometric mean (GM), respectively, and ii) the CS-XGB model outperforms random forest (RF), Bagging, AdaBoost, XGBoost, and SMOTE-XGB in terms of BDR, GM, and the area under the receiver operating characteristic curve (AUC) on the five Polish datasets. In addition, the CS-XGB model achieves good overall prediction results.
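The imbalance-aware metrics named above can be sketched in a few lines. This example assumes that BDR is the recall on the bankrupt class and that GM is the square root of the product of the true positive and true negative rates, which is the usual definition but is not spelled out in the abstract. The class-weight helper computes the negative-to-positive ratio commonly passed to XGBoost's `scale_pos_weight` parameter; whether CS-XGB uses that exact cost scheme is not stated, so treat it as an assumption. All function names are hypothetical.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fn, fp, tn), treating `positive` as the bankrupt class."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def bankruptcy_detection_rate(y_true, y_pred):
    """Assumed definition: recall on the bankrupt class, TP / (TP + FN)."""
    tp, fn, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def geometric_mean(y_true, y_pred):
    """Assumed definition: sqrt(true positive rate * true negative rate)."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(tpr * tnr)


def negative_to_positive_ratio(y_true, positive=1):
    """Class-imbalance ratio, a common choice for a positive-class cost weight."""
    n_pos = sum(1 for t in y_true if t == positive)
    n_neg = len(y_true) - n_pos
    return n_neg / n_pos


# Toy labels: 2 bankrupt firms out of 10; the model finds one of them
# and raises one false alarm.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(bankruptcy_detection_rate(y_true, y_pred))
print(round(geometric_mean(y_true, y_pred), 4))
print(negative_to_positive_ratio(y_true))
```

On this toy example plain accuracy is 80% even though half of the bankruptcies are missed, which is why metrics such as BDR and GM are preferred over accuracy on imbalanced data.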

    Autoencoders for strategic decision support

    In the majority of executive domains, a notion of normality is involved in most strategic decisions. However, few data-driven tools that support strategic decision-making are available. We introduce and extend the use of autoencoders to provide strategically relevant granular feedback. A first experiment indicates that experts are inconsistent in their decision-making, highlighting the need for strategic decision support. Furthermore, using two large industry-provided human resources datasets, the proposed solution is evaluated in terms of ranking accuracy, synergy with human experts, and dimension-level feedback. This three-point scheme is validated using (a) synthetic data, (b) the perspective of data quality, (c) blind expert validation, and (d) transparent expert evaluation. Our study confirms several principal weaknesses of human decision-making and stresses the importance of synergy between a model and humans. Moreover, unsupervised learning, and in particular the autoencoder, is shown to be a valuable tool for strategic decision-making.