6 research outputs found

    Default Prediction of Internet Finance Users Based on Imbalance-XGBoost

    Fast and accurate identification of financial fraud is a challenge in Internet finance. Based on the imbalanced distribution characteristic of Internet financial data, this paper combines machine learning methods with Internet financial data to propose a prediction model for loan defaults, and demonstrates its effectiveness and generalizability through empirical research. We introduce a processing method (link processing method) for imbalanced data based on the traditional early warning model. We conduct experiments using the financial dataset of the Lending Club platform and show that our model is superior to XGBoost, NGBoost, AdaBoost, and GBDT in the prediction of default risk.
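The abstract does not describe its link processing method, so the sketch below is not that method. It only illustrates a common baseline for imbalanced default data: weighting the minority (default) class by the ratio of negatives to positives, which is the value usually suggested for XGBoost's `scale_pos_weight` parameter. The function name and the toy labels are hypothetical.

```python
from collections import Counter


def scale_pos_weight(labels):
    """Return n_negative / n_positive for binary labels (1 = default, 0 = repaid)."""
    counts = Counter(labels)
    if counts[1] == 0:
        raise ValueError("no positive (default) examples in labels")
    return counts[0] / counts[1]


# Hypothetical toy labels: 2 defaults among 10 loans.
labels = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]
weight = scale_pos_weight(labels)
print(weight)  # 4.0
```

The resulting weight could then be passed to a gradient-boosting classifier so that each default counts about four times as much as a repaid loan in the loss.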

    AUTOENCODER BASED GENERATOR FOR CREDIT INFORMATION RECOVERY OF RURAL BANKS

    By using machine learning algorithms, banks and other lending institutions can construct intelligent risk control models for loan businesses, which helps to overcome the disadvantages of traditional evaluation methods, such as low efficiency and excessive reliance on the subjective judgment of auditors. However, in the practical evaluation process, it is inevitable to encounter data with missing credit characteristics. Therefore, filling in the missing characteristics is crucial for the training process of those machine learning algorithms, especially when applied to rural banks with little credit data. In this work, we propose an autoencoder-based algorithm that uses the correlation between data to restore the missing data items in the features. We also select several open-source datasets (German Credit Data, Give Me Some Credit on the Kaggle platform, etc.) as the training and test datasets to verify the algorithm. The comparison results show that our model outperforms the others, although the performance of the autoencoder-based feature restorer decreases significantly when the feature missing ratio exceeds 70%.

    Managing credit risk and the cost of equity with machine learning techniques

    Credit risks and the cost of equity can influence market participants' activities in many ways. Providing in-depth analysis can help participants reduce potential costs and devise profitable strategies. This kind of study usually relies on conventional statistical models built with researchers' knowledge. However, with the advancement of technology, the massive amount of financial data, increasing in volume, subjectivity, and heterogeneity, has become challenging to process conventionally. Machine learning (ML) techniques have been utilised to handle this difficulty in real-life applications. This PhD thesis consists of three major empirical essays. We employ state-of-the-art machine learning techniques to predict peer-to-peer (P2P) lending default risk, P2P lending decisions, and Environmental, Social, Corporate Governance (ESG) effects on firms' cost of equity. In the era of financial technology, P2P lending has gained considerable attention among academics and market participants. In the first essay (Chapter 2), we investigate the determinants of P2P lending default prediction in relation to borrowers' characteristics and credit history. Applying machine learning techniques, we document substantial predictive ability compared with the benchmark logit model. Further, we find that LightGBM has superior predictive power and outperforms all other models in all out-of-sample predictions. Finally, we offer insights into different levels of uncertainty in P2P loan groups and the value of machine learning in credit risk mitigation for P2P loan providers. The macroeconomic impact on funding decisions or lending standards reflects the risk-taking behaviour of market participants and has been widely discussed by academics. In the era of financial technology, however, there is a gap in the evidence on how lending standards change in a FinTech nonbank financial organisation. The second essay (Chapter 3) aims to fill this gap by introducing loan-level and macroeconomic variables into the predictive models to estimate the P2P loan funding decision. Over 12 million empirical instances are studied, and big data techniques, including text mining and five state-of-the-art approaches, are utilised. We note that macroeconomic conditions affect individual risk-taking and reaching-for-yield behaviour. Finally, we offer insight into the macroeconomic impact in terms of different levels of uncertainty in different P2P loan application groups. In the third essay (Chapter 4), we use up-to-date machine learning techniques to provide new evidence for the impact of ESG on the cost of equity. Using 15,229 firm-year observations from 51 different countries over the past 18 years, we document negative causal effects on the cost of equity. In addition, we uncover non-linear effects, because the level of the ESG effect on the cost of equity decreases as ESG performance improves. Furthermore, we note heterogeneity in ESG effects across regions by breaking down our sample. Finally, we find that global crises change the sensitivity of the cost of equity towards ESG, and that the change varies across regions.

    Credit Risk Scoring: A Stacking Generalization Approach

    Dissertation presented as the partial requirement for obtaining a Master's degree in Statistics and Information Management, specialization in Risk Analysis and Management. Credit risk regulation has been receiving tremendous attention as a result of the effects of the latest global financial crisis. According to the developments made in the Internal Rating Based approach, under the Basel guidelines, banks are allowed to use internal risk measures as key drivers to assess the possibility of granting a loan to an applicant. Credit scoring is a statistical approach used for evaluating potential loan applications in both financial and banking institutions. When applying for a loan, an applicant must fill out an application form detailing their characteristics (e.g., income, marital status, and loan purpose), which serve as inputs to a credit scoring model that produces a score used to determine whether a loan should be granted or not. This enables faster and more consistent credit approvals and the reduction of bad debt. Currently, many machine learning and statistical approaches, such as logistic regression and tree-based algorithms, have been used individually for credit scoring models. Newer machine learning techniques can outperform classic methods by simply combining models. This dissertation is an empirical study of banking loan default on a publicly available bank loan dataset, using ensemble-based techniques to increase model robustness and predictive power. The proposed ensemble method is based on stacking generalization, an extension of various preceding studies that used different techniques to further enhance the model's predictive capabilities. The results show that combining different models provides a great deal of flexibility to credit scoring models.
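As a rough illustration of the stacking idea mentioned in the abstract above, the sketch below builds out-of-fold predictions from base models and trains a meta-learner on them. It is a minimal toy under stated assumptions, not the dissertation's method: the base learners (one-feature threshold rules), the lookup-table meta-learner, the feature names, and the data are all hypothetical.

```python
from statistics import mean


def fit_stump(j):
    """Return a fitter for a one-feature threshold rule on feature j."""
    def fit(X, y):
        pos = mean(x[j] for x, t in zip(X, y) if t == 1)
        neg = mean(x[j] for x, t in zip(X, y) if t == 0)
        thr = (pos + neg) / 2          # midpoint between the class means
        sign = 1 if pos >= neg else -1
        return lambda x: 1 if sign * (x[j] - thr) > 0 else 0
    return fit


def out_of_fold(X, y, fitters, k=2):
    """Predict each row with base models trained on the other folds only."""
    n = len(X)
    Z = [[0] * len(fitters) for _ in range(n)]
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        for m, fit in enumerate(fitters):
            model = fit([X[i] for i in train], [y[i] for i in train])
            for i in range(fold, n, k):
                Z[i][m] = model(X[i])
    return Z


def fit_meta(Z, y):
    """Meta-learner: majority label for each pattern of base predictions."""
    seen = {}
    for z, t in zip(Z, y):
        seen.setdefault(tuple(z), []).append(t)
    table = {key: int(mean(v) >= 0.5) for key, v in seen.items()}
    # Unseen patterns fall back to a majority vote of the base predictions.
    return lambda z: table.get(tuple(z), int(mean(z) >= 0.5))


def fit_stack(X, y, fitters, k=2):
    meta = fit_meta(out_of_fold(X, y, fitters, k), y)
    bases = [fit(X, y) for fit in fitters]  # refit base models on all data
    return lambda x: meta([b(x) for b in bases])


# Hypothetical toy data: (debt_ratio, late_payments); label 1 = default.
X = [(0.1, 0), (0.2, 1), (0.9, 3), (0.8, 4), (0.3, 0), (0.2, 0), (0.7, 2), (0.9, 5)]
y = [0, 0, 1, 1, 0, 0, 1, 1]
fitters = [fit_stump(0), fit_stump(1)]
predict = fit_stack(X, y, fitters)
print(predict((0.85, 4)), predict((0.15, 0)))  # 1 0
```

Training the meta-learner on out-of-fold predictions, rather than on predictions for rows the base models have already seen, is what keeps the second stage from simply rewarding overfitted base models.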