36 research outputs found

    Managing credit risk and the cost of equity with machine learning techniques

    Credit risk and the cost of equity can influence market participants' activities in many ways. In-depth analysis can help participants reduce potential costs and devise profitable strategies. Such studies usually rely on conventional statistical models built on researchers' knowledge. However, with the advancement of technology, the massive amount of financial data, increasing in volume, subjectivity, and heterogeneity, has become challenging to process conventionally. Machine learning (ML) techniques have been utilised to handle this difficulty in real-life applications. This PhD thesis consists of three major empirical essays. We employ state-of-the-art machine learning techniques to predict peer-to-peer (P2P) lending default risk, P2P lending decisions, and Environmental, Social, and Corporate Governance (ESG) effects on firms' cost of equity. In the era of financial technology, P2P lending has gained considerable attention among academics and market participants. In the first essay (Chapter 2), we investigate the determinants of P2P lending default prediction in relation to borrowers' characteristics and credit history. Applying machine learning techniques, we document substantial predictive ability compared with the benchmark logit model. Further, we find that LightGBM has superior predictive power and outperforms all other models in all out-of-sample predictions. Finally, we offer insights into different levels of uncertainty in P2P loan groups and the value of machine learning in credit risk mitigation for P2P loan providers. The macroeconomic impact on funding decisions or lending standards reflects the risk-taking behaviour of market participants and has been widely discussed by academics. In the era of financial technology, however, there remains a gap in the evidence on changes in lending standards at a FinTech nonbank financial organisation. The second essay (Chapter 3) aims to fill the gap by introducing loan-level and macroeconomic variables into the predictive models to estimate the P2P loan funding decision. Over 12 million empirical instances are studied, using big data techniques including text mining and five state-of-the-art approaches. We note that macroeconomic conditions affect individual risk-taking and reaching-for-yield behaviour. Finally, we offer insight into the macroeconomic impact in terms of different levels of uncertainty in different P2P loan application groups. In the third essay (Chapter 4), we use up-to-date machine learning techniques to provide new evidence for the impact of ESG on the cost of equity. Using 15,229 firm-year observations from 51 different countries over the past 18 years, we document negative causal effects on the cost of equity. In addition, we uncover non-linear effects: the magnitude of the ESG effect on the cost of equity decreases as ESG performance improves. Furthermore, we note heterogeneity in ESG effects across regions by breaking down our sample. Finally, we find that global crises change the sensitivity of the cost of equity towards ESG, and the change varies across areas

    Internet Financial Credit Risk Assessment with Sliding Window and Attention Mechanism LSTM Model

    With the accelerated pace of market-oriented reform, Internet finance has gained a broad and healthy development environment. Existing studies lack consideration of time trends in financial risk, and treating all features equally may lead to inaccurate predictions. To address these problems, we propose an LSTM model based on a sliding window and an attention mechanism. The sliding windows enable the model to effectively exploit the contextual relevance of loan data, and the attention mechanism enables it to focus on important information. Results on the desensitized Lending Club public dataset show that our model outperforms ARIMA, SVM, ANN, LSTM, and GRU models
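As a rough illustration of the sliding-window idea named in this abstract, the sketch below turns a time-ordered series into (window, next value) training pairs. It is not the authors' implementation: the function name `sliding_windows`, the window length, and the sample values are all assumptions made for illustration, and the LSTM and attention layers are omitted.

```python
from typing import List, Sequence, Tuple


def sliding_windows(
    series: Sequence[float], window: int
) -> List[Tuple[List[float], float]]:
    """Pair each run of `window` consecutive values with the value that follows it."""
    if window < 1:
        raise ValueError("window must be at least 1")
    samples = []
    # The last usable window ends one step before the final element,
    # because that final element is needed as the target.
    for start in range(len(series) - window):
        features = list(series[start:start + window])
        target = series[start + window]
        samples.append((features, target))
    return samples


# Hypothetical monthly default rates, invented for illustration only.
rates = [0.10, 0.12, 0.11, 0.15, 0.14]
windows = sliding_windows(rates, window=3)
```

Each pair could then be fed to a sequence model, with the window as input and the following value as the prediction target.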

    LightGBM: An Effective and Scalable Algorithm for Prediction of Chemical Toxicity – Application to the Tox21 and Mutagenicity Datasets

    Machine learning algorithms have attained widespread use in assessing the potential toxicities of pharmaceuticals and industrial chemicals because of their higher speed and lower cost compared to experimental bioassays. Gradient boosting is an effective algorithm that often achieves high predictivity, but historically its relatively long computational time limited its applications in predicting large compound libraries or developing in silico predictive models that require frequent retraining. LightGBM, a recent improvement of the gradient boosting algorithm, inherited its high predictivity but resolved its scalability and long computational time by adopting a leaf-wise tree growth strategy and introducing novel techniques. In this study, we compared the predictive performance and the computational time of LightGBM to deep neural networks, random forests, support vector machines, and XGBoost. All algorithms were rigorously evaluated on publicly available Tox21 and mutagenicity datasets using a Bayesian optimization integrated nested 10-fold cross-validation scheme that performs hyperparameter optimization while examining model generalizability and transferability to new data. The evaluation results demonstrated that LightGBM is an effective and highly scalable algorithm, offering the best predictive performance while consuming significantly shorter computational time than the other investigated algorithms across all Tox21 and mutagenicity datasets. We recommend LightGBM for applications in in silico safety assessment and also in other areas of cheminformatics to fulfill the ever-growing demand for accurate and rapid prediction of various toxicity or activity related endpoints of large compound libraries present in the pharmaceutical and chemical industry

    Network centrality and credit risk: A comprehensive analysis of peer-to-peer lending dynamics

    This letter analyzes credit risk assessment in the Peer-to-Peer (P2P) lending domain by leveraging a comprehensive dataset from Bondora, a leading European P2P platform. Through combining traditional credit features with network topological features, namely the degree centrality, we showcase the crucial role of a borrower's position and connectivity within the P2P network in determining loan default probabilities. Our findings are bolstered by robustness checks using shuffled centrality features, which further underscore the significance of integrating both financial and network attributes in credit risk evaluation. Our results shed new light on credit risk determinants in P2P lending and benefit investors in capturing inherent information from P2P loan networks.
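For readers unfamiliar with degree centrality, the feature named in this abstract, the sketch below computes the standard normalized form: a node's number of distinct neighbours divided by (n - 1), where n is the number of nodes. The abstract does not say how the borrower network is constructed, so the edge list, node labels, and function name here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def degree_centrality(edges: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Normalized degree centrality for an undirected graph given as an edge list."""
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # self-loops are ignored in this sketch
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    # Divide by the maximum possible number of neighbours, n - 1.
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


# Hypothetical borrower links, invented for illustration only.
edges = [("b1", "b2"), ("b1", "b3"), ("b1", "b4"), ("b2", "b3")]
centrality = degree_centrality(edges)
```

A value like this could be appended to each borrower's conventional credit features before fitting a default model.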

    Ensemble Learning approach to Enhancing Binary Classification in Intrusion Detection System for Internet of Things

    The Internet of Things (IoT) has experienced significant growth and plays a crucial role in daily activities. However, along with its development, IoT is very vulnerable to attacks, which raises concerns for users. The Intrusion Detection System (IDS) operates efficiently to detect and identify suspicious activities within the network. The primary source of attacks is external, specifically from the internet attempting to transmit data to the host network. IDS can identify unknown attacks from network traffic and has become one of the most effective network security measures. Classification is used to distinguish between the normal class and attacks in a binary classification problem. As a result, there is a rise in the false positive rates and a decrease in the detection accuracy during the model's training. Based on the test results using the ensemble technique with the ensemble learning XGBoost and LightGBM algorithms, it can be concluded that both binary classification problems can be solved. The results using these ensemble learning algorithms on the ToN IoT Dataset, where binary classification has been performed by combining multiple devices into one, have demonstrated improved accuracy. Moreover, this ensemble approach ensures a more even distribution of accuracy across each device, surpassing the findings of previous research

    Tree-Based Approaches for Predicting Financial Performance

    The lending industry has commonly relied on assessing borrowers’ repayment performance to make lending decisions, in order to safeguard assets and maintain profitability. With the rise of Artificial Intelligence, lenders have resorted to Machine Learning (ML) algorithms to solve this problem. The novelty of this study is applying ML’s tree-based methods to a large dataset and accurately predicting financial repayment performance without using any repayment history, which was utilized in all literature reviewed. Instead, the only attributes used were the demographics and psychographics of applicants. The study’s proprietary US-based dataset comprises an anonymous population whose owner does not wish to be disclosed, and it contains the information of about half a million beneficiaries with a very balanced bimodal binary target distribution. An Area Under the Receiver Operating Characteristic Curve (ROC-AUC) of 85% was achieved with a binary classification target using the CatBoost API. The study also experimented with a given tri-class target. Furthermore, this research used ML to gain insight into which attributes contribute the most to the repayment prediction. The study also tested whether similar results can be achieved with fewer attributes for the sake of the practicality of application by the data owner. The best model was applied to one of the biggest publicly available financial datasets for verification. The original research on said dataset had an accuracy score of 82%; this study achieved 79% using 5-fold Cross-Validation (CV). This result was achieved with tree-based models with a complexity of O(log n) compared to O(2n) in the original research, which is a significant efficiency enhancement

    Application of Big Data Technology, Text Classification, and Azure Machine Learning for Financial Risk Management Using Data Science Methodology

    Data science plays a crucial role in enabling organizations to optimize data-driven opportunities within financial risk management. It involves identifying, assessing, and mitigating risks, ultimately safeguarding investments, reducing uncertainty, ensuring regulatory compliance, enhancing decision-making, and fostering long-term sustainability. This thesis explores three facets of Data Science projects: enhancing customer understanding, fraud prevention, and predictive analysis, with the goal of improving existing tools and enabling more informed decision-making. The first project leveraged big data technologies, such as Hadoop and Spark, to enhance financial risk management by accurately predicting loan defaulters and their repayment likelihood. In the second project, we investigated risk assessment and fraud prevention within the financial sector, where Natural Language Processing and machine learning techniques were applied to classify emails into categories like spam, ham, and phishing. After training various models, their performance was rigorously evaluated. In the third project, we explored the utilization of Azure machine learning to identify loan defaulters, emphasizing the comparison of different machine learning algorithms for predictive analysis. The results aimed to determine the best-performing model by evaluating various performance metrics for the dataset. This study is important because it offers a strategy for enhancing risk management, preventing fraud, and encouraging innovation in the financial industry, ultimately resulting in better financial outcomes and enhanced customer protection

    COMPARISON OF CLASSIFICATION ALGORITHM IN CLASSIFYING AIRLINE PASSENGER SATISFACTION

    In order to revive the airline industry, which is being hit by the current recession, it is essential to restore passenger confidence in airlines by improving the services provided by airlines. With the influence of technology in all industrial fields, airlines can now use Machine Learning to find the essential points that can make passengers feel satisfied with airline services and classify passenger satisfaction. This study presents the making of Machine Learning models starting from Data Acquisition, Data Cleaning, Exploratory Data Analysis, Preprocessing, and Model Building. It is concluded that Random Forest is the best algorithm used in this case study, with an F1 accuracy score of 89.4, ROC-AUC score of 0.90, and a shorter modeling period than other algorithms used in this study

    Most Recent Malicious Software Datasets and Machine Learning Detection Techniques: A Review

    Background: Within the context of cyber security, it has become crucial to monitor systems and analyze data to maintain data security and integrity. Recently, it has become important to create a system for analyzing and classifying data, to prevent any malicious programs such as malware. Materials and Methods: The latest malware dataset and the latest machine-learning techniques were used to detect malware, based on dynamic feature identification. Results: The results showed that the FFNN algorithm was the best algorithm for the sorel20M dataset based on the research work discussed in this paper. Conclusion: The continuous increase in the number and types of attacks has led to a huge expansion in the variants of malware samples. Therefore, malware needs to be categorized into groups according to their behavior, influence, and characteristics. Given the fact that research and training are essential elements of cyber security, its constantly changing nature poses a great challenge. This study mainly aims to demonstrate the most recent malware dataset and modern machine-learning techniques of malware detection, based on dynamic feature selection