6,011 research outputs found

    An academic review: applications of data mining techniques in finance industry

    With the development of Internet techniques, data volumes are doubling every two years, faster than predicted by Moore’s Law. Big Data Analytics has become particularly important for enterprise business. Modern computational technologies provide effective tools to help understand the huge volumes of accumulated data and leverage this information to gain insights into the finance industry. Because there are no physical products to manufacture in the finance industry, data has become the most valuable asset of financial organisations seeking actionable business insights. This is where data mining techniques come to their rescue by allowing access to the right information at the right time. These techniques are used by the finance industry in various areas such as fraud detection, intelligent forecasting, credit rating, loan management, customer profiling, money laundering, marketing and prediction of price movements, to name a few. This work aims to survey the research on data mining techniques applied to the finance industry from 2010 to 2015. The review finds that stock prediction and credit rating have received the most attention from researchers, compared to loan prediction, money laundering and time series prediction. Due to the dynamics, uncertainty and variety of data, nonlinear mapping techniques have been studied more deeply than linear techniques. It has also been shown that hybrid methods are more accurate in prediction, closely followed by the Neural Network technique. This survey could provide an indication of the applications of data mining techniques for the finance industry, and a summary of methodologies for researchers in this area. In particular, it could provide a good overview of data mining techniques in computational finance for beginners who want to work in the field.

    Machine Learning applied to credit risk assessment: Prediction of loan defaults

    Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science. Due to the recent financial crisis and regulatory concerns of Basel II, credit risk assessment is becoming a very important topic in the field of financial risk management. Financial institutions need to take great care when dealing with consumer loans in order to avoid losses and opportunity costs. For this reason, credit scoring systems have been used to make informed decisions on whether or not to grant credit to clients who apply for it. Several credit scoring models have been proposed so far, from statistical models to more complex artificial intelligence techniques. However, most previous work has focused on employing single classifiers. Ensemble learning is a powerful machine learning paradigm which has proven to be of great value in solving a variety of problems. This study compares the performance of the industry standard, logistic regression, to four ensemble methods, i.e. AdaBoost, Gradient Boosting, Random Forest and Stacking, in identifying potential loan defaults. All the models were built with a real-world dataset with over one million customers from Lending Club, a financial institution based in the United States. The performance of the models was compared by using the Hold-out method as the evaluation design and accuracy, AUC, type I error and type II error as evaluation metrics. Experimental results reveal that the ensemble classifiers were able to outperform logistic regression on three key indicators, i.e. accuracy, type I error and type II error. AdaBoost performed better than the remaining classifiers considering a trade-off between all the metrics evaluated. The main contribution of this thesis is an experimental addition to the literature on the preferred models for predicting potential loan defaulters.
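Three of the evaluation metrics named in the abstract above (accuracy, type I error and type II error) can be computed from hold-out predictions as sketched below. This is an illustrative sketch, not the thesis's code: the function name `holdout_metrics` and the toy labels are hypothetical, and it assumes that "default" is the positive class, so type I error is the share of non-defaulters flagged as defaults and type II error is the share of defaulters that are missed. The abstract does not say which convention it uses. AUC is left out because it needs predicted scores rather than labels.

```python
def holdout_metrics(y_true, y_pred):
    """Return accuracy, type I error and type II error for binary labels.

    Assumption (not from the abstract): 1 = default (positive), 0 = repaid.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # defaults caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # good loans accepted
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # good loans rejected
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # defaults missed
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Type I error: false positive rate among true non-defaulters.
        "type_i_error": fp / (fp + tn) if (fp + tn) else 0.0,
        # Type II error: false negative rate among true defaulters.
        "type_ii_error": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Hypothetical hold-out labels and predictions for eight loans.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
metrics = holdout_metrics(y_true, y_pred)
print(metrics)
```

Reporting the two error rates separately matters in credit scoring because a missed default and a wrongly rejected good applicant usually carry different costs.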

    Predictive Modelling of Retail Banking Transactions for Credit Scoring, Cross-Selling and Payment Pattern Discovery

    Evaluating transactional payment behaviour offers a competitive advantage in the modern payment ecosystem, not only for confirming the presence of good credit applicants or unlocking the cross-selling potential between the respective product and service portfolios of financial institutions, but also to rule out bad credit applicants precisely in transactional payments streams. In a diagnostic test for analysing the payment behaviour, I have used a hybrid approach comprising a combination of supervised and unsupervised learning algorithms to discover behavioural patterns. Supervised learning algorithms can compute a range of credit scores and cross-sell candidates, although the applied methods only discover limited behavioural patterns across the payment streams. Moreover, the performance of the applied supervised learning algorithms varies across the different data models and their optimisation is inversely related to the pre-processed dataset. Subsequently, the research experiments conducted suggest that the Two-Class Decision Forest is an effective algorithm to determine both the cross-sell candidates and the creditworthiness of customers. In addition, a deep-learning model using a neural network has been considered with a meaningful interpretation of future payment behaviour through categorised payment transactions, in particular by providing additional deep insights through graph-based visualisations. However, the research shows that unsupervised learning algorithms play a central role in evaluating the transactional payment behaviour of customers to discover associations using market basket analysis based on previous payment transactions, finding the frequent transaction categories, and developing interesting rules when each transaction category is performed on the same payment stream.
Current research also reveals that the transactional payment behaviour analysis is multifaceted in the financial industry for assessing the diagnostic ability of promotion candidates and classifying bad credit applicants from among the entire customer base. The developed predictive models can also be commonly used to estimate the credit risk of any credit applicant based on his/her transactional payment behaviour profile, combined with deep insights from the categorised payment transactions analysis. The research study provides a full review of the performance characteristic results from different developed data models. Thus, the demonstrated data science approach is a possible proof of how machine learning models can be turned into cost-sensitive data models
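The market basket analysis mentioned in the abstract above rests on two quantities, support and confidence, computed over sets of transaction categories. The following is a minimal sketch under stated assumptions, not the author's implementation: the function `pair_rules`, the category names and the support threshold are all hypothetical, and only rules between pairs of categories are considered.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5):
    """Return sorted (antecedent, consequent, support, confidence) tuples.

    support    = share of baskets containing both categories
    confidence = share of baskets with the antecedent that also
                 contain the consequent
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore repeats within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # not a frequent pair
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return sorted(rules)


# Hypothetical payment streams, each reduced to its transaction categories.
baskets = [
    ["groceries", "fuel"],
    ["groceries", "fuel", "dining"],
    ["groceries", "dining"],
    ["fuel"],
]
rules = pair_rules(baskets, min_support=0.5)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data every stream containing "dining" also contains "groceries", so that rule has confidence 1.0, while the reverse rule is weaker.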

    Default or profit scoring credit systems? Evidence from European and US peer-to-peer lending markets

    For the emerging peer-to-peer (P2P) lending markets to survive, they need to employ credit-risk management practices such that an investor base is profitable in the long run. Traditionally, credit-risk management relies on credit scoring that predicts loans’ probability of default. In this paper, we use a profit scoring approach that is based on modeling the annualized adjusted internal rate of returns of loans. To validate our profit scoring models with traditional credit scoring models, we use data from a European P2P lending market, Bondora, and also a random sample of loans from the Lending Club P2P lending market. We compare the out-of-sample accuracy and profitability of the credit and profit scoring models within several classes of statistical and machine learning models including the following: logistic and linear regression, lasso, ridge, elastic net, random forest, and neural networks. We found that our approach outperforms standard credit scoring models for Lending Club and Bondora loans. More specifically, as opposed to credit scoring models, returns across all loans are 24.0% (Bondora) and 15.5% (Lending Club) higher, whereas accuracy is 6.7% (Bondora) and 3.1% (Lending Club) higher for the proposed profit scoring models. Moreover, our results are not driven by manual selection as profit scoring models suggest investing in more loans. Finally, even if we consider data sampling bias, we found that the set of superior models consists almost exclusively of profit scoring models. Thus, our results contribute to the literature by suggesting a paradigm shift in modeling credit-risk in the P2P market to prefer profit as opposed to credit-risk scoring models
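Profit scoring, as described in the abstract above, models a loan's return rather than its probability of default. The sketch below shows one way a plain internal rate of return could be computed and annualized from a loan's cash flows. It is an assumption-laden illustration: the paper's "adjusted" rate is not specified in the abstract and is not reproduced here, and the function names, the monthly payment schedule and the bisection bounds are all hypothetical.

```python
def irr(cash_flows, low=-0.99, high=10.0, tol=1e-10):
    """Per-period internal rate of return, found by bisection.

    Assumes one initial outlay (negative) followed by non-negative
    repayments, so net present value falls as the rate rises and
    crosses zero at most once.
    """
    def npv(rate):
        return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))

    if npv(low) < 0 or npv(high) > 0:
        raise ValueError("IRR is not bracketed by [low, high]")
    while high - low > tol:
        mid = (low + high) / 2.0
        if npv(mid) > 0:
            low = mid   # rate too low: present value still positive
        else:
            high = mid  # rate too high: present value negative
    return (low + high) / 2.0


def annualize(monthly_rate):
    """Compound a monthly rate over twelve months."""
    return (1.0 + monthly_rate) ** 12 - 1.0


# Hypothetical loan: 1000 lent, then twelve monthly repayments of 90.
flows = [-1000.0] + [90.0] * 12
monthly = irr(flows)
annual = annualize(monthly)
print(f"monthly IRR = {monthly:.4%}, annualized = {annual:.4%}")
```

A loan that defaults partway through simply has zeros in the tail of `flows`, which lowers the return without needing a separate default label.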

    Credit Risk Scoring: A Stacking Generalization Approach

    Dissertation presented as the partial requirement for obtaining a Master's degree in Statistics and Information Management, specialization in Risk Analysis and Management. Credit risk regulation has been receiving tremendous attention as a result of the effects of the latest global financial crisis. According to the developments made in the Internal Rating Based approach, under the Basel guidelines, banks are allowed to use internal risk measures as key drivers to assess the possibility of granting a loan to an applicant. Credit scoring is a statistical approach used for evaluating potential loan applications in both financial and banking institutions. When applying for a loan, an applicant must fill out an application form detailing their characteristics (e.g., income, marital status, and loan purpose), which serve as inputs to a credit scoring model that produces a score used to determine whether a loan should be granted or not. This enables faster and more consistent credit approvals and the reduction of bad debt. Currently, many machine learning and statistical approaches such as logistic regression and tree-based algorithms have been used individually for credit scoring models. Newer machine learning techniques can outperform classic methods by simply combining models. This dissertation is an empirical study of banking loan default on a publicly available bank loan dataset, using ensemble-based techniques to increase model robustness and predictive power. The proposed ensemble method is based on stacking generalization, an extension of various preceding studies that used different techniques to further enhance the model's predictive capabilities. The results show that combining different models provides a great deal of flexibility to credit scoring models.

    A Novel Hybrid Classification Model For the Loan Repayment Capability Prediction System

    Classification is a powerful tool in data mining for predicting the loan repayment capability of a banking customer. This paper evaluates the performance of various classification algorithms and selects the most appropriate one for predicting the class label of the credit data set as good or bad. Feature selection is a data pre-processing technique that identifies the most beneficial features for a given task, while avoiding the noisy, irrelevant and redundant features of the dataset. These irrelevant, noisy features result in poor accuracy for the selected classifier. Feature selection therefore plays a vital role as a data pre-processing step in improving the accuracy of a classifier, and it reduces the dimensionality of the feature set of the dataset. This paper has two objectives. The first objective is to find the best classification algorithm for the credit data set using two different tools, Weka and R. The experiment showed that Random Forest performs better for the loan repayment credibility prediction system. The second objective is to evaluate the performance of various feature selection methods based on the Random Forest classification method. A novel hybrid model is also developed for the same purpose.