177 research outputs found
An academic review: applications of data mining techniques in finance industry
With the development of Internet techniques, data volumes are doubling every two years, faster than predicted by Moore’s Law. Big Data Analytics has become particularly important for enterprise business. Modern computational technologies provide effective tools to help understand the vast amounts of accumulated data and leverage this information to gain insights into the finance industry. Because there are no physical products to manufacture in the finance industry, data has become the most valuable asset of financial organisations seeking actionable business insights. This is where data mining techniques come to the rescue by allowing access to the right information at the right time. These techniques are used by the finance industry in areas such as fraud detection, intelligent forecasting, credit rating, loan management, customer profiling, money laundering, marketing and prediction of price movements, to name a few. This work surveys the research on data mining techniques applied to the finance industry from 2010 to 2015. The review finds that stock prediction and credit rating have received the most attention from researchers, compared to loan prediction, money laundering and time series prediction. Due to the dynamics, uncertainty and variety of data, nonlinear mapping techniques have been studied more deeply than linear techniques. It has also been shown that hybrid methods are more accurate in prediction, closely followed by neural network techniques. This survey provides an overview of applications of data mining techniques in the finance industry and a summary of methodologies for researchers in this area. In particular, it could give beginners who want to work in computational finance a good picture of the data mining techniques used in the field.
A cost-sensitive logistic regression credit scoring model based on multi-objective optimization approach
Credit scoring is an important process for peer-to-peer (P2P) lending companies as it determines whether loan applicants are likely to default. The aim of most credit scoring models is to minimize the classification error rate, which implies that all classification errors bear the same cost; in reality, however, misclassification costs in credit scoring differ significantly by error type. Therefore, in this paper, a new cost-sensitive logistic regression credit scoring model based on a multi-objective optimization approach is proposed that has two objectives in the cost-sensitive logistic regression process. The cost-sensitive logistic regression parameters are solved using a multiple objective particle swarm optimization (MOPSO) algorithm. In the empirical analysis, the proposed model was applied to the credit scoring of a well-known Chinese P2P company. Compared with other common credit scoring models, the proposed model was able to effectively reduce type II error rates and total classification error costs, and improve the AUC, the F1 values (the harmonic mean of Recall and Precision), and the G-means. The proposed model was also compared with other multi-objective optimization algorithms to further demonstrate that MOPSO is the best approach for cost-sensitive logistic regression credit scoring models.
First published online 27 November 201
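The evaluation measures named in the abstract above (type II error rate, total classification error cost, F1 and G-mean) can all be derived from a binary confusion matrix. The following is a minimal sketch, not the paper's MOPSO procedure. It assumes that "positive" means a defaulting applicant, that a type II error is a defaulter classified as non-defaulting, and that the cost weights `cost_fp` and `cost_fn` are hypothetical values chosen for illustration.

```python
import math


def credit_scoring_metrics(tp, fp, tn, fn, cost_fp=1.0, cost_fn=5.0):
    """Summarise a binary confusion matrix in which 'positive' = default.

    cost_fp and cost_fn are hypothetical per-error costs (assumption):
    rejecting a good applicant versus accepting one who later defaults.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    # G-mean is the geometric mean of sensitivity and specificity.
    g_mean = math.sqrt(recall * specificity)
    # Assumed convention: type II error = share of defaulters that were missed.
    type_ii_rate = fn / (tp + fn) if (tp + fn) else 0.0
    # Cost-sensitive total: each error type is weighted by its own cost.
    total_cost = cost_fp * fp + cost_fn * fn
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "g_mean": g_mean,
        "type_ii_rate": type_ii_rate,
        "total_cost": total_cost,
    }
```

Because `cost_fn` exceeds `cost_fp` in this sketch, two models with the same error rate can have different total costs, which is the asymmetry a cost-sensitive objective is meant to capture.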
Credit risk modeling: A comparative analysis of artificial and deep neural networks
Credit risk assessment plays a major role in banks and financial institutions in preventing counterparty risk failure. One of the primary capabilities of a robust risk management system must be detecting risks early, though many bank systems today lack this key capability, which leads to further losses (MGI, 2017). In search of an improved methodology to detect such credit risk earlier and to strengthen this lacking capability, a comparative analysis of a Deep Neural Network (DNN) and machine learning techniques such as Support Vector Machines (SVM), K-Nearest Neighbours (KNN) and Artificial Neural Network (ANN) was conducted. The Deep Neural Network used in this study consists of six layers of neurons. Further, sampling techniques such as SMOTE, SVM-SMOTE, RUS, and All-KNN were applied to balance the imbalanced dataset. Using supervised learning techniques, the proposed DNN model was able to achieve an accuracy of 82.18% with a ROC score of 0.706 using the RUS sampling technique. The All-KNN sampling technique achieved the maximum number of true positives in two different models. Using the proposed approach, banks and credit check institutions can help prevent major losses occurring due to counterparty risk failure.
Keywords: credit risk, deep neural network, artificial neural network, support vector machines, sampling technique
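Of the sampling techniques listed in the abstract above, random undersampling (RUS) is the simplest to illustrate: every class is reduced to the size of the smallest class by discarding randomly chosen examples. The sketch below is a standard-library illustration under that assumption, not the study's implementation. The function name `random_undersample` and the `seed` parameter are hypothetical.

```python
import random


def random_undersample(rows, labels, seed=0):
    """Balance a dataset by randomly dropping examples from larger classes.

    Returns new (rows, labels) lists in which every class has as many
    examples as the smallest class in the input.
    """
    rng = random.Random(seed)  # local generator keeps the result repeatable
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    n_min = min(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample without replacement so no example is duplicated.
        for row in rng.sample(members, n_min):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels
```

Undersampling discards information from the majority class, which is one reason studies such as this one compare it against oversampling methods like SMOTE.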
Data mining in computational finance
Computational finance is a relatively new discipline whose birth can be traced back to the early 1950s. Its major objective is to develop and study practical models focusing on techniques that apply directly to financial analyses. The large number of decisions and computationally intensive problems involved in this discipline make data mining and machine learning models integral to improving, automating, and expanding current processes.
One of the objectives of this research is to present a state-of-the-art review of the data mining and machine learning techniques applied in the core areas of computational finance. Next, a detailed analysis of public and private finance datasets is performed in an attempt to find interesting facts in the data and draw conclusions regarding the usefulness of features within the datasets.
Credit risk evaluation is one of the crucial modern concerns in this field. Credit scoring is essentially a classification problem where models are built using the information about past applicants to categorise new applicants as ‘creditworthy’ or ‘non-creditworthy’. We appraise the performance of a few classical machine learning algorithms for the problem of credit scoring.
Typically, credit scoring databases are large and characterised by redundant and irrelevant features, making the classification task more computationally demanding. Feature selection is the process of selecting an optimal subset of relevant features. We propose an improved information-gain directed wrapper feature selection method using genetic algorithms and successfully evaluate its effectiveness against baseline and generic wrapper methods using three benchmark datasets.
One of the tasks of financial analysts is to estimate a company’s worth. In the last piece of work, this study predicts the growth rate of company earnings using three machine learning techniques. We employed the technique of lagged features, which allowed varying amounts of recent history to be brought into the prediction task and transformed the time series forecasting problem into a supervised learning problem. This work was applied to a private time series dataset.
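The lagged-feature technique mentioned above can be sketched in a few lines: each target value is paired with the values that immediately precede it, so that any supervised learner can be trained on the resulting rows. This is a minimal illustration under that reading of the abstract. The function name `make_lagged` and the parameter `n_lags` are hypothetical, and the thesis's actual dataset and models are not reproduced.

```python
def make_lagged(series, n_lags):
    """Turn a 1-D time series into a supervised learning dataset.

    Row i of X holds the n_lags observations preceding target y[i],
    ordered from oldest to newest.
    """
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(list(series[t - n_lags:t]))  # the recent history window
        y.append(series[t])                   # the value to predict
    return X, y
```

Varying `n_lags` changes how much recent history each prediction can see, at the price of losing the first `n_lags` observations as targets.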
Information gain directed genetic algorithm wrapper feature selection for credit rating
Financial credit scoring is one of the most crucial processes in the finance industry, used to assess the creditworthiness of individuals and enterprises. Various statistics-based machine learning techniques have been employed for this task. The “Curse of Dimensionality” is still a significant challenge for machine learning techniques. Some research has been carried out on Feature Selection (FS) using a genetic algorithm as a wrapper to improve the performance of credit scoring models. However, the challenge lies in finding an overall best method for credit scoring problems and in speeding up the time-consuming process of feature selection. In this study, the credit scoring problem is investigated through feature selection to improve classification performance. This work proposes a novel approach to feature selection in credit scoring applications, called the Information Gain Directed Feature Selection algorithm (IGDFS), which ranks features by information gain and propagates the top m features through the GA wrapper (GAW) algorithm using three classical machine learning algorithms, KNN, Naïve Bayes and Support Vector Machine (SVM), for credit scoring. The first stage of information-gain-guided feature selection can help reduce the computing complexity of the GA wrapper, and the information gain of the features selected with IGDFS can indicate their importance to decision making.
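The first stage described above, ranking features by information gain and keeping the top m, can be illustrated as follows. This is a minimal sketch that assumes discrete (categorical) feature values and does not include the GA wrapper stage. The function names and the `columns` dictionary layout are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the labels within each feature value.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_m_features(columns, labels, m):
    """Return the names of the m features with the highest information gain.

    columns maps a feature name to its list of values (assumed layout).
    """
    gains = {name: information_gain(vals, labels) for name, vals in columns.items()}
    return sorted(gains, key=gains.get, reverse=True)[:m]
```

Passing only the top m features to a wrapper search shrinks the space of candidate subsets, which is the complexity reduction the abstract attributes to the first stage.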
Forecasting Government Bond Spreads with Heuristic Models: Evidence from the Eurozone Periphery
This study investigates the predictability of European long-term government bond spreads through the application of heuristic and metaheuristic support vector regression (SVR) hybrid structures. Genetic, krill herd and sine–cosine algorithms are applied to the parameterization process of the SVR and locally weighted SVR (LSVR) methods. The inputs of the SVR models are selected from a large pool of linear and non-linear individual predictors. The statistical performance of the main models is evaluated against a random walk, an Autoregressive Moving Average model, the best individual prediction model and the traditional SVR and LSVR structures. All models are applied to forecast daily and weekly government bond spreads of Greece, Ireland, Italy, Portugal and Spain over the sample period 2000–2017. The results show that the sine–cosine LSVR outperforms its counterparts in terms of statistical accuracy, while metaheuristic approaches seem to benefit the parameterization process more than the heuristic ones.
Data Mining
Data mining is a branch of computer science that automatically extracts meaningful, useful knowledge and previously unknown, hidden, interesting patterns from large amounts of data to support the decision-making process. This book presents recent theoretical and practical advances in the field of data mining. It discusses a number of data mining methods, including classification, clustering, and association rule mining. The book brings together many successful data mining studies in areas such as health, banking, education, software engineering, animal science, and the environment.
Using neural networks and support vector machines for default prediction in South Africa
A thesis submitted to the Faculty of Computer Science and Applied Mathematics,
University of Witwatersrand,
in fulfillment of the requirements for the
Master of Science (MSc)
Johannesburg
Feb 2017
This is a thesis on credit risk and in particular bankruptcy prediction. It investigates the application of machine learning techniques such as support vector machines and neural networks for this purpose. This is not a thesis on support vector machines and neural networks; it simply uses these techniques as tools to perform the analysis.
Neural networks are a type of machine learning algorithm. They are nonlinear models inspired by the biological networks of neurons found in the human central nervous system. They involve a cascade of simple nonlinear computations that, when aggregated, can implement robust and complex nonlinear functions. Neural networks can approximate most nonlinear functions, making them quite a powerful class of models.
Support vector machines (SVMs) are a more recent development from the machine learning community. In machine learning, SVMs are supervised learning algorithms that analyze data and recognize patterns, used for classification and regression analysis. An SVM takes a set of input data and predicts, for each given input, which of two possible classes the input belongs to, making the SVM a non-probabilistic binary linear classifier. A support vector machine constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification into the two different data classes.
Traditional bankruptcy prediction modelling has been criticised because it makes certain assumptions about the underlying data. For instance, a frequent requirement for multivariate analysis is a joint normal distribution and independence of variables. Support vector machines (and neural networks) are useful tools for default analysis because they make far fewer assumptions about the underlying data.
In this framework, support vector machines are used as a classifier to discriminate between defaulting and non-defaulting companies in a South African context. The input data required is a set of financial ratios constructed from the company's historic financial statements. The data is then divided into two groups: companies that have defaulted and companies that are healthy (non-default). The final data sample used for this thesis consists of 23 financial ratios from 67 companies listed on the JSE. Furthermore, each company's probability of default is predicted.
The results are benchmarked against more classical methods that are commonly used for bankruptcy prediction, such as linear discriminant analysis and logistic regression. The results of the support vector machines, neural networks, linear discriminant analysis and logistic regression are then assessed via their receiver operating characteristic curves and profitability ratios to determine which model is more successful at predicting default.
MT 201
Desenvolvimento de frameworks para a modelagem do risco de crédito por meio de algoritmos de classificação (Development of frameworks for credit risk modeling using classification algorithms)
Granting credit is a vital activity in the financial industry. For the success of financial institutions, as well as the equilibrium of the credit system as a whole, it is important that credit risk management systems efficiently evaluate the probability of default of potential debtors based on their historical data. Classification algorithms are an interesting approach to this problem in the form of Credit Scoring models. Since the emergence of quantitative analytical methods with this purpose, statistical models persist as the most commonly chosen method, given their easier implementation and inherent interpretability. However, advances in Machine Learning have developed new and more complex algorithms capable of handling a bigger amount of data, often with an increase in predictive power. These new approaches, although not always readily transferable to practical applications in the financial industry, present an opportunity for the development of credit risk modeling and have piqued the interest of researchers in the field. Nonetheless, researchers seem to focus on model performance, not appropriately setting up guidelines to optimize the modeling process or considering the present regulation for model implementation. Thereby, this dissertation establishes frameworks for consumer credit risk modeling based on classification algorithms while guided by a systematic literature review on the topic. The proposed frameworks incorporate ML techniques, data preprocessing and balancing, feature selection (FS), and hyperparameter optimization (HPO). 
In addition to the bibliographic research, which introduces us to the main classification algorithms and appropriate modeling steps, the development of the frameworks is also based on experiments with hundreds of models for credit risk classification, using Logistic Regression (LR), Decision Trees (DT), Support Vector Machines (SVM), Random Forest (RF), as well as boosting and stacking ensembles, to efficiently guide the construction of robust and parsimonious models for credit risk analysis in consumer lending.