
    The efficiency of bankruptcy predictive models - genetic algorithms approach

    Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence.
    This dissertation evaluates the contribution of genetic algorithms to improving the performance of bankruptcy prediction models. The state of the art points to the better performance of models based on MDA (Multiple Discriminant Analysis), which have therefore been the most widely applied in bankruptcy prediction since 1968. These models usually rely on ratios commonly used in financial analysis. A comparative study of (1) logistic regression models with forward stepwise feature selection, (2) Altman's Z-Score model (Edward I. Altman, 1983) based on MDA, and (3) logistic regression with genetic algorithms for variable selection shows a clear predominance in efficacy of the latter. These new models were developed using 1887 ratios generated a posteriori from 66 known variables drawn from the accounting, financial, operating, and macroeconomic analysis of firms. The resulting models are very promising for predicting bankruptcy in the medium to long term, across different countries and sectors, in a context of increasing instability surrounding firms
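    The variable-selection idea above can be sketched as a small genetic algorithm that evolves binary feature masks, scoring each mask by the cross-validated accuracy of a logistic regression restricted to the selected columns. This is an illustrative sketch, not the dissertation's model: the 1887 ratios are not public, so a bundled scikit-learn dataset stands in, and the population size, generation count, and mutation rate are arbitrary choices.

```python
# Minimal sketch of GA-driven feature selection for a logistic regression
# classifier. The dataset and all GA settings are illustrative stand-ins.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
n_features = X.shape[1]

def fitness(mask):
    # Cross-validated accuracy of LR on the selected features; empty masks score 0.
    if mask.sum() == 0:
        return 0.0
    clf = LogisticRegression(max_iter=5000)
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

# Initialise a population of random feature subsets (binary masks).
pop = rng.integers(0, 2, size=(12, n_features))
for generation in range(6):
    scores = np.array([fitness(ind) for ind in pop])
    # Tournament selection: the better of two random individuals survives.
    parents = np.array([
        pop[a] if scores[a] >= scores[b] else pop[b]
        for a, b in rng.integers(0, len(pop), size=(len(pop), 2))
    ])
    # Uniform crossover with a shifted copy, then low-rate bit-flip mutation.
    cut = rng.random((len(pop), n_features)) < 0.5
    children = np.where(cut, parents, np.roll(parents, 1, axis=0))
    flips = rng.random(children.shape) < 0.02
    pop = np.where(flips, 1 - children, children)

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected features:", int(best.sum()), "accuracy:", round(fitness(best), 3))
```

    In the dissertation's setting the same loop would run over the ratio matrix, with the GA replacing the forward stepwise search of model family (1).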

    Feature selection for bankruptcy prediction: a multi-objective optimization approach

    In this work a Multi-Objective Evolutionary Algorithm (MOEA) is applied to feature selection in bankruptcy prediction. The aim is to maximize the accuracy of the classifier while keeping the number of features low. The two-objective problem (minimizing the number of features while maximizing accuracy) is analyzed in full using two classifiers, Logistic Regression (LR) and Support Vector Machines (SVM); the parameters required by both classifiers are optimized simultaneously. The validity of the proposed methodology is tested on a database containing financial statements of 1,200 medium-sized private French companies. Extensive tests show that the MOEA is an efficient feature selection approach. The best results are obtained when both the feature subset and the classifiers' parameters are optimized. The proposed method can provide useful information to help decision makers characterize the financial health of a company
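    The two-objective trade-off can be visualized with a much simpler surrogate: score random candidate feature subsets on (number of features, classification error) and keep the non-dominated set. A full MOEA such as NSGA-II would evolve these candidates rather than sample them; the French-company financials are replaced here by a public scikit-learn dataset, so this is only an illustration of the Pareto-front idea.

```python
# Sketch of the two-objective trade-off: few features vs. high accuracy.
# Random subsets stand in for an evolved MOEA population.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X, y = load_breast_cancer(return_X_y=True)

candidates = rng.integers(0, 2, size=(60, X.shape[1]))
points = []  # (n_selected, 1 - accuracy): both objectives are minimised
for mask in candidates:
    if mask.sum() == 0:
        continue
    acc = cross_val_score(LogisticRegression(max_iter=5000),
                          X[:, mask.astype(bool)], y, cv=3).mean()
    points.append((int(mask.sum()), 1.0 - acc))

def pareto_front(pts):
    # Keep points not weakly dominated in both objectives by a distinct point.
    return [p for p in pts
            if not any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in pts)]

front = sorted(pareto_front(points))
print("Pareto front (n_features, error):", front)
```

    The front hands the decision maker exactly the menu described in the abstract: each point is a classifier that cannot be improved in one objective without worsening the other.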

    Company bankruptcy prediction framework based on the most influential features using XGBoost and stacking ensemble learning

    Bankruptcy is often a severe problem for a company, and its impact causes losses to stakeholders such as owners, investors, employees, and consumers. One way to prevent bankruptcy is to predict its likelihood from the company's financial data. This study therefore aims to find the best model or method for predicting company bankruptcy using the Polish companies bankruptcy dataset. The prediction analysis combines feature selection with ensemble learning. The most influential features are selected using XGBoost feature importance with a weight-value filter of 10. The ensemble learning method used is stacking, composed of base models and a meta-learner: the base models are K-nearest neighbor, decision tree, SVM, and random forest, while the meta-learner is LightGBM. The stacking model outperforms the base-model accuracies, reaching an accuracy of 97%
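    The stacking layout described above maps directly onto scikit-learn's `StackingClassifier`. In this sketch LightGBM is swapped for scikit-learn's `GradientBoostingClassifier` so the example is self-contained, and a bundled toy dataset stands in for the Polish bankruptcy data; the XGBoost feature-filtering step is omitted.

```python
# Sketch of the described stacking ensemble: KNN, decision tree, SVM and
# random forest as base models, with a boosted-tree meta-learner standing
# in for LightGBM.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base = [
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("svm", make_pipeline(StandardScaler(), SVC(random_state=0))),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
]
stack = StackingClassifier(estimators=base,
                           final_estimator=GradientBoostingClassifier(random_state=0),
                           cv=5)  # out-of-fold base predictions feed the meta-learner
stack.fit(X_tr, y_tr)
print("stacking accuracy:", round(stack.score(X_te, y_te), 3))
```

    The `cv=5` argument is the key design choice: the meta-learner trains on out-of-fold base-model predictions, which prevents it from simply memorizing overfit base-model outputs.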

    Optimizing Ensemble Weights for Machine Learning Models: A Case Study for Housing Price Prediction

    Designing ensemble learners has been recognized as one of the significant trends in data science, especially in data science competitions. The main goal of work in this area is to build models that outperform all individual models in terms of bias (the error due to the difference between average model predictions and actual values) and variance (the variability of model predictions). This paper proposes an optimization model for designing ensembles that minimize the bias and variance of predictions. Focusing on service sciences, two well-known housing datasets are selected as case studies: Boston housing and Ames housing. The results demonstrate that the designed ensembles are very competitive in predicting house prices on both the Boston and Ames datasets
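    One common concrete form of such an optimization model is to choose convex combination weights for the base regressors that minimize error on a validation split. The sketch below uses validation MSE as a proxy for the paper's bias/variance objective, and scikit-learn's diabetes regression set as a self-contained stand-in for the housing data; the base-model choices are arbitrary.

```python
# Sketch of optimising convex ensemble weights on a validation set.
import numpy as np
from scipy.optimize import minimize
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

X, y = load_diabetes(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

models = [Ridge(), RandomForestRegressor(n_estimators=100, random_state=0),
          KNeighborsRegressor()]
preds = np.column_stack([m.fit(X_tr, y_tr).predict(X_val) for m in models])

def val_mse(w):
    return np.mean((preds @ w - y_val) ** 2)

# Weights constrained to the probability simplex: non-negative, summing to one.
res = minimize(val_mse, x0=np.full(3, 1 / 3), bounds=[(0, 1)] * 3,
               constraints={"type": "eq", "fun": lambda w: w.sum() - 1})

single_best = min(np.mean((preds[:, j] - y_val) ** 2) for j in range(3))
print("weights:", np.round(res.x, 3), "ensemble MSE:", round(res.fun, 1))
print("best single-model MSE:", round(single_best, 1))
```

    Because each single model is a vertex of the simplex, the optimized ensemble can never do worse than the best individual model on the validation set, which is exactly the outperformance property the abstract targets.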

    Towards ML-based Platforms in Finance Industry – An ML Approach to Generate Corporate Bankruptcy Probabilities based on Annual Financial Statements

    The increasing interest in Machine Learning (ML) based services and the need for more intelligent and automated processes in the finance industry bring new challenges and require practitioners and academics to design, develop, and maintain new ML approaches for financial services companies. The main objective of this paper is to provide a standardized procedure for dealing with imbalanced datasets. To this end, we propose design recommendations on how to test and combine multiple oversampling techniques, such as SMOTE, SMOTE-ENN, and SMOTE-Tomek, with multiple ML models and an attribute-based structure to reach higher accuracies. Moreover, the paper seeks an appropriate structure for maintaining such systems as they work with periodically changing datasets, so that incoming data can be analyzed regularly via this procedure
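    The core of all three oversampling techniques named above is the SMOTE interpolation step, which can be sketched in a few lines of NumPy: synthesize minority samples by interpolating between a minority point and one of its minority-class nearest neighbours. This toy version is for illustration only; in practice one would use the `imbalanced-learn` implementations (`SMOTE`, `SMOTEENN`, `SMOTETomek`), which also handle the ENN/Tomek cleaning steps.

```python
# Minimal NumPy sketch of the SMOTE idea: new minority samples are drawn
# on the line segments between existing minority points and their
# minority-class nearest neighbours.
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic samples from minority-class rows X_min."""
    rng = rng if rng is not None else np.random.default_rng(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # k nearest minority neighbours of the chosen point (excluding itself).
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]
        j = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)

rng = np.random.default_rng(0)
minority = rng.normal(loc=3.0, scale=0.5, size=(20, 4))  # rare class
new_samples = smote(minority, n_new=80, rng=rng)
print(new_samples.shape)  # (80, 4)
```

    Because synthetic points lie between real minority points rather than duplicating them, the classifier sees a denser but still plausible minority region, which is why SMOTE variants tend to beat naive oversampling on imbalanced financial data.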

    Technical and Fundamental Features Analysis for Stock Market Prediction with Data Mining Methods

    Predicting stock prices is an essential objective in the financial world, and forecasting stock returns and their risk is one of the most critical concerns of market decision makers. This thesis investigates stock price forecasting with three approaches drawn from data mining and shows how different elements of the stock price can enhance prediction accuracy. The first and second approaches capture a wide range of fundamental indicators of the stocks and use them as explanatory variables for stock price classification and forecasting. In the third approach, technical features are extracted from the candlestick representation of share prices and used to improve forecasting accuracy. In each approach, different tools and techniques from data mining and machine learning are employed to justify why the forecasting works, and since the idea is to evaluate the potential of features in stock trend forecasting, the experiments are diversified across both technical and fundamental features.
    In the first approach, a three-stage methodology is developed: first, a comprehensive investigation identifies all possible features that can affect stock risk and return; next, risk and return are predicted by applying data mining techniques to the given features; finally, a hybrid algorithm based on filters and function-based clustering re-predicts the risk and return of stocks. In the second approach, instead of single classifiers, a fusion model is proposed based on multiple diverse base classifiers that operate on a common input and a meta-classifier that learns from the base classifiers' outputs to obtain more precise stock return and risk predictions. A set of diversity methods, including Bagging, Boosting, and AdaBoost, is applied to create diversity in classifier combinations, and the number of base classifiers and the procedure for selecting them for the fusion schemes are determined using a methodology based on dataset clustering and candidate classifiers' accuracy. Finally, in the third approach, a novel forecasting model for stock markets is presented based on a wrapper combining ANFIS (Adaptive Neuro-Fuzzy Inference System) with ICA (Imperialist Competitive Algorithm) and the technical analysis of Japanese candlesticks. Two schemes, raw-based and signal-based, are devised to extract the model's input variables, with buy and sell signals as output variables. To illustrate the methodologies, Tehran Stock Exchange (TSE) data for the period from 2002 to 2012 are applied for the first and second approaches, while General Motors and Dow Jones indexes are used for the third
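    The candlestick feature extraction of the third approach can be illustrated with a few standard per-bar quantities computed from OHLC data: signed real body, upper and lower shadows, and the body-to-range ratio. The OHLC rows below are synthetic, and the thesis's ANFIS-ICA model itself is not reproduced; this only shows the kind of technical inputs such a model consumes.

```python
# Sketch of simple technical features from OHLC candlesticks.
import numpy as np

ohlc = np.array([  # columns: open, high, low, close
    [100.0, 105.0, 99.0, 104.0],   # long bullish body
    [104.0, 104.5, 98.0, 99.0],    # bearish body with a long lower shadow
    [99.0, 100.0, 98.5, 99.1],     # near-doji (tiny body)
])
o, h, l, c = ohlc.T
body = c - o                         # signed real body
upper_shadow = h - np.maximum(o, c)  # wick above the body
lower_shadow = np.minimum(o, c) - l  # wick below the body
bar_range = h - l
body_ratio = np.abs(body) / bar_range  # share of the range taken by the body

features = np.column_stack([body, upper_shadow, lower_shadow, body_ratio])
print(np.round(features, 3))
```

    Raw-based inputs feed such per-bar values directly to the model, while signal-based inputs first map patterns (e.g. a near-doji, where `body_ratio` is small) to discrete buy/sell indicators.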

    A Prediction Modeling Framework For Noisy Welding Quality Data

    Numerous and varied research projects have been conducted to utilize historical manufacturing process data in product design. These manufacturing process data often contain inconsistencies, which makes it challenging to extract useful information from them. In resistance spot welding (RSW), data inconsistency is a well-known issue. In general, such inconsistent data are treated as noise and removed from the original dataset before conducting analyses or constructing prediction models. This may not be desirable for every design and manufacturing application, since all data can contain important information that further explains the process. In this research, we propose a prediction modeling framework that employs bootstrap aggregating (bagging) with support vector regression (SVR) as the base learning algorithm to improve prediction accuracy on such noisy data. Optimal hyper-parameters for SVR are selected by particle swarm optimization (PSO) with meta-modeling. Constructing bagging models requires far more computation than a single model, and evolutionary computation algorithms such as PSO generally require a large number of candidate-solution evaluations to achieve quality solutions. These two requirements greatly increase the overall computational cost of constructing effective bagging SVR models. Meta-modeling can be employed to reduce the computational cost when the fitness or constraint functions involve computationally expensive tasks or analyses. In our case, the objective function is associated with constructing bagging SVR models for candidate sets of hyper-parameters; with PSO, a large number of bagging SVR models must therefore be constructed and evaluated, which is computationally expensive. The meta-modeling approach developed in this research, called MUGPSO, assists PSO in evaluating these candidate solutions (i.e., sets of hyper-parameters).
    MUGPSO approximates the fitness function of candidate solutions. Through this method, the number of real fitness function evaluations (i.e., constructions of bagging SVR models) is reduced, which also reduces the overall computational cost. Using the Meta2 framework, one can expect an improvement in prediction accuracy with reduced computational time. Experiments are conducted on three artificially generated noisy datasets and a real RSW quality dataset. The results indicate that Meta2 is capable of providing promising solutions with noticeably reduced computational costs
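    The expensive inner loop described above can be sketched with a bagged-SVR fitness function and a tiny particle swarm over SVR's (C, gamma) hyper-parameters. The MUGPSO meta-model that substitutes for real fitness evaluations is deliberately omitted, the noisy dataset is synthetic, and the swarm settings are arbitrary, so this only illustrates why each fitness call is costly.

```python
# Sketch of PSO tuning (log10 C, log10 gamma) for a bagging-SVR ensemble.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X, y = make_regression(n_samples=200, n_features=5, noise=25.0, random_state=0)

def fitness(log_c, log_gamma):
    # One fitness call = fitting a whole bagged-SVR ensemble under CV,
    # which is exactly the cost MUGPSO's meta-model is built to avoid.
    model = BaggingRegressor(SVR(C=10.0 ** log_c, gamma=10.0 ** log_gamma),
                             n_estimators=10, random_state=0)
    return cross_val_score(model, X, y, cv=3,
                           scoring="neg_mean_squared_error").mean()

# Minimal PSO: inertia plus pulls toward personal and global bests.
lo, hi = np.array([-1.0, -4.0]), np.array([3.0, 0.0])
pos = rng.uniform(lo, hi, size=(8, 2))
vel = np.zeros_like(pos)
pbest, pbest_f = pos.copy(), np.array([fitness(*p) for p in pos])
for _ in range(5):
    gbest = pbest[np.argmax(pbest_f)]
    vel = (0.7 * vel + 1.5 * rng.random(pos.shape) * (pbest - pos)
           + 1.5 * rng.random(pos.shape) * (gbest - pos))
    pos = np.clip(pos + vel, lo, hi)
    f = np.array([fitness(*p) for p in pos])
    improved = f > pbest_f
    pbest[improved], pbest_f[improved] = pos[improved], f[improved]

best = pbest[np.argmax(pbest_f)]
print("best (log10 C, log10 gamma):", np.round(best, 2),
      "neg-MSE:", round(pbest_f.max(), 1))
```

    Even this toy swarm performs 48 full ensemble constructions; replacing most of them with a cheap approximation of `fitness` is the saving the Meta2/MUGPSO framework provides.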