17 research outputs found

    Three-stage ensemble model : reinforce predictive capacity without compromising interpretability

    Thesis proposal presented as partial requirement for obtaining the Master’s degree in Statistics and Information Management, with specialization in Risk Analysis and Management. Over the last decade, several banks have developed models to quantify credit risk. In addition to monitoring the credit portfolio, these models also help decide on the acceptance of new contracts, assess customer profitability, and define pricing strategy. The objective of this paper is to improve the approach to credit risk modeling, namely scoring models that predict default events. To this end, we propose a three-stage ensemble model that combines the interpretability of the Scorecard with the predictive power of machine learning algorithms. The results show that the ROC index improves by 0.5%-0.7% and accuracy by 0%-1%, taking the Scorecard as the baseline.

    Insolvency Forecasting through Trend Analysis with Full Ignorance of Probabilities

    The complex views of insolvency proceedings are unique, poorly known, interdisciplinary and multidimensional, even though there is a broad spectrum of different bankruptcy models (BM). Therefore, it is often prohibitively difficult to make forecasts using numerical quantifiers and traditional statistical methods. The least information-intensive trend values are used: positive, increasing, zero, constant, negative, decreasing. The solution of a trend model is a set of scenarios where X is the set of variables quantified by the trends. All possible transitions among the scenarios are generated. An oriented transitional graph has the set of scenarios as nodes and the transitions as arcs. An oriented path describes any possible future and past time behaviour of the bankruptcy system under study. The graph represents the complete list of forecasts based on trends. An eight-dimensional model serves as a case study. On the transitional graph of the case study model, decision tree heuristics are used for calculating the probabilities of the terminal scenarios and possible payoffs.
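    The scenario and transition-graph construction described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' model: each variable carries a single trend (increasing, constant, decreasing), the two variables `DEBT` and `LIQUIDITY` and the `opposite` constraint are hypothetical, and the rule that a variable cannot jump directly between increasing and decreasing is an assumed continuity condition.

```python
from itertools import product

TRENDS = ("+", "0", "-")            # increasing, constant, decreasing
ORDER = {"-": -1, "0": 0, "+": 1}   # numeric rank used to compare trends


def scenarios(variables, is_feasible):
    """Enumerate every trend assignment that satisfies the model constraint."""
    return [
        combo
        for combo in product(TRENDS, repeat=len(variables))
        if is_feasible(dict(zip(variables, combo)))
    ]


def transitions(nodes):
    """Return arcs between distinct scenarios in which no variable jumps
    directly between '+' and '-' (assumed continuity rule)."""
    arcs = []
    for a in nodes:
        for b in nodes:
            if a != b and all(abs(ORDER[x] - ORDER[y]) <= 1 for x, y in zip(a, b)):
                arcs.append((a, b))
    return arcs


# Hypothetical two-variable model: DEBT and LIQUIDITY move in opposite directions.
variables = ("DEBT", "LIQUIDITY")


def opposite(s):
    return ORDER[s["DEBT"]] == -ORDER[s["LIQUIDITY"]]


nodes = scenarios(variables, opposite)   # scenarios are the graph nodes
arcs = transitions(nodes)                # transitions are the oriented arcs

for src, dst in arcs:
    print(src, "->", dst)
```

    Any oriented path through `arcs` is then one possible trend-based forecast; the probabilities and payoffs mentioned in the abstract are not modelled here.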

    Avaliação de falências de empresas por meio de florestas causais

    This study sought to analyze the variables that can influence company bankruptcy. For several years, the main studies on bankruptcy relied on conventional methodologies aimed at predicting it, and accounting variables were overwhelmingly predominant in their analyses. However, these models treated the accounting variables as homogeneous; that is, the traditional models assumed that the indicators behaved similarly in all companies, ignoring the heterogeneity among them. The financial crisis that occurred at the end of 2007 is also relevant: it caused a major global financial collapse, with different effects across a wide variety of sectors and companies. In this context, research that aims to identify problems such as heterogeneity among companies and to analyze the differences among them is gaining relevance, given that sector-related characteristics such as capital structure and size vary from company to company. New approaches to bankruptcy prediction modeling should therefore consider the heterogeneity among companies in order to further improve the models used. A causal tree and a causal forest were used with quarterly accounting and sector-related data on 1,247 companies, 66 of which went bankrupt (44 after 2008 and 22 before). The results showed that there is unobserved heterogeneity when company bankruptcy processes are analyzed, calling into question traditional models such as discriminant analysis and logit. Consequently, given the high dimensionality, it was observed that there may be a functional form capable of explaining company bankruptcy, but it is not linear. The results also highlight that some sectors are more prone to financial crises, which aggravates the bankruptcy process.

    Enhancing Confusion Entropy (CEN) for Binary and Multiclass Classification

    Different performance measures are used to assess the behaviour of classifiers in Machine Learning and to compare them. Many measures have been defined in the literature, among them a measure inspired by Shannon's entropy named the Confusion Entropy (CEN). In this work we introduce a new measure, MCEN, by modifying CEN to avoid its unwanted behaviour in the binary case, which makes it unsuitable as a performance measure in classification. We compare MCEN with CEN and other performance measures, presenting analytical results in some particularly interesting cases, as well as some heuristic computational experimentation. This work was supported by Ministerio de Economía y Competitividad, Gobierno de España, MTM2015 67802-P to R.D. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
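    To make the binary-case issue concrete, the sketch below computes CEN from a confusion matrix. The abstract does not give the formula, so this follows the commonly cited original CEN definition as an assumption (misclassification probabilities normalised per class, logarithm base 2(N-1), class weights proportional to the samples involving each class). The function name `confusion_entropy` is hypothetical, and MCEN itself is not reproduced here.

```python
import math


def confusion_entropy(cm):
    """CEN of a square confusion matrix cm (cm[i][j] = class-i samples
    predicted as class j), for n >= 2 classes. Assumed definition."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    base = 2 * (n - 1)
    cen = 0.0
    for j in range(n):
        # Samples that involve class j, either as true label or as prediction.
        denom = sum(cm[j][k] + cm[k][j] for k in range(n))
        if denom == 0:
            continue  # class j never appears; it contributes nothing
        cen_j = 0.0
        for k in range(n):
            if k == j:
                continue  # correct classifications carry no confusion
            for count in (cm[j][k], cm[k][j]):
                if count > 0:  # p * log(p) is taken as 0 when p == 0
                    p = count / denom
                    cen_j -= p * math.log(p, base)
        cen += denom / (2 * total) * cen_j
    return cen


perfect = confusion_entropy([[5, 0], [0, 5]])
mostly_wrong = confusion_entropy([[264, 736], [736, 264]])
print(perfect, mostly_wrong)
```

    Under this definition a perfect binary classifier scores 0, while the second matrix yields a value above 1, which is the kind of out-of-range behaviour in the binary case that a modified measure would need to avoid.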

    Decision making based on partially known decision trees

    There is a wide range of different algorithms for insolvency prediction. However, the comprehensive treatment of insolvency proceedings presented in this dissertation, from the point of view of both parties (debtor versus creditor) and from the point of view of macroeconomics, is new. It is often very difficult to generate forecasts using numerical quantifiers and traditional statistical methods, because of the lack of input data. Therefore, the work uses trend analysis tools based on the least information-intensive quantifiers, i.e. the trends increasing, constant, and decreasing. A trend model solution is a set of scenarios in which a set of variables is quantified by these trends. All possible transitions between the scenarios are generated and plotted in transition graphs. The oriented transition graph has the set of scenarios as nodes and the transitions between the scenarios as branches. A given path through the transition graph describes any possible future and past behavior of the insolvency system being investigated. The transition graph is a complete list of trend-based forecasts. The thesis also presents and uses heuristics for determining the payoff values from insolvency proceedings, applicable with decision tree tools and with the transition graphs generated by the trend analyses. A nine-dimensional model serves as a case study. The models use vague variables that may have a major impact on the entire insolvency process, e.g. the level of greed and the influence of the political situation.

    Forecasting Financial Distress With Machine Learning – A Review

    Purpose – Evaluate the various academic research with multiple views on credit risk and artificial intelligence (AI) and their evolution. Theoretical framework – The study is divided as follows: Section 1 introduces the article. Section 2 deals with credit risk and its relationship with computational models and techniques. Section 3 presents the methodology. Section 4 addresses a discussion of the results and challenges on the topic. Finally, section 5 presents the conclusions. Design/methodology/approach – A systematic review of the literature was carried out without defining the time period and using the Web of Science and Scopus databases. Findings – The application of computational technology in the scope of credit risk analysis has drawn attention in a unique way. It was found that the demand for identification and introduction of new variables, classifiers and more assertive methods is constant. The effort to improve the interpretation of data and models is intense. Research, Practical & Social implications – It contributes to the verification of the theory, providing information in relation to the most used methods and techniques, and it brings a wide analysis to deepen the knowledge of the factors and variables on the theme. It categorizes the lines of research and provides a summary of the literature, which serves as a reference, in addition to suggesting future research. Originality/value – Research in the area of Artificial Intelligence and Machine Learning is recent and requires attention and investigation; thus, this study contributes to the opening of new views in order to deepen the work on this topic.

    Un análisis bibliométrico de la predicción de quiebra empresarial con Machine Learning

    The aim of this article is to present a bibliometric analysis of the use of Machine Learning (ML) techniques in predicting business bankruptcy, through a review of the Web of Science database. This exercise provides information on how such techniques were introduced and adapted. To this end, the different ML techniques applied in bankruptcy prediction models are identified. As a result, 327 documents are obtained, which are classified by performance evaluation measure, namely area under the curve (AUC) and accuracy (ACC), these being the most used in the classification process. In addition, the relationships between the researchers, institutions and countries with the largest number of applications of this type are identified. The results show that the XGBoost, SVM, Smote, RF and DT algorithms present a much greater predictive capacity than traditional methodologies, focused on a time horizon before the event given its greater precision. Similarly, financial and non-financial variables contribute favorably to this estimate.