619 research outputs found

    Predicting credit rating change using machine learning and natural language processing

    Corporate credit ratings provide standardized third-party information for market participants. They offer many benefits for issuers, intermediaries and investors, and generally increase trust and efficiency in the market. Credit ratings are provided by credit rating agencies. In addition to quantitative information about companies (e.g. financial statements), the qualitative information in company-related textual documents is known to be a determinant in the credit rating process. However, the way in which the credit rating agencies interpret this data is not public information. The purpose of this thesis is to develop a supervised machine learning model that predicts credit rating changes as a binary classification problem, based on Form 10-K annual reports of public U.S. companies. Before being used in the classification task, the Form 10-K reports are pre-processed using natural language processing methods. More generally, this thesis aims to determine to what extent a change in a company’s credit rating can be predicted from Form 10-K reports, and whether topic modeling can improve the results. Five machine learning algorithms are used for the binary classification and their performances are compared: support vector machine, logistic regression, decision tree, random forest and naïve Bayes classifier. Topic modeling is implemented using latent semantic analysis. The studies of Hajek et al. (2016) and Chen et al. (2017) are the main sources of inspiration for this thesis, and the methods used are for the most part similar to those in these studies. This thesis adds value to their findings by examining how the credit rating prediction methods of Hajek et al. (2016), the binary classification methods of Chen et al. (2017) and the use of Form 10-K annual reports (in both Hajek et al. (2016) and Chen et al. (2017)) can be combined into a binary credit rating classifier.
The results of the study show that credit rating change can be predicted using 10-K data, but the predictions are not very accurate. The best classification results were obtained using a support vector machine, with an accuracy of 69.4% and an AUC of 0.6744. No significant improvement in classification performance was obtained using topic modeling.
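
For readers unfamiliar with the two metrics reported in this abstract, the following minimal sketch shows how accuracy and AUC are commonly computed for a binary classifier. The labels and scores are made-up illustrations, not data from the thesis, and the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auc(y_true, scores):
    """Area under the ROC curve, computed by pairwise comparison.

    This equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one
    (ties count as one half).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative data only: 1 = rating changed, 0 = rating unchanged.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.6, 0.7]

# Accuracy needs hard labels, so the scores are cut at an assumed 0.5 threshold.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("accuracy:", accuracy(y_true, y_pred))
print("AUC:", auc(y_true, scores))
```

Accuracy depends on the chosen decision threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason abstracts often report both.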

    Predicting Credit Ratings using Deep Learning Models – An Analysis of the Indian IT Industry

    Due to the complexity of transactions and the availability of Big Data, many banks and financial institutions are reviewing their business models. Determining creditworthiness involves various tasks, such as working with spreadsheets and manually gathering data from customers and corporations. In this research paper, we aim to automate and analyze the credit ratings of the information technology industry in India. Various deep learning models are used to predict the credit ratings, from highest to lowest, separately for each company in order to find the best-fit model. Factors such as Share Capital, Depreciation & Amortisation, Intangible Assets, Operating Margin and inventory valuation are the parameters that contribute to the credit rating predictions. The data collected for the study spans FY-2015 to FY-2020. Of the deep learning models tested and compared, the MLP achieved the highest efficiency in predicting the ratings. This research contributes to identifying how the ratings of several IT companies in India can be predicted from their financial risk, business risk, industrial risk and macroeconomic environment, using various neural network models for better accuracy. It also helps us understand the significance of artificial neural networks in credit rating prediction using unstructured and real-time financial data that includes the influence of COVID-19 on the Indian IT industry.

    An academic review: applications of data mining techniques in finance industry

    With the development of Internet techniques, data volumes are doubling every two years, faster than predicted by Moore’s Law. Big Data analytics is becoming particularly important for enterprise business. Modern computational technologies provide effective tools to help understand the huge volumes of accumulated data and to leverage this information to gain insights into the finance industry. Because the finance industry has no physical products to manufacture, data has become the most valuable asset of financial organisations for gaining actionable business insights. This is where data mining techniques come to their rescue, by allowing access to the right information at the right time. The finance industry uses these techniques in areas such as fraud detection, intelligent forecasting, credit rating, loan management, customer profiling, money laundering, marketing and prediction of price movements, to name a few. This work surveys the research on data mining techniques applied to the finance industry from 2010 to 2015. The review finds that stock prediction and credit rating have received the most attention from researchers, compared to loan prediction, money laundering and time series prediction. Due to the dynamics, uncertainty and variety of data, nonlinear mapping techniques have been studied more deeply than linear techniques. The review also finds that hybrid methods are the most accurate in prediction, closely followed by neural network techniques. This survey offers an overview of applications of data mining techniques in the finance industry and a summary of methodologies for researchers in this area. In particular, it provides a good picture of data mining techniques in computational finance for beginners who want to work in the field.

    Artificial Intelligence & Machine Learning in Finance: A literature review

    In the 2020s, Artificial Intelligence (AI) has increasingly become a dominant technology, and thanks to new computer technologies, Machine Learning (ML) has also experienced remarkable growth in recent years; however, AI needs notable innovation from data scientists and engineers to evolve. Hence, in this paper, we aim to infer the intellectual development of AI and ML in finance research, adopting a scoping review combined with an embedded review to pursue and scrutinize the services of these concepts. For a technical literature review, we follow the five stages of the scoping review methodology along with Donthu et al.’s (2021) bibliometric review method. This article highlights the trends in AI and ML applications (from 1989 to 2022) in the financial field of both developed and emerging countries. The main purpose is to emphasize the minutiae of several types of research that elucidate the employment of AI and ML in finance. The findings of our study are summarized and developed into seven fields: (1) Portfolio Management and Robo-Advisory, (2) Risk Management and Financial Distress, (3) Financial Fraud Detection and Anti-money laundering, (4) Sentiment Analysis and Investor Behaviour, (5) Algorithmic Stock Market Prediction and High-frequency Trading, (6) Data Protection and Cybersecurity, (7) Big Data Analytics, Blockchain, FinTech. Further, we demonstrate in each field how research in AI and ML enhances the current financial sector, as well as its contribution in terms of possibilities and solutions for myriad financial institutions and organizations. We conclude with a global map review of 110 documents per the seven fields of AI and ML application. Keywords: Artificial Intelligence, Machine Learning, Finance, Scoping review, Casablanca Exchange Market.
JEL Classification: C80. Paper type: Theoretical Research

    Early Warning System for Bankruptcy: Bankruptcy Prediction

    The recent bankruptcies of large joint stock companies in the U.S. and Europe shook investors across the world and underlined the importance of failure prediction in both academia and industry. It now seems more necessary than ever to develop early warning systems that can help to prevent or avert corporate default. These systems facilitate the selection of firms to collaborate with or invest in.

    Decision Support Systems for Risk Assessment in Credit Operations Against Collateral

    With the global economic crisis, which reached its peak in the second half of 2008, and facing a market shaken by economic instability, financial institutions took steps to protect banks against default risk. These steps directly affected how credit institutions analyze credit for individuals and corporate entities. To mitigate risk in credit operations, most banks use a graded scale of customer risk, which determines the provision that banks must make according to the default risk level of each credit transaction. Credit analysis involves the ability to make a credit decision in a scenario of uncertainty, constant change and incomplete transformations. This ability depends on the capacity to analyze often complex situations logically and reach a clear conclusion that is practical and feasible to implement. Credit scoring models are used to predict the probability that a customer applying for credit will default at any given time, based on personal and financial information that may influence the customer’s ability to repay the debt. This estimated probability, called the score, is an estimate of a customer’s risk of default in a given period. This increased concern has been caused in no small part by the weaknesses of existing risk management techniques revealed by the recent financial crisis, and by the growing demand for consumer credit. Constant change affects several banking sections because it hinders the ability to investigate the data that is produced and stored in computers that too often depend on manual techniques. Among the many alternatives used worldwide to balance this risk, the provision of collateral in the formalization of credit agreements stands out. In theory, collateral does not guarantee the return of the credit, as it is not counted as payment of the obligation within the project.
There is also the fact that collateral only succeeds if it is enforced, which involves the legal department of the banking institution. Even so, collateral is a mitigating element of credit risk. Collateral is divided into two types: the individual guarantee (sponsor) and the asset guarantee (fiduciary). Both aim to increase security in credit operations by giving the lender an alternative means of payment should the borrower be unable to meet its obligations on time. For the creditor, collateral provides liquidity security in the receiving operation. The measurement of credit recoverability is a system that evaluates the efficiency of the mechanism for recovering capital invested against collateral. In an attempt to identify the sufficiency of collateral in credit operations, this thesis presents an assessment of smart classifiers that use contextual information to assess whether collateral makes it possible to predict the recovery of credit granted, as part of the decision-making process before the credit transaction defaults. The results, when compared with other approaches in the literature, together with the comparative analysis of the most relevant artificial intelligence solutions, show that classifiers that use collateral as a parameter to calculate risk contribute to advancing the state of the art, increasing the commitment to financial institutions.
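
The notion of a score as an estimated default probability that is then mapped to a graded risk scale can be sketched as follows. This is only an illustration: the logistic form, the feature names, the coefficients, the grade cut-offs and the provision rates are all hypothetical and are not taken from the thesis, which evaluates its own set of classifiers.

```python
import math


def default_probability(features, weights, bias):
    """Logistic score: estimated probability of default, strictly between 0 and 1."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def risk_grade(p_default, scale):
    """Return (grade, provision_rate) for the first band whose upper bound covers p_default."""
    for upper_bound, grade, provision_rate in scale:
        if p_default <= upper_bound:
            return grade, provision_rate
    raise ValueError("scale must end with an upper bound of 1.0")


# Hypothetical coefficients: more debt and late payments raise the risk,
# better collateral coverage lowers it.
WEIGHTS = {"debt_to_income": 2.0, "late_payments": 0.8, "collateral_coverage": -1.5}
BIAS = -3.0

# Hypothetical graded scale: (upper bound on P(default), grade, provision rate).
SCALE = [
    (0.05, "A", 0.005),
    (0.15, "B", 0.03),
    (0.40, "C", 0.10),
    (1.00, "D", 0.50),
]

applicant = {"debt_to_income": 0.5, "late_payments": 2, "collateral_coverage": 1.0}
p = default_probability(applicant, WEIGHTS, BIAS)
grade, provision = risk_grade(p, SCALE)
print(f"P(default) = {p:.3f}, grade {grade}, provision rate {provision:.1%}")
```

In this sketch the collateral enters as one input among others, mirroring the abstract's point that collateral mitigates credit risk without guaranteeing repayment.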

    Machine learning methods in finance: Recent applications and prospects

    We study how researchers can apply machine learning (ML) methods in finance. We first establish that the two major categories of ML (supervised and unsupervised learning) address fundamentally different problems than traditional econometric approaches. Then, we review the current state of research on ML in finance and identify three archetypes of applications: (i) the construction of superior and novel measures, (ii) the reduction of prediction error, and (iii) the extension of the standard econometric toolset. With this taxonomy, we give an outlook on potential future directions for both researchers and practitioners. Our results suggest many benefits of ML methods compared to traditional approaches and indicate that ML holds great potential for future research in finance

    Textual Analysis of Intangible Information

    Traditionally, equity investors have relied upon the information reported in firms’ financial accounts to make their investment decisions. Due to the conservative nature of accounting standards, firms cannot value their intangible assets such as corporate culture, brand value and reputation. Investors’ efforts to collect such information have been hampered by the voluntary nature of Corporate Social Responsibility (CSR) reporting standards, which have resulted in the publication of inconsistent, stale and incomplete information across firms. In short, information on intangible assets is less salient to investors than accounting information because it is more costly to collect, process and analyse. In this thesis we design an automated approach to collect and quantify information on firms’ intangible assets by drawing upon techniques commonly adopted in the fields of Natural Language Processing (NLP) and Information Retrieval. The exploitation of unstructured data available on the Web holds promise for investors seeking to integrate a wider variety of information into their investment processes. The objectives of this research are: 1) to draw upon textual analysis methodologies to measure intangible information from a range of unstructured data sources, 2) to integrate intangible information and accounting information into an investment analysis framework, and 3) to evaluate the merits of unstructured data for the prediction of firms’ future earnings.

    Machine learning applied to active fixed-income portfolio management: a Lasso logit approach

    The use of quantitative methods constitutes a standard component of the institutional investors’ portfolio management toolkit. In the last decade, several empirical studies have employed probabilistic or classification models to predict stock market excess returns, model bond ratings and default probabilities, and forecast yield curves. To the authors’ knowledge, little research exists into their application to active fixed-income management. This paper contributes to filling this gap by comparing a machine learning algorithm, the Lasso logit regression, with a passive (buy-and-hold) investment strategy in the construction of a duration management model for high-grade bond portfolios, specifically focusing on US treasury bonds. Additionally, a two-step procedure is proposed, together with a simple ensemble averaging aimed at minimising the potential overfitting of traditional machine learning algorithms. A method to select thresholds that translate probabilities into signals based on conditional probability distributions is also introduced. A large set of financial and economic variables is used to obtain a duration signal, and other investment strategies are compared. Most of the variables selected by the model are related to financial flows and economic fundamentals, although the parameters do not appear to be stable over time, which suggests that the relevance of the variables is dynamic and that continuous evaluation of the model is required. In addition, the model achieves a statistically significant excess return compared with the passive strategy. These results support the inclusion of quantitative tools in the active portfolio management process for institutional investors, with particular attention to potential overfitting and unstable parameters. Quantitative tools should be regarded as a complement to qualitative and fundamental analysis, together with the portfolio manager’s experience, in order to make better-founded investment decisions.
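
The combination of an L1-penalised (Lasso) logit model with thresholds that turn probabilities into signals can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's procedure: the fitting routine is a generic proximal gradient method, the toy data are invented, the threshold values are fixed by hand rather than derived from conditional probability distributions, and the signal labels are hypothetical.

```python
import math


def soft_threshold(value, amount):
    """Proximal operator of the L1 penalty: shrink toward zero, clipping at zero."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(row, w, b):
    """Probability of the positive class for one observation."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, row)))


def fit_lasso_logit(X, y, penalty, step=0.1, iterations=500):
    """L1-penalised logistic regression fitted by proximal gradient descent.

    The intercept b is left unpenalised; weights whose gradient never
    exceeds the penalty stay exactly at zero, which is how the Lasso
    performs variable selection.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iterations):
        residuals = [predict_proba(row, w, b) - yi for row, yi in zip(X, y)]
        grad_b = sum(residuals) / n
        grad_w = [sum(r * row[j] for r, row in zip(residuals, X)) / n
                  for j in range(d)]
        b -= step * grad_b
        w = [soft_threshold(wj - step * gj, step * penalty)
             for wj, gj in zip(w, grad_w)]
    return w, b


def to_signal(probability, lower, upper):
    """Map a probability to a duration signal using two thresholds."""
    if probability >= upper:
        return "long"
    if probability <= lower:
        return "short"
    return "neutral"


# Invented toy data: the first feature separates the classes, the second is noise.
X = [[1.0, 0.2], [0.8, -0.1], [-0.9, 0.1], [-1.0, -0.2], [0.7, 0.0], [-0.6, 0.3]]
y = [1, 1, 0, 0, 1, 0]

w, b = fit_lasso_logit(X, y, penalty=0.05)
print("weights:", w, "intercept:", b)
for row in X:
    p = predict_proba(row, w, b)
    print(row, round(p, 3), to_signal(p, lower=0.4, upper=0.6))
```

Raising `penalty` drives more weights to exactly zero, which relates to the abstract's observation that the set of selected variables can change over time and needs continuous re-evaluation.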