
    Econometrics meets sentiment: an overview of methodology and applications

    The advent of massive amounts of textual, audio, and visual data has spurred the development of econometric methodology that transforms qualitative sentiment data into quantitative sentiment variables and uses those variables to analyze the relationships between sentiment and other variables. We survey this emerging research field and refer to it as sentometrics, a portmanteau of sentiment and econometrics. We synthesize the relevant methodological approaches, illustrate them with empirical results, and discuss useful software.
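    A minimal sketch of the first step described above, turning text into a quantitative sentiment variable, follows. Each document receives a dictionary-based net tone score, and the scores are averaged by date into a sentiment index. The word lists, the net-tone formula, and the equal-weight daily average are illustrative assumptions rather than the survey's method. Sentometrics studies typically rely on curated finance lexicons and richer weighting and aggregation schemes.

```python
import re
from collections import defaultdict

# Hypothetical toy word lists for illustration only; real studies use
# curated finance lexicons rather than a handful of words.
POSITIVE = {"gain", "growth", "profit", "strong", "improve"}
NEGATIVE = {"loss", "decline", "weak", "risk", "litigation"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_sentiment(text):
    """Net tone of one document: (positive hits - negative hits) / token count."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    return (pos - neg) / len(tokens)


def sentiment_index(docs):
    """Aggregate (date, text) pairs into {date: mean document sentiment}, ordered by date."""
    scores_by_date = defaultdict(list)
    for date, text in docs:
        scores_by_date[date].append(document_sentiment(text))
    return {date: sum(s) / len(s) for date, s in sorted(scores_by_date.items())}


if __name__ == "__main__":
    news = [
        ("2020-01-02", "Strong profit growth this quarter"),
        ("2020-01-02", "Litigation risk remains"),
        ("2020-01-03", "Sales decline and weak outlook"),
    ]
    print(sentiment_index(news))
```

    The resulting date-indexed series is the kind of quantitative sentiment variable that can then enter a regression or forecasting model alongside other time series.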

    The Informativeness of Text, the Deep Learning Approach

    This paper uses a deep learning natural language processing approach (Google's Bidirectional Encoder Representations from Transformers, hereafter BERT) to comprehensively summarize financial texts and examine their informativeness. First, we compare BERT's effectiveness in classifying sentiment in financial texts with that of a finance-specific dictionary, naïve Bayes, and Word2Vec, a shallow machine learning approach. We find that BERT outperforms all other approaches and that pre-training BERT on financial texts further improves its performance. Using BERT, we show that conference call texts provide information to investors and that other, less accurate approaches underestimate the economic significance of textual informativeness by at least 25%. Finally, textual sentiment summarized by BERT can predict future earnings and capital expenditure after controlling for the financial-statement-based determinants commonly used in finance and accounting research.

    Textual analysis in finance

    Thesis (PhD), Universidade de Brasília, Departamento de Economia, Brasília, 2019. This thesis comprises three studies that investigate the impact of written media on the stock market. In the first study, we survey the literature that uses textual analysis to quantify economic variables and review the main results of the studies that examine the effect of these variables on the stock market. Because the use of text as data in scientific research is a growing field, this study summarizes the main findings to map the frontier of knowledge in the finance literature. The remaining two studies investigate the relationship between two variables estimated from news stories and the Brazilian stock market. In the second study, we investigate the impact of economic uncertainty on weekly stock returns. We propose a new method for estimating economic uncertainty from news stories that uses word vectors to represent the vocabulary. Our economic uncertainty measure has a significant effect on the pricing of individual stocks, and news-based uncertainty measures proposed in the literature yield a similar effect. In the third study, we estimate corruption from news stories and investigate its relationship with the stock performance of two firms involved in corruption scandals in recent years, with the primary goal of quantifying the cost of corruption for these firms. The impact of news-reported corruption on stock returns differs between the two companies: for the privately controlled company, corruption in the news negatively affects stock returns, whereas for the state-controlled company the effect is insignificant. We also find a long-term effect of the corruption scandals on stock prices. Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES).

    Machine learning methods in finance: Recent applications and prospects

    We study how researchers can apply machine learning (ML) methods in finance. We first establish that the two major categories of ML (supervised and unsupervised learning) address fundamentally different problems from those addressed by traditional econometric approaches. Then, we review the current state of research on ML in finance and identify three archetypes of applications: (i) the construction of superior and novel measures, (ii) the reduction of prediction error, and (iii) the extension of the standard econometric toolset. Building on this taxonomy, we give an outlook on potential future directions for both researchers and practitioners. Our results suggest that ML methods offer many benefits over traditional approaches and indicate that ML holds great potential for future research in finance.

    Appearance of Corporate Innovation in Financial Reports: A Text-Based Analysis

    Innovations are important drivers of economic growth and firm profitability. Firms need funding to generate profitable innovations, which is why it is important to identify innovative firms reliably. Innovation indicators are used to measure innovativeness, so the chosen indicator must be reliable and must measure innovation as intended. Patents, research and development (R&D) expenditure, and innovation surveys are popular innovation indicators in the research literature. However, these indicators have weaknesses, which has motivated the development of new ones. This thesis studies the text-based innovation indicator developed by Bellstam et al. (2019) using a new type of data. Bellstam et al. (2019) built the indicator by comparing analyst reports on corporations with an innovation textbook, and the similarity between these texts serves as the measure of innovativeness. Analyst reports are usually available only for a fee. The 10-K reports used in this study, by contrast, are publicly available, so if they work as the basis for the indicator, the indicator would be widely accessible. The study begins by training a Latent Dirichlet allocation (LDA) model on a sample of 10-K reports filed by U.S. firms from 2008 to 2018. LDA is an unsupervised machine learning method that finds topics in text documents based on word probabilities. The model was set to find 15 topics, and its output is the distribution over these topics for each document. Topic distributions were also computed for eight text samples from innovation textbooks. The Kullback-Leibler divergence (KL divergence) was then calculated between the topic distribution of each textbook sample and that of each 10-K report. The KL divergence is therefore lowest for the reports most similar to the innovation text, and it serves as the text-based innovation indicator. Finally, the indicator was validated with regression analysis, that is, by confirming that it measures innovation. The text-based indicator was compared with R&D costs and with the balance-sheet value of brands and patents in separate linear regressions. Of the eight innovation measures, most had a statistically significant relationship with one or both of the other innovation indicators. Regression analysis was also used to study the indicator's ability to predict the change in sales over the following year, and every measure had a significant effect on it. The most significant findings of this thesis are the relationship between the text-based innovation indicator and the other indicators, and its ability to predict firms' sales.
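    The KL-divergence scoring step can be sketched as follows. The sketch assumes the topic distributions already come from a trained LDA model, for example scikit-learn's `LatentDirichletAllocation` with `n_components=15`, whose `transform` method returns per-document topic distributions. The direction of the divergence, D_KL(report || textbook), and the small floor `eps` for zero probabilities are illustrative assumptions, because the abstract does not specify them. The function names are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) = sum_k p_k * ln(p_k / q_k), in nats.

    Topics with p_k == 0 contribute nothing; q_k is floored at eps
    (an illustrative assumption) to avoid division by zero.
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same topics")
    return sum(pk * math.log(pk / max(qk, eps)) for pk, qk in zip(p, q) if pk > 0)


def innovation_scores(report_topics, textbook_topics):
    """Map each firm to D_KL(report || textbook); lower means closer to the innovation text."""
    return {firm: kl_divergence(dist, textbook_topics) for firm, dist in report_topics.items()}


if __name__ == "__main__":
    # Hypothetical 3-topic distributions (the thesis uses 15 topics).
    textbook = [0.7, 0.2, 0.1]
    reports = {"A": [0.6, 0.3, 0.1], "B": [0.1, 0.2, 0.7]}
    ranked = sorted(innovation_scores(reports, textbook).items(), key=lambda kv: kv[1])
    for firm, score in ranked:
        print(firm, round(score, 4))
```

    With eight textbook samples, `innovation_scores` would be called once per sample, which yields eight separate measures for each report. This matches the eight innovation measurements the abstract validates with regressions.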

    Automated Trading Systems Statistical and Machine Learning Methods and Hardware Implementation: A Survey

    Automated trading, also known as algorithmic trading, uses a predesigned computer program to submit a large number of trading orders to an exchange. It is essentially a real-time decision-making system that falls within the scope of Enterprise Information Systems (EIS). With the rapid development of telecommunication and computer technology, the mechanisms underlying automated trading systems have become increasingly diverse. Both academia and trading firms have devoted considerable effort to mining potential factors that may generate significantly higher profits. In this paper, we review studies on trading systems built with various methods and empirically evaluate these methods by grouping them into three types: technical analysis, textual analysis, and high-frequency trading. We then evaluate the advantages and disadvantages of each method and assess its future prospects.

    Applying text mining in corporate spin-off disclosure statement analysis: understanding the main concerns and recommendation of appropriate term weights

    Text mining helps extract knowledge and useful information from unstructured data. It detects and extracts information from large volumes of documents and allows the selection of data related to a particular topic. In this study, text mining is applied to the 10-12b filings that companies submit during a corporate spin-off. The main purposes are (1) to investigate the potential and major concerns found in these filings for corporate spin-offs and (2) to identify text mining methods that can reveal these major concerns. The 10-12b filings of thirty-four companies were collected, and only the Risk Factors section was analyzed. The term weights Entropy, IDF, GF-IDF, Normal, and None were applied to the input data. Of these, Entropy and GF-IDF were found to be the appropriate term weights, producing results that met a human expert's expectations. The document distribution under these term weights formed a pattern that reflected the mood or focus of the input documents. In addition to the analysis, this study serves as a pilot for future work on predictive text mining of similar financial documents. For example, the descriptive terms found in this study provide a start word list, which removes the trial-and-error process of framing an initial start list.
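    The five term-weighting schemes named above can be sketched with the global-weight definitions commonly cited in the latent semantic indexing literature. Treat these definitions as an assumption, since the software used in the study may implement variants. Let f_ij be the frequency of term i in document j, g_i the term's global frequency, d_i the number of documents containing it, n the number of documents, and p_ij = f_ij / g_i. The weights are None = 1, Normal = 1 / sqrt(sum_j f_ij^2), IDF = log2(n / d_i) + 1, GF-IDF = g_i / d_i, and Entropy = 1 + sum_j p_ij log2(p_ij) / log2(n). The function name `term_weights` is hypothetical.

```python
import math
from collections import Counter


def term_weights(docs):
    """Return {term: {scheme: global weight}} for a list of tokenized documents.

    Definitions follow commonly cited LSI-style global weights (an assumption).
    """
    n = len(docs)
    counts = [Counter(doc) for doc in docs]
    vocab = set().union(*counts)
    weights = {}
    for term in sorted(vocab):
        freqs = [c[term] for c in counts if c[term] > 0]  # f_ij for documents containing the term
        gf = sum(freqs)  # global frequency g_i
        df = len(freqs)  # document frequency d_i
        if n > 1:
            # Near 1 for terms concentrated in few documents, near 0 for evenly spread terms.
            entropy = 1 + sum((f / gf) * math.log2(f / gf) for f in freqs) / math.log2(n)
        else:
            entropy = 1.0  # log2(1) = 0, so the entropy weight is undefined for one document
        weights[term] = {
            "none": 1.0,
            "normal": 1 / math.sqrt(sum(f * f for f in freqs)),
            "idf": math.log2(n / df) + 1,
            "gf_idf": gf / df,
            "entropy": entropy,
        }
    return weights


if __name__ == "__main__":
    # Hypothetical tokenized "Risk Factors" snippets.
    docs = [["risk", "debt", "risk"], ["risk", "tax"], ["debt", "market"]]
    for term, w in term_weights(docs).items():
        print(term, {k: round(v, 3) for k, v in w.items()})
```

    Entropy and GF-IDF both reward terms whose occurrences are concentrated in a few documents. This offers one plausible reading of why they separated the filings' distinctive concerns better than the flat None weight.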