
    Ensembles of Text and Time-Series Models for Automatic Generation of Financial Trading Signals

    Event Studies in Finance have focused on traditional news headlines to assess the impact an event has on a traded company. The proliferation of news and information produced by social media has disrupted this trend. Although researchers have begun to identify trading opportunities from social media platforms such as Twitter, almost all techniques use a general sentiment drawn from large collections of tweets. Though useful, general sentiment cannot single out the specific events likely to affect stock prices. This work presents an event clustering algorithm that uses natural language processing techniques to generate newsworthy events from Twitter, which have the potential to influence stock prices in the same manner as traditional news headlines. The event clustering method addresses the effects of pre-news and lagged-news, two peculiarities that appear when connecting trading and news, regardless of the medium. Pre-news refers to stock prices moving in advance of a news release. Lagged-news refers to follow-up or late-arriving news, which adds redundancy to trading decisions. For events generated by the proposed clustering algorithm, we have designed and implemented novel language and time-series techniques, incorporating Event Studies and Machine Learning, to produce an actionable system that can guide trading decisions. Among the methods considered, the emphasis was on comparing established state-of-the-art methods with modern Deep Learning techniques. The recommended prediction algorithms provide investing strategies with profitable risk-adjusted returns. The suggested language models yield Annualized Sharpe Ratios (risk-adjusted returns) in the 5 to 11 range, while time-series models yield ratios in the 2 to 3 range (without transaction costs). A close investigation of the distribution of returns confirms the encouraging Sharpe Ratios by identifying most outliers as significant positive gains. Additionally, Machine Learning metrics of precision, recall, and accuracy are discussed alongside financial metrics in hopes of bridging the gap between academia and industry in the field of Computational Finance.
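As a rough illustration of the risk-adjusted figures quoted in the abstract above, the following sketch computes an annualized Sharpe ratio from a series of periodic returns. It assumes daily returns, 252 trading periods per year, and a zero risk-free rate by default; the function name `annualized_sharpe` and its parameters are hypothetical and are not taken from the paper, whose exact computation is not described here.

```python
import math
from statistics import mean, stdev


def annualized_sharpe(returns, periods_per_year=252, risk_free_per_period=0.0):
    """Annualized Sharpe ratio of a sequence of per-period simple returns.

    Assumptions (not from the source): returns are evenly spaced,
    the risk-free rate is constant per period, and annualization
    scales the per-period ratio by sqrt(periods_per_year).
    """
    excess = [r - risk_free_per_period for r in returns]
    if len(excess) < 2:
        raise ValueError("need at least two returns to estimate volatility")
    volatility = stdev(excess)  # sample standard deviation
    if volatility == 0:
        raise ValueError("volatility is zero; Sharpe ratio is undefined")
    return mean(excess) / volatility * math.sqrt(periods_per_year)


if __name__ == "__main__":
    # Hypothetical daily returns, for illustration only.
    sample = [0.01, -0.01, 0.02, 0.0]
    print(round(annualized_sharpe(sample), 3))
```

Transaction costs are ignored in this sketch, matching the "without transaction costs" caveat in the abstract.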

    Equity sector rebalancing via machine learning

    In this dissertation, the author analyzes whether supervised machine learning models, namely Artificial Neural Networks, Support Vector Machines and Logistic Regressions, can predict shifts in equity returns on a sector basis. Asset pricing typically relies on linear factor models with a small number of variables. However, due to market efficiency, equity returns are highly influenced by unforecastable events, which makes this task more challenging. Simple linear regressions also have difficulty incorporating the large number of predictor variables that the literature has accumulated over the decades, creating an opportunity for machine learning techniques. The Machine Learning models are used to forecast whether the excess return of each equity sector over a period of one month will be positive or negative. Using the models' predictions, capital is then allocated between the sectors and treasury bonds, building different portfolios, namely equal-weighted and value-weighted portfolios. Once all portfolios are built, their performance is compared against the benchmark, the S&P500 index, in a backtest over a period of 25 years. The portfolios built using the forecasts from the ML models lead to an increase in absolute and risk-adjusted returns, beating the benchmark. The implemented strategies were shown to protect investors against larger market declines, showing the potential of Machine Learning as an investment tool.

    Textual analysis in finance

    Doctoral thesis, Universidade de Brasília, Departamento de Economia, Brasília, 2019. Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES). This thesis is composed of three studies that investigate the impact of written media on stock performance. In the first study, we survey the literature that uses textual analysis to quantify economic variables and review the main results of the studies that examine its effect on the stock market. Since the use of text as data in scientific research is a growing field, this study summarizes the main findings to delineate the frontier of knowledge in the finance literature. The remaining two studies investigate the relation between two variables estimated from news stories and the Brazilian stock market. In the second study, we investigate the impact of economic uncertainty on weekly stock returns. We propose a new method to estimate economic uncertainty from news stories using word vectors to represent the vocabulary. We find a significant effect of our economic uncertainty measure on the pricing of individual stocks and show that news-based uncertainty measures proposed in the literature produce a similar effect. In the third study, we estimate corruption from news stories and investigate its relation to the stock performance of two firms that were involved in corruption scandals in recent years, with the primary goal of estimating the cost of corruption for these firms. The impact of the corruption reported in the news on stock returns differs between the companies. For the privately controlled company, corruption in the news negatively impacts stock returns. For the state-controlled company, the effect is insignificant. We also find a long-term effect of the corruption scandals on stock prices.

    Developing a Machine Learning based Systematic Investment Strategy: A case study for the Construction Industry

    In this research work, an end-to-end systematic investment strategy is implemented, based on machine learning models and leveraging knowledge of the construction industry's operational and management practices. First, a literature review in the field of behavioral finance is conducted, presenting the current state of knowledge and trends in the industry. A suitable investment opportunity exploiting prevailing market inefficiencies around earnings announcements is identified. Second, an extensive literature review identifies the most relevant characteristics of construction companies' operations and the major risk factors they are exposed to. These insights are used to engineer a set of relevant variables. Third, advanced statistical techniques are used to select the most relevant subset of features, which includes market and analysts' expectation data, macroeconomic indicators, the delay in reporting earnings, and the most important financial dimensions for construction firms. Fourth, the earnings surprise classification problem is characterized by class imbalance and asymmetric misclassification costs. These issues are a consequence of the desired business application and are addressed by selecting an appropriate evaluation metric. Additionally, the temporal dimension and generative process of the data are taken into account to select an appropriate validation scheme. Five state-of-the-art machine learning algorithms are considered: a multinomial logistic regression, a bagging classifier, a random forest, XGBoost, and a linear Support Vector Machine. The multinomial logistic regression is found to be the most suitable model, exhibiting a bias towards predicting positive earnings surprises over the other classes. Firm size, together with the profitability and valuation measures captured by Return on Assets and Enterprise Value multiples, is found to be the most important variable group when predicting earnings surprises. To conclude, the systematic investment strategy based on the investment signals produced by the selected machine learning model is back-tested; the performance of the long-short portfolio is driven by its positive-surprise leg, a consequence of the selected model's bias. Keywords: Quantitative Investing, Machine Learning, Behavioral Finance

    News and stock markets: A survey on abnormal returns and prediction models

    A vast number of news articles reflecting global topics are published daily. These stories convey information about events and expert opinions, which may trigger positive or negative expectations on the stock markets. The literature describes various methods for analyzing such correlations. In this paper we consider related approaches for tracking the impact of news on abnormal stock returns. In the first part we introduce studies with a background in Finance. Primarily by applying statistical functions, these works examine unusual price volatilities and explore possible sources and market conditions, e.g. biased investors, limited attention, macro-economic variables, country development state, et cetera. In the second part we present studies with a background in Computer Science, which take advantage of historic news and the corresponding market values. Following the common learning paradigm, these projects develop prototypes for trend and stock price prediction. In the current survey we evaluate leading approaches with regard to their objectives, assumptions, input, techniques, and performance. Moreover, we provide a comparison framework for the recent prototypes and identify gaps for future research.

    Corporate disclosure and investor sentiment

    Investor decision-making relies on accurate, timely information, but market efficiency theories and information asymmetry challenge the ability to outperform the market. Disclosure practices, including the emerging importance of ESG information, play a key role in shaping investor sentiment and market dynamics. Digital platforms, especially social media, have transformed how information is disseminated, influencing corporate disclosure practices and investor reactions. Behavioral finance highlights how psychological factors can lead to deviations from fundamental valuations. The COVID-19 pandemic underscored the impact of external shocks on investor sentiment, with government interventions playing a critical role in market responses. Auditing, which is adapting to include non-financial information such as ESG, remains essential in reducing information asymmetry, and modern data-analytic tools such as Process Mining offer potential to enhance audit efficiency. This dissertation explores the interplay between financial reporting, news dissemination, and auditing practices, and their collective impact on the capital market, highlighting the importance of active corporate communication, government response to crises, and the evolving role of auditing in maintaining market trust.

    Adaptive sentiment analysis

    Domain dependency is one of the most challenging problems in the field of sentiment analysis. Although most sentiment analysis methods perform decently when targeted at a specific domain and writing style, they do not usually work well with texts that originate outside their domain boundaries. Often there is a need to perform sentiment analysis in a domain where no labelled document is available. To address this scenario, researchers have proposed many domain adaptation or unsupervised sentiment analysis methods. However, there is still much room for improvement, as those methods typically cannot match conventional supervised sentiment analysis methods. In this thesis, we propose a novel aspect-level sentiment analysis method that seamlessly integrates lexicon- and learning-based methods. While its performance is comparable to existing approaches, it is less sensitive to domain boundaries and can be applied to cross-domain sentiment analysis when the target domain is similar to the source domain. It also offers more structured and readable results by detecting individual topic aspects and determining their sentiment strengths. Furthermore, we investigate a novel approach to automatically constructing domain-specific sentiment lexicons based on distributed word representations (also known as word embeddings). The induced lexicon has quality on a par with a handcrafted one and could be used directly in a lexicon-based algorithm for sentiment analysis, but we find that a two-stage bootstrapping strategy could further boost the sentiment classification performance. Compared to existing methods, such an end-to-end, nearly unsupervised approach to domain-specific sentiment analysis works out of the box for any target domain, requires no handcrafted lexicon or labelled corpus, and achieves sentiment classification accuracy comparable to that of fully supervised approaches. Overall, the contribution of this Ph.D. work to the research field of sentiment analysis is twofold. First, we develop a new sentiment analysis system which can, in a nearly unsupervised manner, adapt to the domain at hand and perform sentiment analysis with minimal loss of performance. Second, we showcase this system in several areas (including finance, politics, and e-business), and investigate in particular the temporal dynamics of sentiment in such contexts.