Six papers on computational methods for the analysis of structured and unstructured data in the economic domain
This work investigates the application of computational methods to structured and unstructured data. The domains of application are two closely connected fields with the common
goal of promoting the stability of the financial system: systemic risk and bank supervision.
The work explores different families of models and applies them to different tasks: graphical Gaussian network models to address bank interconnectivity, topic models to monitor
bank news, and deep learning for text classification. New applications and variants of these
models are investigated, paying particular attention to the combined use of textual and structured data. The penultimate chapter introduces a deep-learning-based sentiment polarity classification tool for
Italian, intended to simplify future research relying on sentiment analysis.
The different models have proven useful for leveraging numerical (structured) and textual (unstructured) data. Graphical Gaussian models and topic models have been adopted
for inspection and descriptive tasks, while deep learning has been applied more to predictive
(classification) problems. Overall, the integration of textual (unstructured) and numerical
(structured) information has proven useful for analysis related to systemic risk and bank
supervision. The integration of textual data with numerical data has, in fact, led either to
higher predictive performances or enhanced capability of explaining phenomena and correlating them to other events.
A Combined Approach for Extracting Financial Instrument-Specific Investor Sentiment from Weblogs
Investor sentiment about future returns of financial instruments is a highly relevant information source for investment managers and other stakeholders in the financial industry. Investor sentiments are abundant in financial blog texts. Making use of these sentiments constitutes a massive information management challenge, given the millions of blog articles with ever-changing and growing amounts of information that need to be acquired and interpreted. We propose a novel approach for investor sentiment extraction from blogs that combines machine learning at the document level with knowledge-based information extraction at the sentence level. The proposed artifact is a financial instrument-specific investor sentiment extraction method, which we apply to a set of blog articles. The evaluation suggests that the combined approach achieves a higher precision than a standalone knowledge-based approach.
Stock Prediction Based on Social Media Data via Sentiment Analysis: a Study on Reddit
With the development of the internet and information technology, online text data has become available and accessible for research in many fields, including stock prediction. Social media, being one of the biggest content generators on the internet, is a great data resource for text mining and stock prediction. It has a large capacity, high data density, and fast information spread.
In this thesis, the relationship between stock-related text on social media (Reddit) and the price changes of the corresponding stocks is analyzed. Sentiment analysis is first applied to extract individual users' emotions and opinions about the stocks. The extracted features are then analyzed via descriptive statistics and predictive analysis, using the Pearson correlation coefficient and machine learning models. The predictive analysis is designed to examine the dependence between the social media text data and stock price changes by evaluating the performance of predictions. Four indicators are used in the evaluation: "prediction accuracy on price change direction" and three indicators from simulated algorithmic trading experiments based on the prediction results, namely "total profit with trading strategy for single stock", "daily profit efficiency of trading strategy" and "total profit with Portfolio trading strategy". Compared with a Buy and Hold (B&H) baseline strategy, the predictions show good results in terms of "daily profit efficiency" and "total profit with Portfolio trading strategy". The online forum text from Reddit is therefore shown to be correlated with future stock price changes, and incorporating its information in portfolio trading strategies might yield more profit than the B&H strategy.
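As a rough illustration of two of the quantities named in this abstract, the sketch below computes a Pearson correlation coefficient and a price-change direction accuracy. It is not the thesis's code: the toy data, the function names, and the naive rule that positive sentiment predicts a price rise are all assumptions made for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def direction_accuracy(predicted, actual):
    """Share of days on which predicted and actual changes have the same sign."""
    hits = sum((p > 0) == (a > 0) for p, a in zip(predicted, actual))
    return hits / len(actual)


# Hypothetical toy data (not from the thesis): daily mean sentiment score
# and the next-day price change in percent for one stock.
sentiment = [0.2, -0.1, 0.4, -0.3, 0.1]
next_day_change = [0.5, -0.2, 0.9, -0.4, -0.1]

r = pearson(sentiment, next_day_change)
# Assumed naive predictor: the sign of the sentiment is the predicted direction.
acc = direction_accuracy(sentiment, next_day_change)
print(f"pearson r = {r:.3f}, direction accuracy = {acc:.2f}")
```

A correlation computed this way only measures linear association on the sample at hand; the trading-based indicators mentioned in the abstract would need a separate simulation.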
Financial sentiment analysis of quarterly reports and stock performance
This thesis aims to examine the use of financial sentiment analysis for quarterly reports published by companies listed on the Oslo Stock Exchange (OSE). Additionally, the intention of the study is to use methods from computer science to enable the transformation of financial reports, from the raw PDF format to the financial sentiment scores. Furthermore, this thesis aims to discuss the relationship between predicted financial sentiment and stock performance for chosen companies and industries. This thesis applies the famous and recently developed language model for financial sentiment analysis, FinBERT. The model is built upon a more general language model, BERT.
The motivation for the study is the increasing interest in machine learning and Natural Language Processing (NLP) for financial applications. Modern modeling techniques are allowing investors to make more informed decisions, and the rise of language modeling has made it possible to derive insight into the opinions of people through news and social networks. However, only a minority of studies investigate the language of quarterly reports.
Methodologically, quarterly reports from the first quarter of 2019 to the fourth quarter of 2021 are downloaded from the investor relations pages of the selected companies. The downloaded reports are the input of a data pipeline that extracts the text and predicts the financial sentiment using Python tools such as PDFMiner and the Transformers library. The predicted sentiment is then loaded into a pipeline for visualization and stock performance comparisons based on stock data downloaded with the yfinance open source tool.
The thesis concludes that extracting text from financial PDF files is feasible. Furthermore, the FinBERT model predicts the financial sentiment with a higher accuracy than the more general BERT model. However, the relationship between stock performance and predicted sentiment is not strong, despite individual differences. Additionally, the relationship is stronger for stock performance in the past. However, this thesis demonstrates the value of domain-specific NLP for applications in the financial industry.
Analysis of S&P500 using News Headlines Applying Machine Learning Algorithms
Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence. Financial risk is now part of everyone's life, directly or indirectly affecting people's daily lives, their decisions, and the consequences of those decisions. The financial system comprises all the companies that produce and sell, making them an essential factor. This study addresses the impact that people can have on companies' stock prices through the news headlines they write.
The S&P 500 is the index studied in this research. It comprises the 500 biggest companies in the USA, and the study examines how the index can be affected by news articles written by humans for distinct and influential newspapers. Many people worldwide "play the game" of investing in stocks, winning or losing large sums of money. This study also tries to understand how strongly this news and the index can be correlated. With the increasing amount of data available, computational power is needed to help process it, and this is where machine learning methods can play a crucial role. It is therefore necessary to understand how these methods can be applied and how they can influence the final decision of the human investor, who always has the same question:
Can stock prices be predicted? To answer that, it is first necessary to understand the correlation between news articles, one of the elements able to impact stock prices, and the stock prices themselves. This study focuses on the correlation between news and the S&P 500.
Predictive Analytics on Emotional Data Mined from Digital Social Networks with a Focus on Financial Markets
This dissertation is a cumulative dissertation and is comprised of five articles. User-Generated Content (UGC) comprises a substantial part of communication via social media. In this dissertation, UGC that carries and facilitates the exchange of emotions is referred to as "emotional data." People "produce" emotional data, that is, they express their emotions via tweets, forum posts, blogs, and so on, or they "consume" it by being influenced by expressed sentiments, feelings, opinions, and the like. Decisions often depend on shared emotions and data, which again lead to new data because decisions may change behaviors or results. "Emotional Data Intelligence" ultimately seeks an answer to the question of how all the different emotions expressed in public online sources influence decision-making processes.
The overarching research question of this dissertation is whether network structures and emotional sentiment data extracted from digital social networks contain predictive information or are just noise. The underlying data was collected from different social media sources, such as Twitter, blogs, message boards, or online news, and from social networking sites, such as Xing. Using methodologies from social network analysis (SNA), sentiment analysis, and predictive analysis, the individual contributions of this dissertation study whether sentiment data from social media or online social networking structures can predict real-world behaviors. The focus lies on the analysis of emotional data and network structures and their predictive power for financial markets. Through the formal construction of the data analysis methodologies introduced in the individual contributions, this dissertation contributes to the theories of social network analysis, sentiment analysis, and predictive analytics.
Evidence on Competitive Advantage and Superior Stock Market Performance
This article analyzes the value-relevance of industry-based and resource-based competitive advantage in a large sample of firms listed on the Oslo Stock Exchange. We measure competitive advantage by a single variable and perform a new decomposition into its underlying sources. In 1986-2005, the industry-based and the resource-based competitive advantage explain more than 20% of abnormal stock market returns, accumulated over five years. The resource-based advantage is almost four times more important than the industry-based advantage. Differences in both the return and the risk capability of firms' net assets relative to their industry peers are significant parts of the resource-based advantage, estimated at 60% and 40%, respectively.
Competitive advantage; superior performance; value-relevance of performance metrics
Analysis of news sentiment and its application to finance
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. We report our investigation of how news stories influence the behaviour of tradable financial assets, in particular, equities. We consider the established methods of turning news events into a quantifiable measure and explore the models which connect these measures to financial decision making and risk control. The thesis is built around two problems of both practical and research interest: determining trading strategies and quantifying trading risk. We have constructed a new measure which takes into consideration (i) the volume of news and (ii) the decaying effect of news sentiment. In this way we derive the impact of aggregated news events for a given asset; we have defined this as the impact score. We also characterise the behaviour of assets using three parameters, which are return, volatility and liquidity, and construct predictive models which incorporate impact scores. The derivation of the impact measure and the characterisation of asset behaviour by introducing liquidity are two innovations reported in this thesis and are claimed to be contributions to knowledge. The impact of news on asset behaviour is explored using two sets of predictive models: the univariate models and the multivariate models. In our univariate predictive models, a universe of 53 assets was considered in order to justify the relationship between news and assets across 9 different sectors. For the multivariate case, we have selected 5 stocks from the financial sector only, as this is relevant for the purpose of constructing trading strategies. We have analysed the celebrated Black-Litterman model (1991) and constructed our Bayesian multivariate predictive models such that we can incorporate domain expertise to improve the predictions.
Not only does this suggest one of the best ways to choose priors in Bayesian inference for financial models using news sentiment, but it also allows the use of current and synchronised data with market information. This is also a novel aspect of our work and a further contribution to knowledge.
Engineering and Physical Sciences Research Council (EPSRC) and OptiRisk Systems
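The idea of a news measure that reflects both the volume of news and the decaying effect of sentiment can be sketched as follows. This is only an illustration under stated assumptions: the exponential decay form, the 60-minute half-life, the function name `impact_score`, and the sample data are hypothetical and are not taken from the thesis, which may define its impact score differently.

```python
from math import exp, log


def impact_score(news, now, half_life=60.0):
    """Sum the sentiment of past news items, each weighted by exponential decay.

    news: iterable of (timestamp_minutes, sentiment) pairs, sentiment in [-1, 1].
    now: evaluation time in minutes.
    half_life: minutes after which an item's weight halves (assumed value).
    """
    decay_rate = log(2) / half_life
    return sum(
        sentiment * exp(-decay_rate * (now - timestamp))
        for timestamp, sentiment in news
        if timestamp <= now  # items published after `now` are not yet known
    )


# Hypothetical news flow for one asset: (minutes since market open, sentiment).
news_items = [(0, 0.8), (60, 0.5), (120, -0.4), (200, 0.9)]

# At minute 120 the items weigh 0.25, 0.5 and 1.0; the last item is ignored.
score = impact_score(news_items, now=120)
print(f"impact score at t=120: {score:.3f}")
```

Because the score is a sum rather than an average, more news items move it further (the volume effect), while older items contribute less (the decay effect).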