Mutual-Excitation of Cryptocurrency Market Returns and Social Media Topics
Cryptocurrencies have recently experienced a new wave of price volatility and
interest; activity within social media communities relating to cryptocurrencies
has increased significantly. There is currently limited documented knowledge of
factors which could indicate future price movements. This paper aims to
decipher relationships between cryptocurrency price changes and topic
discussion on social media to provide, among other things, an understanding of
which topics are indicative of future price movements. To achieve this a
well-known dynamic topic modelling approach is applied to social media
communication to retrieve information about the temporal occurrence of various
topics. A Hawkes model is then applied to find interactions between topics and
cryptocurrency prices. The results show particular topics tend to precede
certain types of price movements, for example the discussion of 'risk and
investment vs trading' being indicative of price falls, the discussion of
'substantial price movements' being indicative of volatility, and the
discussion of 'fundamental cryptocurrency value' by technical communities being
indicative of price rises. The knowledge of topic relationships gained here
could be built into a real-time system, providing trading or alerting signals.
Comment: 3rd International Conference on Knowledge Engineering and Applications (ICKEA 2018), Moscow, Russia (June 25-27, 2018)
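As an illustration of the modelling approach named in the abstract (not code from the paper), a univariate Hawkes process with an exponential excitation kernel can be sketched as follows; the function name, parameter values, and event times are all assumptions for demonstration:

```python
import math

def hawkes_intensity(t, events, mu, alpha, beta):
    """Conditional intensity of a univariate Hawkes process with an
    exponential kernel: lambda(t) = mu + sum_i alpha * exp(-beta * (t - t_i)),
    summing over past event times t_i < t."""
    return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events if ti < t)

# Hypothetical topic-mention times (in hours) exciting a price-event intensity.
events = [1.0, 2.5, 3.0]
lam = hawkes_intensity(4.0, events, mu=0.2, alpha=0.8, beta=1.0)
# lam exceeds the base rate mu because each past event adds decaying excitation.
```

Fitting such a model to topic-occurrence and price-move event streams (e.g. by maximising the likelihood over mu, alpha, beta) is what lets one ask whether one stream's events raise the intensity of the other's.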
Stock market random forest-text mining system mining critical indicators of stock market movements
Stock Market (SM) is believed to be a significant sector of a free market economy, as it plays a crucial role in the growth of commerce and industry of a country. The increasing importance of SMs and their direct influence on the economy were the main reasons for analysing SM movements. The need to determine early warning indicators for SM crises has been the focus of study by many economists and politicians. Whilst most research into the identification of these critical indicators applied data mining to uncover hidden knowledge, very few attempted to adopt a text mining approach. This paper demonstrates how text mining combined with the Random Forest algorithm can offer a novel approach to the extraction of critical indicators and the classification of related news articles. The findings of this study extend the current classification of critical indicators from three to eight classes; they also show that Random Forest can outperform other classifiers and produce high accuracy.
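A typical preprocessing step for this kind of text-mining pipeline, before news articles reach a Random Forest classifier, is TF-IDF weighting. A minimal hand-rolled sketch (illustrative only; the function, documents, and vocabulary are assumptions, not the paper's data):

```python
import math
from collections import Counter

def tfidf(docs):
    """TF-IDF weights for a list of tokenised documents.
    TF is the raw term count; IDF = log(N / df) over N documents."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    return [
        {term: count * math.log(n / df[term]) for term, count in Counter(doc).items()}
        for doc in docs
    ]

# Hypothetical tokenised news snippets.
docs = [
    ["market", "crash", "warning"],
    ["market", "rally", "growth"],
    ["crash", "warning", "indicator"],
]
weights = tfidf(docs)
# Terms appearing in fewer documents (e.g. "rally") get higher weights
# than common ones (e.g. "market"), which is what a classifier exploits.
```

The resulting per-document weight vectors would then serve as features for a classifier such as Random Forest.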
Forecasting movements of health-care stock prices based on different categories of news articles using multiple kernel learning
The market state changes when a new piece of information arrives. It affects decisions made by investors and is considered to be an important data source that can be used for financial forecasting. Recently, information derived from news articles has become a part of financial predictive systems. The usage of news articles and their forecasting potential have been extensively researched.
However, so far no attempts have been made to utilise different categories of news articles simultaneously. This paper studies how the concurrent, and appropriately weighted, usage of news articles with different degrees of relevance to the target stock can improve the performance of financial forecasting and support the decision-making process of investors and traders. Stock price movements are predicted using the multiple kernel learning technique, which integrates information extracted from multiple news categories while separate kernels are utilised to analyse each category. News articles are partitioned according to their relevance to the target stock, its sub-industry, industry, industry group and sector. The experiments are run on stocks from the Health Care sector and show that increasing the number of relevant news categories used as data sources for financial forecasting improves the performance of the predictive system in comparison with approaches based on a lower number of categories.
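The core algebraic idea of multiple kernel learning, as described above, is a weighted combination of per-category kernel (Gram) matrices. A minimal sketch under assumed feature vectors and weights (nothing here is taken from the paper):

```python
def linear_kernel(xs):
    """Gram matrix of pairwise dot products for a list of feature vectors."""
    return [[sum(a * b for a, b in zip(x, y)) for y in xs] for x in xs]

def combine_kernels(kernels, weights):
    """Weighted sum K = sum_m beta_m * K_m, the combination step at the
    heart of multiple kernel learning; an SVM would then train on K."""
    n = len(kernels[0])
    return [
        [sum(w * k[i][j] for w, k in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical features from two news categories over three trading days.
stock_news = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sector_news = [[0.5, 0.5], [0.5, 0.0], [0.0, 0.5]]
K = combine_kernels(
    [linear_kernel(stock_news), linear_kernel(sector_news)],
    weights=[0.7, 0.3],  # e.g. more weight on stock-specific news
)
```

In a full MKL system the weights themselves are learned, so categories that help prediction receive larger coefficients.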
Liquidity commonality does not imply liquidity resilience commonality: A functional characterisation for ultra-high frequency cross-sectional LOB data
We present a large-scale study of commonality in liquidity and resilience
across assets in an ultra high-frequency (millisecond-timestamped) Limit Order
Book (LOB) dataset from a pan-European electronic equity trading facility. We
first show that extant work in quantifying liquidity commonality through the
degree of explanatory power of the dominant modes of variation of liquidity
(extracted through Principal Component Analysis) fails to account for heavy
tailed features in the data, thus producing potentially misleading results. We
employ Independent Component Analysis, which not only decorrelates the
liquidity measures in the asset cross-section but also reduces higher-order
statistical dependencies.
To measure commonality in liquidity resilience, we utilise a novel
characterisation as the time required for return to a threshold liquidity
level. This reflects a dimension of liquidity that is not captured by the
majority of liquidity measures and has important ramifications for
understanding supply and demand pressures for market makers in electronic
exchanges, as well as regulators and HFTs. When the metric is mapped out across
a range of thresholds, it produces the daily Liquidity Resilience Profile (LRP)
for a given asset. This daily summary of liquidity resilience behaviour from
the vast LOB dataset is then amenable to a functional data representation. This
enables the comparison of liquidity resilience in the asset cross-section via
functional linear sub-space decompositions and functional regression. The
functional regression results presented here suggest that market factors for
liquidity resilience (as extracted through functional principal components
analysis) can explain between 10 and 40% of the variation in liquidity
resilience at low liquidity thresholds, but are less explanatory at more
extreme levels, where individual asset factors take effect.
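The resilience metric described above — the time required for liquidity to return to a threshold level — can be sketched at a single threshold as follows (an illustrative reading of the definition; variable names and the sample path are assumptions):

```python
def resilience_times(spread, threshold):
    """For each excursion of the bid-ask spread above `threshold`, return
    the number of time steps until it recovers to or below the threshold.
    Mapping this over a range of thresholds yields a daily Liquidity
    Resilience Profile (LRP)-style summary for one asset."""
    times, breach = [], None
    for t, s in enumerate(spread):
        if s > threshold and breach is None:
            breach = t              # liquidity deteriorates past the threshold
        elif s <= threshold and breach is not None:
            times.append(t - breach)  # steps needed to recover
            breach = None
    return times

# Illustrative spread path (in ticks) with two excursions above threshold 2.
print(resilience_times([1, 3, 3, 1, 2, 4, 1], threshold=2))  # → [2, 1]
```

Shorter recovery times at a given threshold indicate a more resilient order book at that liquidity level.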
Using Text Mining to Analyze Quality Aspects of Unstructured Data: A Case Study for “stock-touting” Spam Emails
The growth in the utilization of text mining tools and techniques in the last decade has been primarily driven by the increase in the sheer volume of unstructured texts and the need to extract useful and, more importantly, quality information from them. The impetus to analyse unstructured data efficiently and effectively as part of the decision-making processes within an organization has further motivated the need to better understand how to use text mining tools and techniques. This paper describes a case study of a stock spam e-mail architecture that demonstrates the process of refining linguistic resources to extract relevant, high quality information including stock profile, financial key words, stock and company news (positive/negative), and compound phrases from stock spam e-mails. The context of such a study is to identify high quality information patterns that can be used to support relevant authorities in detecting and analyzing fraudulent activities.
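The kind of pattern extraction described — pulling stock mentions and price phrases out of unstructured spam text — is often prototyped with simple lexical patterns. A minimal sketch (the patterns below are illustrative assumptions, not the paper's linguistic resources):

```python
import re

def extract_stock_mentions(text):
    """Extract candidate ticker symbols (all-caps tokens followed by
    'stock', 'shares', or a price) and dollar amounts from a message."""
    tickers = re.findall(r"\b[A-Z]{2,5}\b(?=\s*(?:stock|shares|\$))", text)
    prices = re.findall(r"\$\d+(?:\.\d+)?", text)
    return tickers, prices

# Hypothetical stock-touting spam snippet.
msg = "Buy ABCD stock now! Target $1.50, currently $0.20"
print(extract_stock_mentions(msg))  # → (['ABCD'], ['$1.50', '$0.20'])
```

Extracted fields like these (ticker, claimed prices, sentiment-bearing phrases) are the "quality information" candidates the abstract refers to.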
Analysis of S&P500 using News Headlines Applying Machine Learning Algorithms
Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence.
Financial risk is now part of everyone's life, directly or indirectly impacting people's daily decisions and their consequences. The financial system comprises all the companies that produce and sell, making them an essential factor in it. This study addresses the impact that people, through the news headlines they write, can have on companies' stock prices.
The S&P 500, which compiles the 500 biggest companies in the USA, is the index studied in this research, together with how it can be affected by news articles written by humans at distinct and powerful newspapers. Many people worldwide "play the game" of investing in stocks, winning or losing much money. This study also tries to understand how strongly this news and the index can be correlated. With the increasing amount of data available, computational power is needed to process it all; this is where machine learning methods can play a crucial part. It is therefore necessary to understand how these methods can be applied and how they influence the final decision of the human who always asks the same question: can stock prices be predicted? To answer that, it is first necessary to understand the correlation between news articles, one of the elements able to impact stock prices, and the stock prices themselves. This study will focus on the correlation between news and the S&P 500.
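The correlation question at the centre of this study is typically quantified with the Pearson coefficient between a daily news-derived score and daily index returns. A self-contained sketch (the sentiment scores and returns below are invented for illustration):

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical daily headline-sentiment scores vs. S&P 500 daily returns (%).
sentiment = [0.2, -0.5, 0.1, 0.7, -0.3]
returns = [0.4, -1.0, 0.2, 1.4, -0.6]
r = pearson(sentiment, returns)  # close to +1 for this contrived example
```

A value near zero would suggest headlines carry little linear signal for same-day index moves; values away from zero motivate the machine learning models the study applies.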
ALGA: Automatic Logic Gate Annotator for Building Financial News Events Detectors
We present a new automatic data labelling framework called ALGA - Automatic Logic Gate Annotator. The framework helps to create large amounts of annotated data for training domain-specific financial news event detection classifiers more quickly. The ALGA framework implements a rules-based approach to annotate a training dataset. This method has the following advantages: 1) unlike traditional data labelling methods, it helps to filter relevant news articles from noise; 2) it allows easier transferability to other domains and better interpretability of models trained on automatically labelled data. To create this framework, we focus on U.S.-based companies that operate in the Apparel and Footwear industry. We show that event detection classifiers trained on the data generated by our framework can achieve state-of-the-art performance in the domain-specific financial event detection task. In addition, we create a domain-specific event synonyms dictionary.
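The "logic gate" idea behind rules-based annotation can be sketched as boolean gates over keyword sets; everything below (function name, gates, example headline) is an illustrative assumption, not ALGA's actual rules:

```python
def logic_gate_label(text, must_have, any_of, must_not):
    """Rule-based labelling in the spirit of logic gates:
    an AND gate over `must_have`, an OR gate over `any_of`, and a
    NOT gate over `must_not`. Returns True when the article should
    be labelled as a relevant event."""
    words = set(text.lower().split())
    return (all(w in words for w in must_have)
            and any(w in words for w in any_of)
            and not any(w in words for w in must_not))

# Hypothetical gates for a product-launch event in the apparel industry.
label = logic_gate_label(
    "Nike announces launch of new running shoe line",
    must_have={"launch"}, any_of={"shoe", "apparel"}, must_not={"rumor"},
)
```

Applying such gates over a large news feed yields a weakly labelled training set while filtering out articles that trip the NOT gate.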
Text-Mining in Streams of Textual Data Using Time Series Applied to Stock Market
Each day, a lot of text data is generated. This data comes from various sources and may contain valuable information. In this article, we use text mining methods to discover whether there is a connection between news articles and changes in the S&P 500 stock index. The index values and documents were divided into time windows according to the direction of the index value changes. We achieved a classification accuracy of 65-74%.
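The time-window construction described — segmenting the index series by the direction of its changes — can be sketched as follows (an illustrative reading of the abstract; the sample values are invented):

```python
def direction_windows(values):
    """Split an index series into windows of consecutive moves in the same
    direction (+1 up, -1 down; flat moves are counted as down here for
    simplicity). Adjacent windows share the turning-point value."""
    windows, current, prev_dir = [], [values[0]], 0
    for a, b in zip(values, values[1:]):
        d = 1 if b > a else -1
        if prev_dir and d != prev_dir:
            windows.append((prev_dir, current))  # close the finished window
            current = [a]                        # new window starts at the turn
        current.append(b)
        prev_dir = d
    windows.append((prev_dir, current))
    return windows

# Hypothetical S&P 500 closes: one rise, one fall, one rise.
sp500 = [100, 102, 105, 103, 101, 104]
windows = direction_windows(sp500)
```

News documents falling inside each window would then be labelled with that window's direction, turning the problem into supervised text classification.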