Twitter alloy steel disambiguation and user relevance via one-class and two-class news titles classifiers
This paper addresses the nontrivial task of Twitter financial disambiguation (TFD), which is useful for filtering financial domain tweets (e.g., alloy steel or coffee prices) when no unique identifiers (e.g., cashtags) are adopted. To automate TFD, we propose a transfer learning approach that uses freely labeled news titles to train diverse one-class and two-class classification methods. These include different text handling transforms, adaptations of statistical measures and modern machine learning methods, such as support vector machines (SVM), deep autoencoders and multilayer perceptrons. As a case study, we analyzed the domain of alloy steel prices, collecting a recent Twitter dataset. Overall, the best results were achieved by a two-class SVM fed with TFD statistical measures and topic model features, obtaining discrimination levels of 80% and 71% when tested with 11,081 and 3,000 manually labeled tweets, respectively. The best one-class performance (78% and 69% for the same test tweets) was obtained by a term frequency-inverse document frequency classifier (TF-IDFC). These models were further used to generate a Financial User Relevance rank (FUR) score, aiming to filter relevant users. The SVM and TF-IDFC FUR models obtained predictive user discrimination levels of 80% and 75%, respectively, when tested with a manually labeled test sample of 418 users. These results confirm the proposed joint TFD-FUR approach as a valuable tool for the selection of Twitter texts and users for financial social media analytics (e.g., sentiment analysis, detection of influential users).
Research carried out with the support of resources of Big and Open Data Innovation Laboratory (BODaI-Lab), University of Brescia, granted by Fondazione Cariplo and Regione Lombardia.
Market volatility: can machine learning methods enhance volatility forecasting?
This dissertation aims to test whether the use of machine learning (ML) techniques can improve
volatility forecasting accuracy and, more specifically, whether they can beat the best econometric model, the
Heterogeneous Autoregressive model of Realized Volatility (HAR-RV). Using S&P 500 Index
data from May-2007 to August-2022, the superiority of the HAR-RV was tested and attested
against competing econometric models EWMA and GARCH(1,1). Next, the performance of
the ML Artificial Neural Network algorithms Long Short-Term Memory (LSTM) and Gated
Recurrent Unit (GRU) is compared to that of the econometric models. Five
different variable sets are tested for the ML models. It is found that while both ML models are
able to beat the EWMA and GARCH(1,1) models by a significant margin, the HAR-RV model
still outperforms LSTM and GRU.
Moreover, an analysis is conducted of the models' predictions over the period corresponding to
the Covid-19 crisis. The results show no evidence that ML methods have
a particular advantage in forecasting during high-volatility events.
Finally, a plausible cause that could undermine the strengths of ML methods in
volatility forecasting is discussed. The rigorous set of conditions
required for the proper setup of ML models is found to be very difficult to satisfy using financial
data, which hinders the aptitude of ML for this purpose.
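The abstract above names HAR-RV as the benchmark but does not state its form. As a rough illustration only, the sketch below fits the commonly cited HAR-RV regression, in which next-day realized volatility is regressed on the latest daily value and on its trailing weekly and monthly averages. The 5-day and 22-day windows, the function names, and the use of ordinary least squares via NumPy are assumptions made for this example; they are not taken from the dissertation.

```python
import numpy as np


def har_features(rv, week=5, month=22):
    """Build HAR-RV regressors (assumed windows: 5 and 22 trading days).

    Each row is [1, daily RV, weekly mean RV, monthly mean RV] at day t,
    and the matching target is the realized volatility at day t + 1.
    """
    rv = np.asarray(rv, dtype=float)
    rows, targets = [], []
    for t in range(month - 1, len(rv) - 1):
        daily = rv[t]
        weekly = rv[t - week + 1 : t + 1].mean()
        monthly = rv[t - month + 1 : t + 1].mean()
        rows.append([1.0, daily, weekly, monthly])
        targets.append(rv[t + 1])
    return np.array(rows), np.array(targets)


def fit_har(rv, week=5, month=22):
    """Estimate the four HAR-RV coefficients by ordinary least squares."""
    X, y = har_features(rv, week, month)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def predict_next(rv, beta, week=5, month=22):
    """One-step-ahead forecast from the most recent observations."""
    rv = np.asarray(rv, dtype=float)
    x = np.array([1.0, rv[-1], rv[-week:].mean(), rv[-month:].mean()])
    return float(x @ beta)


if __name__ == "__main__":
    # Synthetic, persistent, positive series standing in for realized volatility.
    rng = np.random.default_rng(0)
    rv = np.empty(500)
    rv[0] = 1.0
    for i in range(1, 500):
        rv[i] = 0.1 + 0.9 * rv[i - 1] + 0.05 * abs(rng.standard_normal())
    beta = fit_har(rv)
    print("coefficients:", beta)
    print("next-day forecast:", predict_next(rv, beta))
```

Because the model is linear in three simple averages, it can be estimated with a single least-squares call, which is one reason it is often used as a baseline against recurrent networks such as LSTM and GRU.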
Stock Market Prediction via Deep Learning Techniques: A Survey
Stock market prediction has been a traditional yet complex problem
researched within diverse research areas and application domains due to its
non-linear, highly volatile and complex nature. Existing surveys on stock
market prediction often focus on traditional machine learning methods instead
of deep learning methods. Deep learning has come to dominate many domains and has gained much
success and popularity in stock market prediction in recent years. This
motivates us to provide a structured and comprehensive overview of the research
on stock market prediction focusing on deep learning techniques. We present
four elaborated subtasks of stock market prediction and propose a novel
taxonomy to summarize the state-of-the-art models based on deep neural networks
from 2011 to 2022. In addition, we also provide detailed statistics on the
datasets and evaluation metrics commonly used in the stock market. Finally, we
highlight some open issues and point out several future directions by sharing
some new perspectives on stock market prediction
Reinforcement Learning Applied to Trading Systems: A Survey
Financial domain tasks, such as trading in market exchanges, are challenging
and have long attracted researchers. The recent achievements and the consequent
notoriety of Reinforcement Learning (RL) have also increased its adoption in
trading tasks. RL uses a framework with well-established formal concepts, which
raises its attractiveness in learning profitable trading strategies. However,
using RL in the financial area without due attention can lead new researchers
to stray from standards or fail to adopt relevant conceptual guidelines. In
this work, we embrace the seminal RL technical fundamentals, concepts, and
recommendations to perform a unified, theoretically-grounded examination and
comparison of previous research that could serve as a structuring guide for the
field of study. A selection of twenty-nine articles was reviewed under our
classification that considers RL's most common formulations and design patterns
from a large volume of available studies. This classification allowed for
precise inspection of the most relevant aspects regarding data input,
preprocessing, state and action composition, adopted RL techniques, evaluation
setups, and overall results. Our analysis approach, organized around fundamental
RL concepts, allowed for a clear identification of current system design best
practices, gaps that require further investigation, and promising research
opportunities. Finally, this review attempts to promote the development of this
field of study by facilitating researchers' commitment to standards adherence
and helping them to avoid straying away from the RL constructs' firm ground.
Comment: 38 pages
Efficient Integration of Multi-Order Dynamics and Internal Dynamics in Stock Movement Prediction
Advances in deep neural network (DNN) architectures have enabled new
prediction techniques for stock market data. Unlike other multivariate
time-series data, stock markets show two unique characteristics: (i)
multi-order dynamics, as stock prices are affected by strong
non-pairwise correlations (e.g., within the same industry); and (ii)
internal dynamics, as each individual stock shows some particular
behaviour. Recent DNN-based methods capture multi-order dynamics using
hypergraphs, but rely on the Fourier basis in the convolution, which is both
inefficient and ineffective. In addition, they largely ignore internal dynamics
by adopting the same model for each stock, which implies a severe information
loss.
In this paper, we propose a framework for stock movement prediction to
overcome the above issues. Specifically, the framework includes temporal
generative filters that implement a memory-based mechanism onto an LSTM network
in an attempt to learn individual patterns per stock. Moreover, we employ
hypergraph attentions to capture the non-pairwise correlations. Here, using the
wavelet basis instead of the Fourier basis enables us to simplify the message
passing and focus on the localized convolution. Experiments with US market data
over six years show that our framework outperforms state-of-the-art methods in
terms of profit and stability. Our source code and data are available at
https://github.com/thanhtrunghuynh93/estimate.
Comment: Technical report for accepted paper at WSDM 202