3,942 research outputs found

Finetuning BERT and XLNet for Sentiment Analysis of Stock Market Tweets using Mixout and Dropout Regularization

Author: Jangir Shubham
Publication venue: Technological University Dublin
Publication date: 01/01/2021

Sentiment analysis, also known as opinion mining or emotion mining, aims to identify how sentiments are expressed in text and written data. It combines several areas of study, such as Natural Language Processing (NLP), data mining, and text mining, and is quickly becoming a key concern for businesses and organizations, especially as online commerce data is used for analysis. Twitter has become a popular microblogging and social networking platform on which people share their opinions, thoughts, and attitudes. Because of the large volume of data generated on Twitter, stock market sentiment analysis has long interested researchers, investors, and scientists, owing to the market's highly unpredictable nature. Sentiment analysis can be performed in different ways, but this study focuses on transformer-based pre-trained models, namely BERT (Bidirectional Encoder Representations from Transformers) and XLNet, a generalised autoregressive model, fine-tuned with fewer training instances using Mixout regularization. Traditional machine learning and deep learning models such as Random Forest, Naïve Bayes, Recurrent Neural Networks (RNN), and Long Short-Term Memory (LSTM) fail when given fewer training instances and require intensive feature engineering and processing of textual data. The objective of this research is to study and understand the performance of BERT and XLNet with fewer training instances using Mixout regularization for stock market sentiment analysis. The proposed model showed improved accuracy, precision, recall, and F1-score for both BERT and XLNet using Mixout regularization when given adequate and under-sampled data.
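The four metrics named in the abstract can be illustrated with a minimal sketch. It assumes a binary labelling (1 = positive sentiment, 0 = negative); the abstract does not state the number of classes, and the function name `binary_metrics` is hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for one positive class.

    Assumption: labels are binary; the thesis may use more classes.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    print(binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

On under-sampled data, accuracy alone can mislead, which is presumably why the abstract reports precision, recall, and F1 alongside it.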

Arrow@TUDublin

An Ensemble Classifier for Stock Trend Prediction Using Sentence-Level Chinese News Sentiment and Technical Indicators

Author: Chen Chun-Hao; Chen Po-Yeh; Chun-Wei Lin Jerry
Publication venue: Universidad Internacional de La Rioja
Publication date: 20/05/2022

In the financial market, predicting stock trends from stock market news is a challenging task, and researchers are devoted to developing forecasting models. According to the existing literature, a forecasting model performs better when news sentiment and technical analysis are considered together than when only one of them is used. However, analyzing news sentiment for trend forecasting is difficult, especially for Chinese news, because the data is unstructured and extracting the most important features is hard. Moreover, positive or negative news does not always affect stock prices in a predictable way. Therefore, in this paper, we propose an approach to build an ensemble classifier that uses sentence-level sentiment in Chinese news together with technical indicators to predict stock trends. In the training stage, we first divide each news item into a set of sentences. TextRank and word2vec are then used to generate a predefined number of key sentences. The sentiment scores of these key sentences are computed using a given financial lexicon. The sentiment values of the key sentences, the three values of the technical indicators, and the stock trend label are merged into a training instance. Based on the sentiment values of the key sentences, the corpora are divided into positive and negative news datasets. The two datasets are then used to build positive and negative stock trend prediction models with a support vector machine. To increase the reliability of the prediction model, a third classifier is created using Bollinger Bands. These three classifiers are combined to form an ensemble classifier. In the testing phase, a voting mechanism is applied to the trained ensemble classifier to make the final decision based on the trading signals generated by the three classifiers. Finally, experiments were conducted on five years of news and stock prices of one company to show the effectiveness of the proposed approach. The results show that the accuracy and P/L ratio of the proposed approach, 61% and 4.0821 respectively, are better than those of the existing approach.
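The Bollinger Bands classifier and the voting step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract gives neither the band parameters nor the voting rule, so the 20-period window, the 2-standard-deviation width, the buy/sell rule, and simple majority voting are all assumed, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def bollinger_signal(prices, window=20, k=2.0):
    """Return +1 (buy), -1 (sell), or 0 (hold) for the latest price.

    Assumed rule: buy when the last price falls below the lower band,
    sell when it rises above the upper band.
    """
    if len(prices) < window:
        return 0  # not enough history to form the bands
    recent = prices[-window:]
    middle = mean(recent)          # moving average (middle band)
    width = k * pstdev(recent)     # band half-width
    last = prices[-1]
    if last < middle - width:
        return 1
    if last > middle + width:
        return -1
    return 0


def majority_vote(signals):
    """Combine +1/-1/0 signals; the sign of the sum decides, ties give 0."""
    total = sum(signals)
    return (total > 0) - (total < 0)


if __name__ == "__main__":
    prices = [10.0] * 19 + [5.0]
    band_signal = bollinger_signal(prices)
    # The two SVM outputs are placeholders for the positive-news and
    # negative-news models described in the abstract.
    svm_positive, svm_negative = 1, -1
    print(majority_vote([svm_positive, svm_negative, band_signal]))
```

With three voters, the technical-indicator classifier acts as a tie-breaker whenever the two news-based models disagree.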

Re-UNIR

Sentiment analysis: text, pre-processing, reader views and cross domains

Author: Haddi Emma
Publication venue: Brunel University London
Publication date: 01/01/2015

This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London. Sentiment analysis has emerged as a field that has attracted a significant amount of attention, since a wide variety of applications could benefit from its results, such as news analytics, marketing, question answering, and knowledge management. The area, however, is still early in its development, and urgent improvements are required on many issues, particularly the performance of sentiment classification. In this thesis, three key challenges affecting sentiment classification are outlined and innovative ways of addressing them are presented. First, text pre-processing has been found to be crucial to sentiment classification performance. Consequently, a combination of several existing pre-processing methods is proposed for the sentiment classification process. Second, text properties of financial news are used to build models that predict sentiment. Two different models are proposed: one uses financial events to predict financial news sentiment, and the other takes a new perspective that considers the opinion reader's view, as opposed to the classic approach that examines the opinion holder's view. A new method to capture reader sentiment is suggested. Third, financial news stretches over a number of domains, and it is very challenging to infer sentiment across different domains. Various approaches for cross-domain sentiment analysis have been proposed and critically evaluated.

Brunel University Research Archive

Finetuning Pre-Trained Language Models for Sentiment Classification of COVID19 Tweets

Author: Dussa Arjun
Publication venue: Dublin Institute of Technology
Publication date: 01/01/2020

It is common practice in today's world for the public to use micro-blogging and social networking platforms, predominantly Twitter, to share opinions, ideas, news, and information about many aspects of life. Twitter is also becoming a popular channel for information sharing during pandemic outbreaks and disaster events. The world has been suffering from economic crises since COVID-19 cases started to increase rapidly in January 2020. The virus has killed more than 800 thousand people since its discovery, according to statistics from Worldometer [1], which is the authorized tracking website. Many researchers around the globe are studying this new virus from different perspectives. One such area is the analysis of micro-blogging sites like Twitter to understand public sentiment. Traditional sentiment analysis methods require complex feature engineering. Many embedding representations have appeared in recent years, but their context-independent nature limits their representative power in rich contexts, which degrades performance on NLP tasks. Transfer learning has gained popularity, and pretrained language models like BERT (Bidirectional Encoder Representations from Transformers) and XLNet, a generalised autoregressive model, have started overtaking traditional machine learning and deep learning models such as Random Forests, Naïve Bayes, and Convolutional Neural Networks. Despite the strong results of pretrained language models, it has been observed that fine-tuning a large pretrained model on a downstream task with few training instances tends to degrade the model's performance. This research is based on a regularization technique called Mixout, proposed by Lee (Lee, 2020). Mixout stochastically mixes the parameters of the vanilla network and the dropout network. This work aims to understand the performance variations of fine-tuning BERT and XLNet base models on COVID-19 tweets using Mixout regularization for sentiment classification.
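The mixing step can be sketched in a few lines. This is a minimal illustration, assuming the commonly described Mixout formulation in which each fine-tuned parameter is replaced by its pretrained value with probability p and the result is rescaled so that its expected value equals the fine-tuned parameter. The function name `mixout` and the flat-list representation of parameters are hypothetical; this is not the thesis code.

```python
import random


def mixout(current, pretrained, p, rng):
    """Stochastically mix fine-tuned and pretrained parameters.

    Assumed formulation: with probability p a parameter is swapped for
    its pretrained value, then the result is shifted and rescaled so the
    expectation equals the current (fine-tuned) value.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    mixed = []
    for w, u in zip(current, pretrained):
        keep = rng.random() >= p      # True with probability 1 - p
        chosen = w if keep else u
        mixed.append((chosen - p * u) / (1.0 - p))
    return mixed


if __name__ == "__main__":
    rng = random.Random(0)
    fine_tuned = [0.9, -0.4, 1.3]
    pretrained = [1.0, -0.5, 1.2]
    print(mixout(fine_tuned, pretrained, p=0.7, rng=rng))
```

When the pretrained values are all zero, the expression reduces to ordinary inverted dropout, `w / (1 - p)` or `0`, so in this sketch dropout is the special case of mixing toward the origin rather than toward the pretrained weights.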

Arrow@TUDublin

Real-Time Stock Market Recommendation & Prediction using Multi Source Data

Author: Konety Kalpana
Publication venue: Technological University Dublin
Publication date: 01/01/2022

Stock investors must be cognizant of both the current price of their stock and the price at which they want to sell it in the future. This does not stop investors from monitoring past price patterns and applying that knowledge to the present. "Past performance is not an indicator of future success", as the saying goes. To put it another way, historical stock data alone is not enough to forecast future stock prices. Another key factor to consider in a trading strategy is the impact of market psychology. Financial data, which is a type of multimedia data, provides a wealth of information that has been widely used for data analysis tasks. However, predicting stock prices remains a popular study topic for investors and financial scholars. Forecasting stock prices has become an extremely difficult undertaking because of the significant noise, nonlinearity, and volatility of stock price data.

Arrow@TUDublin