Correlating Sentiment in Reddit’s Wallstreetbets with the Stock Market Using Machine Learning Techniques

Abstract

This study examines whether a statistical relationship exists between the stock market and Reddit’s wallstreetbets. Previous research has mainly focused on the relationship between the stock market and Twitter. To gather data for the study, comments were scraped from the subreddit wallstreetbets over a four-month period, from January 1, 2021, to April 30, 2021. Several sentiment classifiers were then applied to a sample of the data to identify the most accurate classifier for the study. The study initially found that the most accurate sentiment classifier was an SVM classifier trained on 80% of the Reddit comments. However, when 10-fold cross-validation was performed, the accuracy of the SVM classifier dropped and the accuracy of the logistic regression classifier rose. Therefore, the logistic regression classifier, with an accuracy of 72.82%, was selected for the project. It was used to obtain a polarity time series for all the collected Reddit comments. A Granger causality test was then applied to the polarity time series and the rate-of-return time series of AMC and GME. The test indicates that sentiment in Reddit’s wallstreetbets appears to “Granger cause” the GME and AMC rates of return at different time lags. Moreover, sentiment on Reddit appears to have had more of an impact on the GME rate of return than on that of AMC.
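
To make the final step concrete, the following is a minimal sketch of a Granger-style F-test, written with NumPy only. It is not the study's code: the function name `granger_f_stat`, the series length, the coefficients, and the synthetic "sentiment" and "returns" series are all assumptions chosen for illustration. The idea is to compare an autoregressive model of the returns against one that also includes lagged sentiment, and to ask whether the extra lags reduce the residual error by more than chance would suggest.

```python
import numpy as np

def granger_f_stat(y, x, lag):
    """F statistic for H0: lags 1..lag of x add nothing to an AR(lag) model of y.

    Hypothetical helper for illustration; not taken from the study.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n = len(y)
    target = y[lag:]
    # Column k-1 holds the series shifted back by k steps: y[t-k] (or x[t-k]).
    y_lags = np.column_stack([y[lag - k : n - k] for k in range(1, lag + 1)])
    x_lags = np.column_stack([x[lag - k : n - k] for k in range(1, lag + 1)])
    const = np.ones((n - lag, 1))
    restricted = np.hstack([const, y_lags])            # own lags only
    unrestricted = np.hstack([const, y_lags, x_lags])  # own lags + lags of x

    def rss(design):
        # Ordinary least squares fit, then residual sum of squares.
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    dof = (n - lag) - unrestricted.shape[1]
    return ((rss_r - rss_u) / lag) / (rss_u / dof)

# Synthetic data (assumption): today's return depends on yesterday's sentiment.
rng = np.random.default_rng(0)
sentiment = rng.normal(size=120)
returns = np.zeros(120)
returns[1:] = 0.8 * sentiment[:-1] + 0.3 * rng.normal(size=119)

f_forward = granger_f_stat(returns, sentiment, lag=1)  # sentiment -> returns
f_reverse = granger_f_stat(sentiment, returns, lag=1)  # returns -> sentiment
print(f"sentiment -> returns: F = {f_forward:.2f}")
print(f"returns -> sentiment: F = {f_reverse:.2f}")
```

In practice the F statistic would be compared against an F distribution with `lag` and `dof` degrees of freedom to obtain a p-value, and the test would be repeated over several lags, which is what "at different time lags" refers to above.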
