Visualizing interactions between company fundamentals, traditional news and social media through features engineering

Abstract

We combined sentiment signals from traditional news media, Twits and corporate fundamentals to predict monthly stock returns for S&P 500 companies. We implemented feature engineering layers on top of these signals, with the intent of modeling the decision process of investors operating in the stock market. We approached the problem as a classification task using a Random Forest classifier. We obtained a Receiver Operating Characteristic Area Under the Curve (ROC AUC) of 0.7949 and an accuracy score of 0.8126 over the monthly returns in the five days following the last training date, results at odds with the underlying assumptions of the Efficient Market Hypothesis (EMH). In addition to classification metrics, we propose a consistent methodology for evaluating feature importance in time series forecasting tasks based on SHAP values, highlighting how the distribution of sentiment signals among companies with similar fundamentals has a remarkable impact on monthly returns.
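
As an illustration of the two classification metrics reported above, the following minimal sketch computes ROC AUC and accuracy from binary labels and predicted probabilities in plain Python. The labels, scores, function names and the 0.5 decision threshold are made-up assumptions for illustration only; they are not data or code from the study.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Share of examples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical labels (1 = positive return, 0 = otherwise) and classifier scores.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3]

auc = roc_auc(labels, scores)
acc = accuracy(labels, scores)
print(f"ROC AUC = {auc:.4f}, accuracy = {acc:.4f}")
```

ROC AUC depends only on how the scores rank positives against negatives, while accuracy also depends on the chosen threshold, which is why the two figures can differ for the same classifier.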
