Three Essays on Opinion Mining of Social Media Texts

Abstract

This dissertation is a collection of three essays on opinion mining of social media texts, each approaching the problem from a different theoretical and methodological perspective. The first essay focuses on improving lexicon-based sentiment classification. I propose a method to automatically generate a sentiment lexicon that incorporates knowledge from both the language domain and the content domain. The method learns word associations from a large unannotated corpus and uses these associations to identify new sentiment words. Using a Twitter data set of 743,069 tweets related to the stock market, I show that the sentiment lexicons generated by the proposed method significantly outperform existing sentiment lexicons in sentiment classification. As sentiment analysis is applied to different types of documents to solve different problems, the proposed method provides a useful tool for improving sentiment classification. The second essay focuses on improving supervised sentiment classification. In previous work, a document was typically represented as a collection of single words. This feature representation suffers from severe ambiguity, especially when classifying short texts such as microblog messages. I propose the use of dependency features in sentiment classification. A dependency describes the relationship between a pair of words, even when they are far apart in the text. I compare the classification performance of dependency features with that of several commonly used features in different experimental settings. The results show that dependency features significantly outperform existing feature representations.
In the third essay, I examine the relationship between social media sentiment and stock returns. This is the first study to test the bidirectional effects in this relationship. Based on theories in behavioral finance research, I hypothesize that social media sentiment does not predict stock returns, but rather that stock returns predict social media sentiment. I empirically test a set of research hypotheses by applying a vector autoregression (VAR) model to a social media data set that is much larger than those used in previous studies. The results support the hypotheses. The findings have significant implications for both theory and practice.
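
To make the first essay's idea of learning word associations from unannotated text more concrete, the sketch below scores candidate words by how much more strongly they co-occur with positive seed words than with negative ones. The abstract does not state how associations are measured, so this is only an illustration under stated assumptions: the association measure (a smoothed difference of pointwise mutual information), the seed words, the threshold, and the function names `orientation_scores` and `expand_lexicon` are all hypothetical and are not taken from the dissertation.

```python
import math

# Hypothetical seed words; the dissertation's actual seeds are not given.
POSITIVE_SEEDS = {"bullish", "gain"}
NEGATIVE_SEEDS = {"bearish", "loss"}


def orientation_scores(docs, pos_seeds, neg_seeds, smoothing=0.5):
    """Score each non-seed word by PMI(word, positive) - PMI(word, negative).

    docs is a list of token lists (for example, tokenized tweets).
    Co-occurrence is counted at the document level. The difference of the
    two PMI terms reduces to
        log2(P(word | positive doc)) - log2(P(word | negative doc)),
    and `smoothing` keeps the logarithm finite when a count is zero.
    """
    doc_sets = [set(tokens) for tokens in docs]
    pos_docs = [d for d in doc_sets if d & pos_seeds]
    neg_docs = [d for d in doc_sets if d & neg_seeds]
    if not pos_docs or not neg_docs:
        raise ValueError("each seed set must occur in at least one document")

    candidates = set().union(*doc_sets) - pos_seeds - neg_seeds
    scores = {}
    for word in candidates:
        with_pos = sum(word in d for d in pos_docs)
        with_neg = sum(word in d for d in neg_docs)
        scores[word] = (
            math.log2((with_pos + smoothing) / len(pos_docs))
            - math.log2((with_neg + smoothing) / len(neg_docs))
        )
    return scores


def expand_lexicon(scores, threshold=1.0):
    """Split scored words into new positive and negative lexicon entries."""
    positive = {w for w, s in scores.items() if s >= threshold}
    negative = {w for w, s in scores.items() if s <= -threshold}
    return positive, negative
```

Calling `orientation_scores` on a list of tokenized messages and passing the result to `expand_lexicon` yields two sets of candidate sentiment words. Words that co-occur about equally with both seed sets score near zero and are left out, which is one simple way to keep domain-neutral terms out of the generated lexicon.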
