
Semantic smoothing for Twitter sentiment analysis

Abstract

Twitter has recently attracted much attention as a research topic in sentiment analysis. Training a sentiment classifier on tweet data often runs into the data sparsity problem, partly because the 140-character limit encourages a large variety of short forms. In this work we propose using semantic smoothing to alleviate data sparseness. Our approach extracts semantically hidden concepts from the training documents and then incorporates these concepts as additional features for classifier training. We tested our approach using two different methods. The first is shallow semantic smoothing, where words are replaced with their corresponding semantic concepts; the second interpolates the original unigram language model in the Naive Bayes (NB) classifier with the generative model of words given semantic concepts. Preliminary results show that shallow semantic smoothing reduces the vocabulary size by 20%. Moreover, the interpolation method improves upon shallow semantic smoothing by over 5% in sentiment classification and slightly outperforms NB trained on unigrams only, without semantic smoothing.
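The two methods described above can be illustrated with a minimal sketch. It is not the authors' implementation: the word-to-concept map `CONCEPTS`, the function names, the interpolation weight `alpha`, the Laplace smoothing, and the count-based estimates of P(w | s) and P(s | c) are all assumptions made for illustration. The interpolated model is taken here to have the form P(w | c) = (1 - alpha) * P_u(w | c) + alpha * sum_s P(w | s) * P(s | c), which is one plausible reading of the abstract.

```python
from collections import Counter

# Hypothetical word-to-concept map (an assumption); a real system would
# obtain it from an entity or concept extractor.
CONCEPTS = {"iphone": "PRODUCT", "ipad": "PRODUCT", "london": "CITY", "paris": "CITY"}


def shallow_smooth(tokens):
    """Shallow smoothing: replace each mapped word with its concept label."""
    return [CONCEPTS.get(t, t) for t in tokens]


def interpolated_model(class_docs, all_docs, alpha=0.3):
    """P(w | c) = (1 - alpha) * P_u(w | c) + alpha * sum_s P(w | s) * P(s | c).

    class_docs must be a subset of all_docs; both are lists of token lists.
    """
    vocab = sorted({t for d in all_docs for t in d})
    class_counts = Counter(t for d in class_docs for t in d)
    corpus_counts = Counter(t for d in all_docs for t in d)

    # Laplace-smoothed unigram model for the class (smoothing is an assumption).
    denom = sum(class_counts.values()) + len(vocab)
    p_u = {w: (class_counts[w] + 1) / denom for w in vocab}

    # Concept counts within the class (for P(s | c)) and across the whole
    # corpus (to normalise P(w | s)).
    s_class, s_corpus = Counter(), Counter()
    for w in vocab:
        if w in CONCEPTS:
            s_class[CONCEPTS[w]] += class_counts[w]
            s_corpus[CONCEPTS[w]] += corpus_counts[w]
    n_s = sum(s_class.values())
    if n_s == 0:
        return p_u  # no concept evidence in this class: plain unigram model

    model = {}
    for w in vocab:
        sem = 0.0
        if w in CONCEPTS:
            # Each word maps to exactly one concept here, so the sum over s
            # reduces to a single term.
            s = CONCEPTS[w]
            sem = (corpus_counts[w] / s_corpus[s]) * (s_class[s] / n_s)
        model[w] = (1 - alpha) * p_u[w] + alpha * sem
    return model


# Toy data, invented for illustration only.
pos = [["love", "my", "iphone"], ["love", "paris"]]
neg = [["hate", "my", "ipad"]]
all_docs = pos + neg
pos_model = interpolated_model(pos, all_docs)
```

In this toy setup, "ipad" never appears in a positive tweet, yet it receives more probability mass in the positive model than the equally unseen "hate", because it shares the PRODUCT concept with "iphone". This is the kind of sparsity relief the interpolation is meant to provide, whereas `shallow_smooth` simply merges such words into one feature and shrinks the vocabulary.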
