Tweet sentiment: From classification to quantification
Abstract: Sentiment classification has become a ubiquitous enabling technology in the Twittersphere, since classifying tweets according to the sentiment they convey towards a given entity (be it a product, a person, a political party, or a policy) has many applications in political science, social science, market research, and many other fields. In this paper we contend that most previous studies dealing with tweet sentiment classification (TSC) use a suboptimal approach. The reason is that the final goal of most such studies is not estimating the class label (e.g., Positive, Negative, or Neutral) of individual tweets, but estimating the relative frequency (a.k.a. “prevalence”) of the different classes in the dataset. The latter task is called quantification, and recent research has convincingly shown that it should be tackled as a task of its own, using learning algorithms and evaluation measures different from those used for classification. In this paper we show, on a multiplicity of TSC datasets, that using a quantification-specific algorithm produces substantially better class frequency estimates than a state-of-the-art classification-oriented algorithm routinely used in TSC. We thus argue that researchers interested in tweet sentiment prevalence should switch to quantification-specific (instead of classification-specific) learning algorithms and evaluation measures.
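To illustrate the distinction the abstract draws, here is a minimal Python sketch on toy data. It is not the algorithm evaluated in the paper: the labels, the two hypothetical classifiers (`pred_a`, `pred_b`), and the helper names are all assumptions made for illustration. Kullback-Leibler divergence (KLD) is used here as one measure commonly applied to prevalence estimates; the abstract itself does not name the measure it uses. The sketch shows that a classifier with higher accuracy can still yield worse class frequency estimates when prevalence is obtained by simply counting predicted labels.

```python
from collections import Counter
from math import log

CLASSES = ("Positive", "Negative", "Neutral")

def prevalence(labels, classes=CLASSES):
    """Relative frequency of each class in a list of labels."""
    counts = Counter(labels)
    return {c: counts[c] / len(labels) for c in classes}

def accuracy(true_labels, predicted_labels):
    """Fraction of individual items labelled correctly."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)

def kld(true_prev, est_prev, eps=1e-6):
    """KL divergence between true and estimated prevalences (additively smoothed)."""
    n = len(true_prev)
    smooth = lambda p: (p + eps) / (1 + eps * n)
    return sum(
        smooth(true_prev[c]) * log(smooth(true_prev[c]) / smooth(est_prev[c]))
        for c in true_prev
    )

# Toy ground truth: 6 Positive, 3 Negative, 1 Neutral (hypothetical data).
true = ["Positive"] * 6 + ["Negative"] * 3 + ["Neutral"]

# Hypothetical classifier A: 2 errors, both Negative -> Positive.
pred_a = ["Positive"] * 6 + ["Positive", "Positive", "Negative"] + ["Neutral"]

# Hypothetical classifier B: 4 errors, but they cancel out across classes.
pred_b = (["Negative"] * 2 + ["Positive"] * 4
          + ["Positive", "Positive", "Negative"] + ["Neutral"])

true_prev = prevalence(true)
for name, pred in (("A", pred_a), ("B", pred_b)):
    est = prevalence(pred)  # "classify and count" estimate of prevalence
    print(name, "accuracy:", accuracy(true, pred), "KLD:", kld(true_prev, est))
```

In this toy setup classifier A labels more individual tweets correctly, yet classifier B recovers the true class distribution exactly because its errors balance out. This is why the two tasks can call for different learning objectives and different evaluation measures.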