Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews

By Peter D. Turney

Abstract

This paper presents a simple unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (thumbs down). The classification of a review is predicted by the average semantic orientation of the phrases in the review that contain adjectives or adverbs. A phrase has a positive semantic orientation when it has good associations (e.g., "subtle nuances") and a negative semantic orientation when it has bad associations (e.g., "very cavalier"). In this paper, the semantic orientation of a phrase is calculated as the mutual information between the given phrase and the word "excellent" minus the mutual information between the given phrase and the word "poor". A review is classified as recommended if the average semantic orientation of its phrases is positive. The algorithm achieves an average accuracy of 74% when evaluated on 410 reviews from Epinions, sampled from four different domains (reviews of automobiles, banks, movies, and travel destinations). The accuracy ranges from 84% for automobile reviews to 66% for movie reviews.
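The scoring rule described in the abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Turney's implementation. It reads "mutual information" as pointwise mutual information (PMI) estimated from co-occurrence counts in some corpus. It assumes the review is already part-of-speech tagged with Penn Treebank-style tags. The corpus counts, the `SMOOTHING` constant, and the helper names (`pmi`, `semantic_orientation`, `extract_phrases`, `classify_review`) are all hypothetical. `extract_phrases` is a deliberately simplified stand-in for "phrases that contain adjectives or adverbs".

```python
import math
from statistics import mean

# Hypothetical corpus statistics, invented for illustration only.
TOTAL = 1_000_000                                  # size of the (hypothetical) corpus
WORD_COUNTS = {"excellent": 2_000, "poor": 1_000}  # occurrences of the two reference words
PHRASE_COUNTS = {"subtle nuances": 300, "very cavalier": 200}
COOCCURRENCE = {                                   # (phrase, reference word) -> co-occurrence count
    ("subtle nuances", "excellent"): 40,
    ("subtle nuances", "poor"): 5,
    ("very cavalier", "excellent"): 2,
    ("very cavalier", "poor"): 16,
}
SMOOTHING = 0.01  # assumption: small constant so a zero co-occurrence count never yields log2(0)


def pmi(joint, count_x, count_y, total):
    """Pointwise mutual information log2(p(x, y) / (p(x) * p(y))) from raw counts."""
    return math.log2((joint / total) / ((count_x / total) * (count_y / total)))


def semantic_orientation(phrase):
    """SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor")."""
    def pmi_with(word):
        joint = COOCCURRENCE.get((phrase, word), 0) + SMOOTHING
        return pmi(joint, PHRASE_COUNTS[phrase], WORD_COUNTS[word], TOTAL)
    return pmi_with("excellent") - pmi_with("poor")


def extract_phrases(tagged):
    """Simplified phrase extraction (an assumption, not the paper's exact rules):
    adjacent word pairs in which either word has an adjective (JJ*) or adverb (RB*) tag."""
    phrases = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if t1.startswith(("JJ", "RB")) or t2.startswith(("JJ", "RB")):
            phrases.append(f"{w1} {w2}".lower())
    return phrases


def classify_review(tagged):
    """Return ("recommended" or "not recommended", average SO) for a POS-tagged review."""
    # Only phrases with (hypothetical) corpus statistics can be scored.
    scored = [semantic_orientation(p) for p in extract_phrases(tagged) if p in PHRASE_COUNTS]
    if not scored:
        raise ValueError("no phrase in the review has corpus statistics")
    average = mean(scored)
    return ("recommended" if average > 0 else "not recommended"), average


review = [("The", "DT"), ("subtle", "JJ"), ("nuances", "NNS"), ("were", "VBD"), ("lovely", "JJ")]
print(classify_review(review))  # label is "recommended": only "subtle nuances" is scored
```

Because the phrase's own count and the corpus size appear in both PMI terms, they cancel in the difference. The score therefore reduces to log2 of c(phrase, "excellent") · c("poor") divided by c(phrase, "poor") · c("excellent"). Only co-occurrence counts with the two reference words, plus the reference words' own frequencies, are actually needed.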

Topics: Artificial Intelligence, Language, Machine Learning, Statistical Models
Year: 2002
OAI identifier: oai:cogprints.org:2321
