Leveraging Sentiment to Compute Word Similarity

In this paper, we introduce a new WordNet-based similarity metric, SenSim, which incorporates the sentiment content (i.e., degree of positive or negative sentiment) of the words being compared to measure their similarity. The proposed metric is based on the hypothesis that knowing the sentiment is beneficial in measuring the similarity. To verify this hypothesis, we measure and compare the annotator agreement for two annotation strategies: 1) sentiment information of a pair of words is considered while annotating and 2) sentiment information of a pair of words is not considered while annotating. Inter-annotator correlation scores show that the agreement is better when the two annotators consider sentiment information while assigning a similarity score to a pair of words. We use this hypothesis to measure the similarity between a pair of words. Specifically, we represent each word as a vector containing the sentiment scores of all the content words in the WordNet gloss of that word. These sentiment scores are derived from a sentiment lexicon. We then measure the cosine similarity between the two vectors. We perform both intrinsic and extrinsic evaluation of SenSim. As a part of intrinsic evaluation, we calculate the correlation score with gold standard data and compare it with other popular WordNet-based metrics. We find that SenSim has better correlation than other similarity metrics. Further, as a part of extrinsic evaluation, we use SenSim in an application. We evaluate SenSim for mitigating the unknown feature problem in supervised sentiment classification, using the replacement strategy based on similarity metrics proposed by Balamurali et al. (2011). Our results show that the new metric performs better than all the existing metrics used for comparison.

CiteSeerX
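
The vector-and-cosine step described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the glosses, the lexicon scores, the stopword list, and the names `sentiment_vector`, `cosine`, and `sensim_sketch` are all hypothetical, and the choice to index each vector by gloss word is an assumption, since the abstract does not say how the dimensions of the two vectors are aligned.

```python
import math

# Hypothetical toy data. A real setup would read glosses from WordNet and
# scores from a sentiment lexicon; none of these values come from the paper.
SENTIMENT_LEXICON = {
    "joy": 0.9,
    "pleasure": 0.8,
    "sorrow": -0.8,
    "unhappiness": -0.9,
}
GLOSSES = {
    "glad": "showing or causing joy and pleasure",
    "happy": "enjoying or showing joy or pleasure",
    "sad": "experiencing or showing sorrow or unhappiness",
}
STOPWORDS = {"or", "and", "the", "a", "of"}


def sentiment_vector(word):
    """Map each content word of the gloss to its lexicon score (assumed layout)."""
    vector = {}
    for token in GLOSSES[word].lower().split():
        if token in STOPWORDS:
            continue
        # Words missing from the lexicon are treated as neutral (score 0.0).
        vector[token] = SENTIMENT_LEXICON.get(token, 0.0)
    return vector


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(score * v.get(term, 0.0) for term, score in u.items())
    norm_u = math.sqrt(sum(score * score for score in u.values()))
    norm_v = math.sqrt(sum(score * score for score in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # no sentiment-bearing words in at least one gloss
    return dot / (norm_u * norm_v)


def sensim_sketch(word_a, word_b):
    """Similarity of two words via the cosine of their gloss sentiment vectors."""
    return cosine(sentiment_vector(word_a), sentiment_vector(word_b))


if __name__ == "__main__":
    print(sensim_sketch("glad", "happy"))
    print(sensim_sketch("glad", "sad"))
```

With this word-indexed layout, two words score above zero only when their glosses share a sentiment-bearing term, so the toy pair "glad" and "sad" comes out at zero. A different alignment of dimensions would behave differently.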

Leveraging Sentiment to Compute Word Similarity

Author: Balamurali A. R.; Bhattacharyya Pushpak; Malu Akshat; Mukherjee Subhabrata
Publication date: 18/09/2012

In this paper, we introduce a new WordNet-based similarity metric, SenSim, which incorporates the sentiment content (i.e., degree of positive or negative sentiment) of the words being compared to measure the similarity between them. The proposed metric is based on the hypothesis that knowing the sentiment is beneficial in measuring the similarity. To verify this hypothesis, we measure and compare the annotator agreement for two annotation strategies: 1) sentiment information of a pair of words is considered while annotating and 2) sentiment information of a pair of words is not considered while annotating. Inter-annotator correlation scores show that the agreement is better when the two annotators consider sentiment information while assigning a similarity score to a pair of words. We use this hypothesis to measure the similarity between a pair of words. Specifically, we represent each word as a vector containing the sentiment scores of all the content words in the WordNet gloss of the sense of that word. These sentiment scores are derived from a sentiment lexicon. We then measure the cosine similarity between the two vectors. We perform both intrinsic and extrinsic evaluation of SenSim and compare the performance with other widely used WordNet similarity metrics.

Comment: The paper is available at http://subhabrata-mukherjee.webs.com/publications.ht

arXiv.org e-Print Archive