Non-distributional Word Vector Representations
Data-driven representation learning for words is a technique of central
importance in NLP. While indisputably useful as a source of features in
downstream tasks, such vectors tend to consist of uninterpretable components
whose relationship to the categories of traditional lexical semantic theories
is tenuous at best. We present a method for constructing interpretable word
vectors from hand-crafted linguistic resources such as WordNet and FrameNet.
These vectors are binary (i.e., contain only 0s and 1s) and are 99.9% sparse. We
analyze their performance on state-of-the-art evaluation methods for
distributional models of word vectors and find they are competitive with standard
distributional approaches. Comment: Proceedings of ACL 201
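As a rough illustration of the idea in this abstract, the sketch below builds binary word vectors with one dimension per linguistic feature. The feature names and word-to-feature assignments are invented for the example and are not taken from WordNet, FrameNet, or the paper; `build_binary_vectors` and `sparsity` are hypothetical helper names.

```python
# Toy illustration only: the feature names and assignments below are made up,
# not extracted from WordNet, FrameNet, or the paper's resources.
TOY_FEATURES = {
    "film": {"pos.noun", "supersense.noun.communication"},
    "movie": {"pos.noun", "supersense.noun.communication"},
    "good": {"pos.adj", "polarity.positive"},
    "bad": {"pos.adj", "polarity.negative"},
}


def build_binary_vectors(word_features):
    """Map each word to a 0/1 vector with one dimension per distinct feature."""
    vocabulary = sorted(set().union(*word_features.values()))
    index = {feature: i for i, feature in enumerate(vocabulary)}
    vectors = {}
    for word, features in word_features.items():
        vector = [0] * len(vocabulary)
        for feature in features:
            vector[index[feature]] = 1  # 1 means the word has this feature
        vectors[word] = vector
    return vocabulary, vectors


def sparsity(vectors):
    """Fraction of zero entries across all vectors."""
    total = sum(len(v) for v in vectors.values())
    zeros = sum(v.count(0) for v in vectors.values())
    return zeros / total


vocabulary, vectors = build_binary_vectors(TOY_FEATURES)
print(vocabulary)
print(vectors["good"], sparsity(vectors))
```

With only five toy features the vectors are far denser than the 99.9% sparsity reported in the abstract; sparsity grows as the number of features becomes large relative to the few features each word has.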
Sentiment Classification Using a Sense Enriched Lexicon-based Approach
The prominent approach to sentiment polarity classification is the Lexicon-based approach, which relies on a dictionary to assign a score to subjective words. Most existing work uses the score of the most dominant sense in this process instead of the contextually appropriate sense. The use of Word Sense Disambiguation (WSD) has been less investigated in sentiment classification tasks. This paper investigates the effect of integrating WSD into a Lexicon-based approach for Sentiment Polarity classification and compares it with the existing Lexicon-based approaches and the state-of-the-art supervised approaches. The lexicon used in this work is SentiWordNet v2.0. The proposed approach, called Sense Enriched Lexicon-based Approach (SELSA), uses a word sense disambiguation module to identify the correct sense of subjective words. Instead of using the score of the most frequent sense, it uses the score of the contextually appropriate sense only. For the purpose of comparison with the supervised approaches, the authors investigate Naïve Bayes (NB) and Support Vector Machines (SVM) classifiers, which tended to perform better in earlier research. The performance of these classifiers is evaluated using Word2vec, Hashing Vectorizer, and bi-gram features. The best-performing classifier-feature combination is used for comparison. All the evaluations are done on the Movie Review dataset. SELSA achieves an accuracy of 96.25%, which is significantly better than the accuracy obtained by the SentiWordNet-based approach without WSD on the same dataset. The performance of the proposed algorithm is also compared with the best-performing supervised classifier investigated in this work and earlier reported works on the same dataset. The results reveal that the SVM classifier performs better than the SentiWordNet approach without WSD. However, after incorporating WSD, the performance of the proposed Lexicon-based approach improves significantly and surpasses the best-performing supervised classifier (SVM with bi-gram features).
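The difference between scoring with the most frequent sense and scoring with the contextually appropriate sense can be sketched as follows. This is a minimal sketch under stated assumptions: the abstract does not say which WSD method SELSA uses, so a simplified Lesk-style gloss overlap stands in for it here, and the lexicon entries (senses, glosses, scores) are invented rather than taken from SentiWordNet.

```python
# Hypothetical toy lexicon: senses, glosses, and scores are invented for
# illustration and are NOT SentiWordNet entries. Senses are listed with the
# most frequent sense first.
TOY_LEXICON = {
    "cold": [
        {"gloss": "having a low temperature weather", "score": 0.0},
        {"gloss": "lacking emotion or warmth in acting or feeling", "score": -0.6},
    ],
    "moving": [
        {"gloss": "changing place or position", "score": 0.0},
        {"gloss": "arousing deep emotion or feeling", "score": 0.7},
    ],
}


def pick_sense(word, context_tokens):
    """Simplified Lesk-style choice (an assumption, not the paper's module):
    pick the sense whose gloss shares the most tokens with the context.
    Ties fall back to the earliest, i.e. most frequent, sense."""
    senses = TOY_LEXICON[word]
    context = set(context_tokens) - {word}
    overlaps = [len(context & set(sense["gloss"].split())) for sense in senses]
    return senses[overlaps.index(max(overlaps))]


def polarity(tokens, use_wsd):
    """Sum lexicon scores over tokens, with or without sense disambiguation."""
    total = 0.0
    for token in tokens:
        if token not in TOY_LEXICON:
            continue
        sense = pick_sense(token, tokens) if use_wsd else TOY_LEXICON[token][0]
        total += sense["score"]
    return total


review = "the acting was cold but the ending was deeply moving with real emotion".split()
print(polarity(review, use_wsd=False), polarity(review, use_wsd=True))
```

In this toy review the most-frequent-sense baseline sees only neutral senses, while the disambiguated version picks up the emotional senses of both words and ends up slightly positive.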
User sentiment detection: a YouTube use case
In this paper we propose an unsupervised lexicon-based approach to detect the sentiment polarity of user comments in YouTube. Polarity detection in social media content is challenging not only because of the existing limitations in current sentiment dictionaries but also due to the informal linguistic styles used by users. Present dictionaries fail to capture the sentiments of community-created terms. To address this challenge, we adopted a data-driven approach and prepared a social-media-specific list of terms and phrases expressing user sentiments and opinions. Experimental evaluation shows the combinatorial approach has greater potential. Finally, we discuss many research challenges involving social media sentiment analysis.
Measuring praise and criticism: Inference of semantic orientation from association
The evaluative character of a word is called its semantic orientation. Positive semantic orientation indicates praise (e.g., "honest", "intrepid") and negative semantic orientation indicates criticism (e.g., "disturbing", "superfluous"). Semantic orientation varies in both direction (positive or negative) and degree (mild to strong). An automated system for measuring semantic orientation would have application in text classification, text filtering, tracking opinions in online discussions, analysis of survey responses, and automated chat systems (chatbots). This paper introduces a method for inferring the semantic orientation of a word from its statistical association with a set of positive and negative paradigm words. Two instances of this approach are evaluated, based on two different statistical measures of word association: pointwise mutual information (PMI) and latent semantic analysis (LSA). The method is experimentally tested with 3,596 words (including adjectives, adverbs, nouns, and verbs) that have been manually labeled positive (1,614 words) and negative (1,982 words). The method attains an accuracy of 82.8% on the full test set, but the accuracy rises above 95% when the algorithm is allowed to abstain from classifying mild words.
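The PMI variant described in this abstract can be sketched as the summed association of a word with the positive paradigm words minus its summed association with the negative ones. The counts, the single-word paradigm lists, the smoothing constant, and the abstention threshold below are all assumptions made for illustration; none of them are data or settings from the paper.

```python
import math

# Made-up co-occurrence counts for illustration; not data from the paper.
TOTAL_WINDOWS = 1000
WORD_COUNT = {"honest": 40, "disturbing": 30, "good": 100, "bad": 80}
PAIR_COUNT = {
    ("honest", "good"): 16,
    ("honest", "bad"): 2,
    ("disturbing", "good"): 1,
    ("disturbing", "bad"): 12,
}
POSITIVE_PARADIGM = ["good"]  # hypothetical, deliberately tiny paradigm sets
NEGATIVE_PARADIGM = ["bad"]


def pmi(word, paradigm_word, smoothing=0.01):
    """Pointwise mutual information, log2 of p(w, p) / (p(w) * p(p)).
    The smoothing term (an assumption) avoids log(0) for unseen pairs."""
    joint = (PAIR_COUNT.get((word, paradigm_word), 0) + smoothing) / TOTAL_WINDOWS
    p_word = WORD_COUNT[word] / TOTAL_WINDOWS
    p_paradigm = WORD_COUNT[paradigm_word] / TOTAL_WINDOWS
    return math.log2(joint / (p_word * p_paradigm))


def semantic_orientation(word):
    """Association with positive paradigm words minus association with negative ones."""
    positive = sum(pmi(word, p) for p in POSITIVE_PARADIGM)
    negative = sum(pmi(word, n) for n in NEGATIVE_PARADIGM)
    return positive - negative


def classify(word, min_strength=0.0):
    """Return 'positive' or 'negative', or None to abstain when the
    orientation is weaker than min_strength (a hypothetical threshold)."""
    score = semantic_orientation(word)
    if abs(score) < min_strength:
        return None
    return "positive" if score > 0 else "negative"


for w in ("honest", "disturbing"):
    print(w, round(semantic_orientation(w), 2), classify(w))
```

Raising `min_strength` makes the classifier abstain on mildly oriented words, which mirrors the trade-off the abstract reports between coverage and accuracy.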
Crowdsourcing a Word-Emotion Association Lexicon
Even though considerable attention has been given to the polarity of words
(positive and negative) and the creation of large polarity lexicons, research
in emotion analysis has had to rely on limited and small emotion lexicons. In
this paper we show how the combined strength and wisdom of the crowds can be
used to generate a large, high-quality, word-emotion and word-polarity
association lexicon quickly and inexpensively. We enumerate the challenges in
emotion annotation in a crowdsourcing scenario and propose solutions to address
them. Most notably, in addition to questions about emotions associated with
terms, we show how the inclusion of a word choice question can discourage
malicious data entry, help identify instances where the annotator may not be
familiar with the target term (allowing us to reject such annotations), and
help obtain annotations at sense level (rather than at word level). We
conducted experiments on how to formulate the emotion-annotation questions, and
show that asking if a term is associated with an emotion leads to markedly
higher inter-annotator agreement than that obtained by asking if a term evokes
an emotion.