Crowdsourcing a Word-Emotion Association Lexicon
Even though considerable attention has been given to the polarity of words
(positive and negative) and the creation of large polarity lexicons, research
in emotion analysis has had to rely on limited and small emotion lexicons. In
this paper we show how the combined strength and wisdom of the crowds can be
used to generate a large, high-quality, word-emotion and word-polarity
association lexicon quickly and inexpensively. We enumerate the challenges in
emotion annotation in a crowdsourcing scenario and propose solutions to address
them. Most notably, in addition to questions about emotions associated with
terms, we show how the inclusion of a word choice question can discourage
malicious data entry, help identify instances where the annotator may not be
familiar with the target term (allowing us to reject such annotations), and
help obtain annotations at sense level (rather than at word level). We
conducted experiments on how to formulate the emotion-annotation questions, and
show that asking whether a term is associated with an emotion leads to markedly
higher inter-annotator agreement than asking whether a term evokes an emotion.
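The word-choice quality-control idea above can be sketched as a simple filtering step. This is a hypothetical illustration, not the paper's actual pipeline: each annotation carries the worker's answer to a word-choice question with a known correct option, and annotations that fail it are rejected before aggregation.

```python
def filter_annotations(annotations, gold_choices):
    """Keep annotations whose word-choice answer matches the known option;
    reject the rest (likely malicious entry or unfamiliarity with the term)."""
    kept, rejected = [], []
    for ann in annotations:
        gold = gold_choices[ann["term"]]
        (kept if ann["word_choice"] == gold else rejected).append(ann)
    return kept, rejected

# Toy data: the worker who picks the wrong near-synonym is filtered out.
gold = {"shroud": "cover"}
responses = [
    {"term": "shroud", "word_choice": "cover", "emotion": "sadness"},
    {"term": "shroud", "word_choice": "party", "emotion": "joy"},
]
kept, rejected = filter_annotations(responses, gold)
```

Because the correct option is tied to a particular sense of the term, passing the check also signals which sense the worker annotated, which is how sense-level annotations fall out of the same mechanism.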
Cross-domain sentiment classification using a sentiment sensitive thesaurus
Automatic classification of sentiment is important for numerous applications such as opinion mining, opinion summarization, contextual advertising, and market analysis. However, sentiment is expressed differently in different domains, and annotating corpora for every possible domain of interest is costly. Applying a sentiment classifier trained using labeled data for a particular domain to classify sentiment of user reviews on a different domain often results in poor performance. We propose a method to overcome this problem in cross-domain sentiment classification. First, we create a sentiment sensitive distributional thesaurus using labeled data for the source domains and unlabeled data for both source and target domains. Sentiment sensitivity is achieved in the thesaurus by incorporating document level sentiment labels in the context vectors used as the basis for measuring the distributional similarity between words. Next, we use the created thesaurus to expand feature vectors at train and test time in a binary classifier. The proposed method significantly outperforms numerous baselines and returns results that are comparable with previously proposed cross-domain sentiment classification methods. We conduct an extensive empirical analysis of the proposed method on single and multi-source domain adaptation, unsupervised and supervised domain adaptation, and numerous similarity measures for creating the sentiment sensitive thesaurus.
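The feature-expansion step can be sketched as follows. The thesaurus here is a toy dictionary mapping a word to scored related words, standing in for the paper's sentiment-sensitive distributional thesaurus; the function names and weighting scheme are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter

def expand_features(tokens, thesaurus, top_k=2, weight=0.5):
    """Bag-of-words counts, plus down-weighted counts for each token's
    top-k thesaurus neighbours, so a classifier trained on source-domain
    words can still fire on related target-domain words."""
    feats = Counter(tokens)
    for tok in tokens:
        related = sorted(thesaurus.get(tok, []), key=lambda p: -p[1])
        for word, score in related[:top_k]:
            feats[word] += weight * score
    return dict(feats)

# Toy thesaurus: only the two strongest neighbours of "excellent" are added.
thesaurus = {"excellent": [("superb", 0.9), ("great", 0.8), ("fine", 0.3)]}
feats = expand_features(["excellent", "camera"], thesaurus)
```

The design point is that expansion happens at both train and test time, so source-domain and target-domain reviews are projected into an overlapping feature space.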
From Frequency to Meaning: Vector Space Models of Semantics
Computers understand very little of the meaning of human language. This
profoundly limits our ability to give instructions to computers, the ability of
computers to explain their actions to us, and the ability of computers to
analyse and process text. Vector space models (VSMs) of semantics are beginning
to address these limits. This paper surveys the use of VSMs for semantic
processing of text. We organize the literature on VSMs according to the
structure of the matrix in a VSM. There are currently three broad classes of
VSMs, based on term-document, word-context, and pair-pattern matrices, yielding
three classes of applications. We survey a broad range of applications in these
three categories and we take a detailed look at a specific open source project
in each category. Our goal in this survey is to show the breadth of
applications of VSMs for semantics, to provide a new perspective on VSMs for
those who are already familiar with the area, and to provide pointers into the
literature for those who are less familiar with the field.
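The first of the three matrix classes, the term-document matrix, can be shown in a few lines. This is a minimal stdlib-only sketch with a toy corpus: documents become columns of a term-by-document count matrix, and document similarity is the cosine between columns.

```python
import math
from collections import Counter

docs = ["the cat sat", "the cat ran", "stocks fell sharply"]

# Rows are vocabulary terms, columns are documents (a term-document matrix).
vocab = sorted({w for d in docs for w in d.split()})
matrix = [[Counter(d.split())[t] for d in docs] for t in vocab]

def doc_cosine(i, j):
    """Cosine similarity between document columns i and j."""
    col_i = [row[i] for row in matrix]
    col_j = [row[j] for row in matrix]
    dot = sum(a * b for a, b in zip(col_i, col_j))
    norm_i = math.sqrt(sum(a * a for a in col_i))
    norm_j = math.sqrt(sum(b * b for b in col_j))
    return dot / (norm_i * norm_j)
```

The other two classes in the survey's taxonomy differ only in what the rows and columns index: word-context matrices pair words with context features, and pair-pattern matrices pair word pairs with the patterns that link them.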
Building sentiment Lexicons applying graph theory on information from three Norwegian thesauruses
Sentiment lexicons are the most used tool to automatically predict sentiment
in text. To the best of our knowledge, there exist no openly available
sentiment lexicons for the Norwegian language. Thus in this paper we
applied two different strategies to automatically generate sentiment lexicons
for the Norwegian language. The first strategy used machine translation to
translate an English sentiment lexicon to Norwegian and the other strategy
used information from three different thesauruses to build several sentiment
lexicons. The lexicons based on thesauruses were built using the Label
propagation algorithm from graph theory. The lexicons were evaluated
by classifying product and movie reviews. The results show satisfactory
classification performance, with different sentiment lexicons performing
best on product reviews and on movie reviews. Overall, the lexicon based on
machine translation performed best, showing that linguistic resources in
English can be translated to Norwegian without losing significant value.
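The label propagation strategy used for the thesaurus-based lexicons can be sketched as iterative score spreading over a synonym graph. The graph, seed words, and update rule below are toy assumptions for illustration, not the paper's Norwegian data or exact algorithm.

```python
def propagate(graph, seeds, iterations=20):
    """Spread sentiment scores from seed words along synonym edges.
    Seeds are clamped to their labels; other words take the mean
    score of their neighbours on each iteration."""
    scores = {w: seeds.get(w, 0.0) for w in graph}
    for _ in range(iterations):
        updated = {}
        for word, neighbours in graph.items():
            if word in seeds:
                updated[word] = seeds[word]
            else:
                updated[word] = sum(scores[n] for n in neighbours) / len(neighbours)
        scores = updated
    return scores

# Toy synonym graph with one positive and one negative seed.
graph = {
    "god": ["fin", "bra"], "fin": ["god"], "bra": ["god"],
    "daarlig": ["elendig"], "elendig": ["daarlig"],
}
seeds = {"god": 1.0, "daarlig": -1.0}
lexicon = propagate(graph, seeds)
```

After convergence, each unlabeled word's sign reflects the seed cluster it is connected to, which is what turns a thesaurus into a sentiment lexicon.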
Automatically generating a sentiment lexicon for the Malay language
This paper aims to propose an automated sentiment lexicon generation model specifically designed for the Malay
language. Lexicon-based Sentiment Analysis (SA) models make use of a sentiment lexicon for SA tasks, which is
a linguistic resource that comprises a priori information about the sentiment properties of words. A sentiment
lexicon is an indispensable resource for SA tasks. This is evident in the emergence of a large volume of research
focused on the development of sentiment lexicon generation algorithms. This is not the case for low-resource
languages such as Malay, for which research in this area remains scarce. This
motivated us to propose a sentiment lexicon generation algorithm for Malay. WordNet Bahasa was first
mapped onto the English WordNet to construct a multilingual word network. A seed set of prototypical positive
and negative terms was then automatically expanded by recursively adding terms linked via WordNet’s synonymy
and antonymy semantic relations. The underlying intuition is that the sentiment properties of newly added terms
via these relations are preserved. A supervised classifier was employed for the word-polarity tagging task, with
textual representations of the expanded seed set as features. Evaluation of the model against the General Inquirer
lexicon as a benchmark demonstrates that it performs with reasonable accuracy. This paper aims to provide a
foundation for further research on the Malay language in this area.
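The seed-expansion step can be sketched as a recursive walk over synonymy and antonymy relations: synonyms inherit a word's polarity and antonyms receive the opposite one. The relations below are a toy stand-in for WordNet Bahasa, and the function is an illustrative assumption, not the paper's implementation.

```python
def expand_seeds(pos, neg, synonyms, antonyms, rounds=3):
    """Recursively grow positive/negative seed sets: a word's synonyms
    keep its polarity, its antonyms get the opposite polarity."""
    pos, neg = set(pos), set(neg)
    for _ in range(rounds):
        new_pos = ({s for w in pos for s in synonyms.get(w, [])}
                   | {a for w in neg for a in antonyms.get(w, [])})
        new_neg = ({s for w in neg for s in synonyms.get(w, [])}
                   | {a for w in pos for a in antonyms.get(w, [])})
        pos |= new_pos - neg
        neg |= new_neg - pos
    return pos, neg

# Toy Malay relations: "baik" (good) seeds the expansion.
synonyms = {"baik": ["bagus"], "buruk": ["teruk"]}
antonyms = {"baik": ["buruk"]}
pos, neg = expand_seeds({"baik"}, set(), synonyms, antonyms)
```

The underlying intuition stated in the abstract is exactly this preservation (or flipping) of sentiment across the two semantic relations; the supervised word-polarity classifier then operates over representations of the expanded sets.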
Similarity of Semantic Relations
There are at least two kinds of similarity. Relational similarity is
correspondence between relations, in contrast with attributional similarity,
which is correspondence between attributes. When two words have a high
degree of attributional similarity, we call them synonyms. When two pairs
of words have a high degree of relational similarity, we say that their
relations are analogous. For example, the word pair mason:stone is analogous
to the pair carpenter:wood. This paper introduces Latent Relational Analysis (LRA),
a method for measuring relational similarity. LRA has potential applications in many
areas, including information extraction, word sense disambiguation,
and information retrieval. Recently the Vector Space Model (VSM) of information
retrieval has been adapted to measuring relational similarity,
achieving a score of 47% on a collection of 374 college-level multiple-choice
word analogy questions. In the VSM approach, the relation between a pair of words is
characterized by a vector of frequencies of predefined patterns in a large corpus.
LRA extends the VSM approach in three ways: (1) the patterns are derived automatically
from the corpus, (2) the Singular Value Decomposition (SVD) is used to smooth the frequency
data, and (3) automatically generated synonyms are used to explore variations of the
word pairs. LRA achieves 56% on the 374 analogy questions, statistically equivalent to the
average human score of 57%. On the related problem of classifying semantic relations, LRA
achieves similar gains over the VSM.
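The pair-pattern representation behind the VSM baseline can be sketched in a few lines. The patterns and frequencies below are invented toy values: each word pair becomes a vector of pattern frequencies, and relational similarity is the cosine between pair vectors. (LRA additionally derives patterns automatically, smooths the matrix with SVD, and explores synonym variants of the pairs; those steps are omitted here.)

```python
import math

patterns = ["X cuts Y", "X works with Y", "X eats Y"]

# Hypothetical corpus frequencies of each pattern for each word pair.
pair_vectors = {
    ("mason", "stone"):    [12, 30, 0],
    ("carpenter", "wood"): [15, 25, 0],
    ("cat", "mouse"):      [0, 1, 40],
}

def relational_similarity(pair_a, pair_b):
    """Cosine between the pattern-frequency vectors of two word pairs."""
    va, vb = pair_vectors[pair_a], pair_vectors[pair_b]
    dot = sum(a * b for a, b in zip(va, vb))
    norm = (math.sqrt(sum(a * a for a in va))
            * math.sqrt(sum(b * b for b in vb)))
    return dot / norm
```

An analogy question is then answered by picking the candidate pair whose vector is closest to the stem pair's, e.g. mason:stone scoring higher against carpenter:wood than against cat:mouse.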