Automatically generating a sentiment lexicon for the Malay language
This paper proposes an automated sentiment lexicon generation model specifically designed for the Malay
language. Lexicon-based Sentiment Analysis (SA) models make use of a sentiment lexicon for SA tasks, which is
a linguistic resource that comprises a priori information about the sentiment properties of words. A sentiment
lexicon is an indispensable resource for SA tasks. This is evident in the emergence of a large volume of research
focused on the development of sentiment lexicon generation algorithms. This is not the case for low-resource
languages such as Malay, for which research in this particular area is lacking. This gap motivates
our proposal of a sentiment lexicon generation algorithm for this language. WordNet Bahasa was first
mapped onto the English WordNet to construct a multilingual word network. A seed set of prototypical positive
and negative terms was then automatically expanded by recursively adding terms linked via WordNet’s synonymy
and antonymy semantic relations. The underlying intuition is that the sentiment properties of newly added terms
via these relations are preserved. A supervised classifier was employed for the word-polarity tagging task, with
textual representations of the expanded seed set as features. Evaluation of the model against the General Inquirer
lexicon as a benchmark demonstrates that it performs with reasonable accuracy. This paper aims to provide a
foundation for further research in this area for the Malay language.
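The seed expansion step described in this abstract can be illustrated with a minimal sketch. It assumes the intuition stated above: a term reached through a synonymy link keeps the polarity of its source, and a term reached through an antonymy link takes the opposite polarity. The word lists and the `expand_seeds` helper are hypothetical toy data, not drawn from WordNet Bahasa, and the supervised classification stage is not covered.

```python
from collections import deque

def expand_seeds(seeds, synonyms, antonyms):
    """Propagate polarity labels (+1 / -1) outward from seed terms.

    Synonym links copy the label; antonym links flip it.
    The first label a word receives is kept (breadth-first order).
    """
    polarity = dict(seeds)
    queue = deque(seeds)  # iterating a dict yields its keys
    while queue:
        word = queue.popleft()
        label = polarity[word]
        for neighbour in synonyms.get(word, ()):
            if neighbour not in polarity:
                polarity[neighbour] = label
                queue.append(neighbour)
        for neighbour in antonyms.get(word, ()):
            if neighbour not in polarity:
                polarity[neighbour] = -label
                queue.append(neighbour)
    return polarity

# Toy relations for illustration only (hypothetical, not from WordNet Bahasa).
SYNONYMS = {"baik": ["bagus"], "bagus": ["elok"], "buruk": ["teruk"]}
ANTONYMS = {"baik": ["jahat"], "teruk": ["hebat"]}
SEEDS = {"baik": 1, "buruk": -1}

lexicon = expand_seeds(SEEDS, SYNONYMS, ANTONYMS)
```

Breadth-first traversal is a design choice of this sketch: terms closer to a seed are labelled first, so a word reachable by several paths keeps the label from the shortest one.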
Opinion mining: Reviewed from word to document level
Opinion mining is one of the most challenging tasks in the field of information retrieval. The research community has published a number of articles on this topic, and interest has increased significantly during the past decade, especially after the launch of several online social networks. In this paper, we provide a detailed overview of the related work on opinion mining. The following features distinguish our review from similar works: (1) it presents a different perspective on the opinion mining field by discussing the work at different granularity levels (word, sentence, and document levels); (2) it discusses the related work in terms of the challenges of the field of opinion mining; (3) its document-level discussion gives an overview of the opinion mining task in the blogosphere, one of the most popular online social networks; and (4) it highlights the importance of online social networks for the opinion mining task and other related sub-tasks.
Crowdsourcing a Word-Emotion Association Lexicon
Even though considerable attention has been given to the polarity of words
(positive and negative) and the creation of large polarity lexicons, research
in emotion analysis has had to rely on limited and small emotion lexicons. In
this paper we show how the combined strength and wisdom of the crowds can be
used to generate a large, high-quality, word-emotion and word-polarity
association lexicon quickly and inexpensively. We enumerate the challenges in
emotion annotation in a crowdsourcing scenario and propose solutions to address
them. Most notably, in addition to questions about emotions associated with
terms, we show how the inclusion of a word choice question can discourage
malicious data entry, help identify instances where the annotator may not be
familiar with the target term (allowing us to reject such annotations), and
help obtain annotations at sense level (rather than at word level). We
conducted experiments on how to formulate the emotion-annotation questions, and
show that asking if a term is associated with an emotion leads to markedly
higher inter-annotator agreement than that obtained by asking if a term evokes
an emotion.
OBOME - Ontology based opinion mining in UBIPOL
Ontologies have a special role in the UBIPOL system: they help to structure the policy-related context, provide a conceptualization of the policy domain, and are used in the opinion mining process. In this work we presented a system called Ontology Based Opinion Mining Engine (OBOME) for analyzing a domain-specific opinion corpus by first assisting the user with the creation of a domain ontology from the corpus. We then determined the polarity of opinion on the various domain aspects. In the former step, the policy domain aspects are identified (namely, which policy category is represented by the concept). This identification is supported by the policy modelling ontology, which describes the most important policy-related classes and structure. Then the most informative documents are extracted from the corpus, and the user is asked to create a set of aspects and related keywords using these documents. In the latter step, we used the corpus-specific ontology to model the domain and extracted aspect-polarity associations using grammatical dependencies between words. The summarized results are then shown to the user to analyze and store. Finally, in an offline process, the policy modelling ontology is updated.
Twitter Sentiment Mining: A Multi Domain Analysis
Microblogging services such as Twitter provide a rich source of information about products, personalities, trends, and more. We proposed a simple methodology for analyzing the sentiment of Twitter users. First, we automatically collected a Twitter corpus of positive and negative tweets. Second, we built a simple sentiment classifier using the Naive Bayes model to determine the positive or negative sentiment of a tweet. Third, we tested the classifier against a collection of users’ opinions from five domains on Twitter, i.e., news, finance, job, movies, and sport. The experimental results show that it is feasible to use a Twitter corpus alone to classify new tweets for domain-specific applications.
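A Naive Bayes sentiment classifier of the kind this abstract mentions can be sketched in a few lines. The version below is a multinomial model with add-one (Laplace) smoothing and whitespace tokenization. These details, the `NaiveBayes` class, and the four training sentences are assumptions made for illustration; the abstract does not specify the authors' features, smoothing, or data.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative sketch)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.class_counts = Counter(labels)      # label -> number of documents
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in text.lower().split():
                if word in self.vocab:  # words unseen in training are skipped
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)

# Invented toy tweets, for illustration only.
train_texts = [
    "love this movie great",
    "great game happy fans",
    "hate this job awful",
    "awful market sad day",
]
train_labels = ["pos", "pos", "neg", "neg"]

model = NaiveBayes().fit(train_texts, train_labels)
prediction = model.predict("great movie")
```

Working in log space avoids numerical underflow when many word probabilities are multiplied together.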
Automatic domain ontology extraction for context-sensitive opinion mining
Automated analysis of the sentiments presented in online consumer feedback can facilitate both organizations’ business strategy development and individual consumers’ comparison shopping. Nevertheless, existing opinion mining methods either adopt a context-free sentiment classification approach or rely on a large number of manually annotated training examples to perform context-sensitive sentiment classification. Guided by the design science research methodology, we illustrate the design, development, and evaluation of a novel fuzzy domain ontology-based context-sensitive opinion mining system. Our novel ontology extraction mechanism, underpinned by a variant of Kullback-Leibler divergence, can automatically acquire contextual sentiment knowledge across various product domains to improve the sentiment analysis processes. Evaluated on a benchmark dataset and real consumer reviews collected from Amazon.com, our system shows remarkable performance improvement over the context-free baseline.
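For readers unfamiliar with the measure named in this abstract, the sketch below computes the standard Kullback-Leibler divergence between two smoothed term distributions. It is not the variant the authors use, which the abstract does not define. The counts, the `smoothed_distribution` helper, and the smoothing constant are hypothetical.

```python
import math

def kl_divergence(p, q):
    """Standard D_KL(P || Q) = sum_x P(x) * log(P(x) / Q(x)), in nats.

    Assumes q[x] > 0 wherever p[x] > 0.
    """
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

def smoothed_distribution(counts, vocab, alpha=1.0):
    """Turn raw counts into probabilities with add-alpha smoothing."""
    total = sum(counts.get(w, 0) for w in vocab) + alpha * len(vocab)
    return {w: (counts.get(w, 0) + alpha) / total for w in vocab}

# Invented counts: terms seen near a sentiment word in one product
# domain versus in a general background collection.
domain_counts = {"room": 8, "screen": 0, "price": 2}
background_counts = {"room": 3, "screen": 4, "price": 3}
vocab = sorted(set(domain_counts) | set(background_counts))

p = smoothed_distribution(domain_counts, vocab)
q = smoothed_distribution(background_counts, vocab)
divergence = kl_divergence(p, q)
```

A larger divergence indicates that term usage in the domain differs more from the background, which is the general kind of signal a divergence-based extraction step could exploit.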
SimLex-999: Evaluating Semantic Models with (Genuine) Similarity Estimation
We present SimLex-999, a gold standard resource for evaluating distributional
semantic models that improves on existing resources in several important ways.
First, in contrast to gold standards such as WordSim-353 and MEN, it explicitly
quantifies similarity rather than association or relatedness, so that pairs of
entities that are associated but not actually similar [Freud, psychology] have
a low rating. We show that, via this focus on similarity, SimLex-999
incentivizes the development of models with a different, and arguably wider
range of applications than those which reflect conceptual association. Second,
SimLex-999 contains a range of concrete and abstract adjective, noun and verb
pairs, together with an independent rating of concreteness and (free)
association strength for each pair. This diversity enables fine-grained
analyses of the performance of models on concepts of different types, and
consequently greater insight into how architectures can be improved. Further,
unlike existing gold standard evaluations, for which automatic approaches have
reached or surpassed the inter-annotator agreement ceiling, state-of-the-art
models perform well below this ceiling on SimLex-999. There is therefore plenty
of scope for SimLex-999 to quantify future improvements to distributional
semantic models, guiding the development of the next generation of
representation-learning architectures.
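One common way to score a model against a gold standard of human similarity ratings is Spearman rank correlation between the model's scores and the human ratings. The abstract does not name its metric, so treat the choice as an assumption. The sketch below implements it with average ranks for ties; all numbers are invented and are not SimLex-999 values.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Invented ratings for four hypothetical word pairs.
human_ratings = [9.1, 1.2, 6.5, 3.0]
model_scores = [0.80, 0.35, 0.60, 0.10]
rho = spearman(human_ratings, model_scores)
```

Because only ranks matter, the model's scores do not need to be on the same scale as the human ratings.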