636 research outputs found

    Automatically generating a sentiment lexicon for the Malay language

    Get PDF
    This paper aims to propose an automated sentiment lexicon generation model specifically designed for the Malay language. Lexicon-based Sentiment Analysis (SA) models make use of a sentiment lexicon for SA tasks, which is a linguistic resource that comprises a priori information about the sentiment properties of words. A sentiment lexicon is an indispensable resource for SA tasks. This is evident in the emergence of a large volume of research focused on the development of sentiment lexicon generation algorithms. This is not the case for low-resource languages such as Malay, for which there is a lack of research focused on this particular area. This has brought up the motivation to propose a sentiment lexicon generation algorithm for this language. WordNet Bahasa was first mapped onto the English WordNet to construct a multilingual word network. A seed set of prototypical positive and negative terms was then automatically expanded by recursively adding terms linked via WordNet’s synonymy and antonymy semantic relations. The underlying intuition is that the sentiment properties of newly added terms via these relations are preserved. A supervised classifier was employed for the word-polarity tagging task, with textual representations of the expanded seed set as features. Evaluation of the model against the General Inquirer lexicon as a benchmark demonstrates that it performs with reasonable accuracy. This paper aims to provide a foundation for further research for the Malay language in this area

    Opinion mining: Reviewed from word to document level

    Get PDF
    International audienceOpinion mining is one of the most challenging tasks of the field of information retrieval. Research community has been publishing a number of articles on this topic but a significant increase in interest has been observed during the past decade especially after the launch of several online social networks. In this paper, we provide a very detailed overview of the related work of opinion mining. Following features of our review make it stand unique among the works of similar kind: (1) it presents a very different perspective of the opinion mining field by discussing the work on different granularity levels (like word, sentences, and document levels) which is very unique and much required, (2) discussion of the related work in terms of challenges of the field of opinion mining, (3) document level discussion of the related work gives an overview of opinion mining task in blogosphere, one of most popular online social network, and (4) highlights the importance of online social networks for opinion mining task and other related sub-tasks

    Crowdsourcing a Word-Emotion Association Lexicon

    Full text link
    Even though considerable attention has been given to the polarity of words (positive and negative) and the creation of large polarity lexicons, research in emotion analysis has had to rely on limited and small emotion lexicons. In this paper we show how the combined strength and wisdom of the crowds can be used to generate a large, high-quality, word-emotion and word-polarity association lexicon quickly and inexpensively. We enumerate the challenges in emotion annotation in a crowdsourcing scenario and propose solutions to address them. Most notably, in addition to questions about emotions associated with terms, we show how the inclusion of a word choice question can discourage malicious data entry, help identify instances where the annotator may not be familiar with the target term (allowing us to reject such annotations), and help obtain annotations at sense level (rather than at word level). We conducted experiments on how to formulate the emotion-annotation questions, and show that asking if a term is associated with an emotion leads to markedly higher inter-annotator agreement than that obtained by asking if a term evokes an emotion

    Twitter Sentiment Mining: A Multi Domain Analysis

    Get PDF
    Microblogging such as Twitter provides a rich source of information about products, personalities, and trends, etc. We proposed a simple methodology for analyzing sentiment of users in Twitter. First, we automatically collected Twitter corpus in positive and negative tweets. Second, we built a simple sentiment classifier by utilizing the Naive Bayes model to determine the positive and negative sentiment of a tweet. Third, we tested the classifier against a collection of users’ opinions from five interesting domains of Twitter, i.e., news, finance, job, movies, and sport. The experimental results show that it is feasible to use Twitter corpus alone to classify new tweet for a certain domain applications

    Automatic domain ontology extraction for context-sensitive opinion mining

    Get PDF
    Automated analysis of the sentiments presented in online consumer feedbacks can facilitate both organizations’ business strategy development and individual consumers’ comparison shopping. Nevertheless, existing opinion mining methods either adopt a context-free sentiment classification approach or rely on a large number of manually annotated training examples to perform context sensitive sentiment classification. Guided by the design science research methodology, we illustrate the design, development, and evaluation of a novel fuzzy domain ontology based contextsensitive opinion mining system. Our novel ontology extraction mechanism underpinned by a variant of Kullback-Leibler divergence can automatically acquire contextual sentiment knowledge across various product domains to improve the sentiment analysis processes. Evaluated based on a benchmark dataset and real consumer reviews collected from Amazon.com, our system shows remarkable performance improvement over the context-free baseline

    SimLex-999: Evaluating Semantic Models with (Genuine) Similarity Estimation

    Full text link
    We present SimLex-999, a gold standard resource for evaluating distributional semantic models that improves on existing resources in several important ways. First, in contrast to gold standards such as WordSim-353 and MEN, it explicitly quantifies similarity rather than association or relatedness, so that pairs of entities that are associated but not actually similar [Freud, psychology] have a low rating. We show that, via this focus on similarity, SimLex-999 incentivizes the development of models with a different, and arguably wider range of applications than those which reflect conceptual association. Second, SimLex-999 contains a range of concrete and abstract adjective, noun and verb pairs, together with an independent rating of concreteness and (free) association strength for each pair. This diversity enables fine-grained analyses of the performance of models on concepts of different types, and consequently greater insight into how architectures can be improved. Further, unlike existing gold standard evaluations, for which automatic approaches have reached or surpassed the inter-annotator agreement ceiling, state-of-the-art models perform well below this ceiling on SimLex-999. There is therefore plenty of scope for SimLex-999 to quantify future improvements to distributional semantic models, guiding the development of the next generation of representation-learning architectures
    • …
    corecore