2 research outputs found

    Dystemo: Distant Supervision Method for Multi-Category Emotion Recognition in Tweets

    Emotion recognition in text has become an important research objective. It involves building classifiers capable of detecting human emotions for a specific application, for example, analyzing reactions to product launches, monitoring emotions at sports events, or discerning opinions in political debates. Most successful approaches rely heavily on costly manual annotation. To alleviate this burden, we propose a distant supervision method, Dystemo, for automatically producing emotion classifiers from tweets labeled using existing or easy-to-produce emotion lexicons. The goal is to obtain emotion classifiers that work more accurately for specific applications than available emotion lexicons. The success of this method depends mainly on a novel classifier, Balanced Weighted Voting (BWV), designed to overcome the imbalance in emotion distribution in the initial dataset, and on novel heuristics for detecting neutral tweets. We demonstrate how Dystemo works using Twitter data about sports events, a fine-grained 20-category emotion model, and three different initial emotion lexicons. Through a series of carefully designed experiments, we confirm that Dystemo is effective both in extending initial emotion lexicons of small coverage to correctly find more emotional tweets and in correcting emotion lexicons of low accuracy to perform more accurately.
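The general idea of labeling tweets with a lexicon and then compensating for emotion imbalance can be illustrated with a minimal sketch. This is not the Dystemo pipeline or the Balanced Weighted Voting classifier, whose details the abstract does not give. The toy lexicon, the function names, and the inverse-frequency weighting are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical toy lexicon: term -> emotion category (not from the paper).
LEXICON = {
    "proud": "pride",
    "congrats": "pride",
    "gutted": "sadness",
    "wow": "awe",
}


def label_tweet(text, lexicon=LEXICON):
    """Return the emotion whose lexicon terms occur most often, or None."""
    tokens = text.lower().split()  # naive whitespace tokenization
    votes = Counter(lexicon[t] for t in tokens if t in lexicon)
    if not votes:
        return None  # no lexicon term matched; the tweet stays unlabeled
    return votes.most_common(1)[0][0]


def distant_labels(tweets, lexicon=LEXICON):
    """Keep only the tweets the lexicon can label, paired with that label."""
    labeled = []
    for text in tweets:
        emotion = label_tweet(text, lexicon)
        if emotion is not None:
            labeled.append((text, emotion))
    return labeled


def balancing_weights(labeled):
    """Inverse-frequency weight per emotion (an assumed stand-in for
    imbalance handling), so that rarer emotions receive larger weights."""
    counts = Counter(emotion for _, emotion in labeled)
    total = sum(counts.values())
    return {e: total / (len(counts) * n) for e, n in counts.items()}


if __name__ == "__main__":
    tweets = [
        "so proud of the team",
        "congrats champions",
        "gutted about that final",
        "nice weather today",
    ]
    labeled = distant_labels(tweets)
    print(labeled)
    print(balancing_weights(labeled))
```

The automatically labeled pairs and the per-emotion weights could then be passed to any classifier that accepts sample weights. Which classifier and weighting scheme suit a given domain is left open here.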

    Advancing Fine-Grained Emotion Recognition in Short Text

    Advanced emotion recognition in text is essential for developing intelligent affective applications, which can recognize, react to, and analyze users' emotions. Our particular motivation for solving this problem lies in large-scale analysis of social media data, such as those generated by Twitter users. Summarizing users' emotions can enable a better understanding of their reactions, interests, and motivations. We thus narrow the problem to emotion recognition in short text, particularly tweets. Another driving factor of our work is to enable discovering emotional experiences at a detailed, fine-grained level. While many researchers focus on recognizing a small number of basic emotion categories, humans experience a larger variety of distinct emotions. We aim to recognize as many as 20 emotion categories from the Geneva Emotion Wheel. Our goal is to study how to build such fine-grained emotion recognition systems. We start by surveying prior approaches to building emotion classifiers. The main body of this thesis studies two of them in detail: crowdsourcing and distant supervision. Based on them, we design fine-grained domain-specific systems to recognize users' reactions to sporting events captured on Twitter and address multiple challenges that arise in that process. Crowdsourcing allows extracting affective commonsense knowledge by asking hundreds of workers for manual annotation. The challenge is in collecting informative and truthful annotations. To address it, we design a human computation task that elicits both emotion category labels and emotion indicators (i.e., words or phrases indicative of labeled emotions). We also develop a methodology to build an emotion lexicon using such data. Our experiments show that the proposed crowdsourcing method can successfully generate a domain-specific emotion lexicon. Additionally, we suggest how to teach and motivate non-expert annotators. We show that including a tutorial and using carefully formulated reward descriptions can effectively improve annotation quality. Distant supervision consists of building emotion classifiers from data that are automatically labeled using some heuristics. This thesis studies heuristics that apply emotion lexicons of limited quality, for example due to missing or erroneous term-emotion associations. We show the viability of such an approach to obtain domain-specific classifiers with substantially better recognition quality than the initial lexicon-based ones. Our experiments reveal that treating the emotion imbalance in training data and incorporating pseudo-neutral documents are crucial for such improvement. This method can be applied to building emotion classifiers across different domains using limited input resources and thus requiring minimal effort. Another challenge for lexicon-based emotion recognition is to reduce the error introduced by linguistic modifiers such as negation and modality. We design a data analysis method that allows modeling the specific effects of the studied modifiers, both in terms of shifting emotion categories and changing confidence in emotion presence. We show that the effects of modifiers vary across the emotion categories, which indicates the importance of treating such effects at a more fine-grained level to improve classification quality. Finally, the thesis concludes with our recommendations on how to address the examined general challenges of building a fine-grained textual emotion recognition system.