
    Crowdsourcing a Word-Emotion Association Lexicon

    Even though considerable attention has been given to the polarity of words (positive and negative) and the creation of large polarity lexicons, research in emotion analysis has had to rely on limited and small emotion lexicons. In this paper we show how the combined strength and wisdom of the crowds can be used to generate a large, high-quality, word-emotion and word-polarity association lexicon quickly and inexpensively. We enumerate the challenges in emotion annotation in a crowdsourcing scenario and propose solutions to address them. Most notably, in addition to questions about emotions associated with terms, we show how the inclusion of a word choice question can discourage malicious data entry, help identify instances where the annotator may not be familiar with the target term (allowing us to reject such annotations), and help obtain annotations at sense level (rather than at word level). We conducted experiments on how to formulate the emotion-annotation questions, and show that asking if a term is associated with an emotion leads to markedly higher inter-annotator agreement than that obtained by asking if a term evokes an emotion.

    MS-TR: A Morphologically Enriched Sentiment Treebank and Recursive Deep Models for Compositional Semantics in Turkish

    Recursive Deep Models have been used as powerful models to learn compositional representations of text for many natural language processing tasks. However, they require structured input (i.e. a sentiment treebank) that encodes sentences according to their tree structure, so that the models can learn the latent semantics of words using recursive composition functions. In this paper, we present our contributions and efforts for the Turkish Sentiment Treebank construction. We introduce MS-TR, a Morphologically Enriched Sentiment Treebank, which was implemented for training Recursive Deep Models to address compositional sentiment analysis for Turkish, a well-known Morphologically Rich Language (MRL). We propose a semi-supervised automatic annotation, as a distant-supervision approach, using morphological features of words to infer the polarity of the inner nodes of MS-TR as positive and negative. The proposed annotation model has four different annotation levels: morph-level, stem-level, token-level, and review-level. Each annotation level's contribution was tested using three different domain datasets, including product reviews, movie reviews, and the Turkish Natural Corpus essays. Comparative results were obtained with the Recursive Neural Tensor Networks (RNTN) model, which operates over MS-TR, and with conventional machine learning methods. Experiments showed that RNTN achieved much better accuracy than the baseline methods, which cannot accurately capture the aggregated sentiment information.

    Acquiring Broad Commonsense Knowledge for Sentiment Analysis Using Human Computation

    While artificial intelligence is successful in many applications that cover specific domains, for many commonsense problems there is still a large gap with human performance. Automated sentiment analysis is a typical example: while there are techniques that reasonably aggregate sentiments from texts in specific domains, such as online reviews of a particular product category, more general models have a poor performance. We argue that sentiment analysis can be covered more broadly by extending models with commonsense knowledge acquired at scale, using human computation. We study two sentiment analysis problems. We start with document-level sentiment classification, which aims to determine whether a text as a whole expresses a positive or a negative sentiment. We hypothesize that extending classifiers to include the polarities of sentiment words in context can help them scale to broad domains. We also study fine-grained opinion extraction, which aims to pinpoint individual opinions in a text, along with their targets. We hypothesize that extraction models can benefit from broad fine-grained annotations to boost their performance on unfamiliar domains. Selecting sentiment words in context and annotating texts with opinions and targets are tasks that require commonsense knowledge shared by all the speakers of a language. We show how these can be effectively solved through human computation. We illustrate how to define small tasks that can be solved by many independent workers so that results can form a single coherent knowledge base. We also show how to recruit, train, and engage workers, then how to perform effective quality control to obtain sufficiently high-quality knowledge. We show how the resulting knowledge can be effectively integrated into models that scale to broad domains and also perform well in unfamiliar domains. We engage workers through both enjoyment and payment, by designing our tasks as games played for money. We recruit them on a paid crowdsourcing platform where we can reach out to a large pool of active workers. This is an effective recipe for acquiring sentiment knowledge in English, a language that is known by the vast majority of workers on the platform. To acquire sentiment knowledge for other languages, which have received comparatively little attention, we argue that we need to design tasks that appeal to voluntary workers outside the crowdsourcing platform, based on enjoyment alone. However, recruiting and engaging volunteers has been more of an art than a problem that can be solved systematically. We show that combining online advertisement with games, an approach that has been recently proved to work well for acquiring expert knowledge, gives an effective recipe for luring and engaging volunteers to provide good quality sentiment knowledge for texts in French. Our solutions could point the way to how to use human computation to broaden the competence of artificial intelligence systems in other domains as well.

    Strategic sentiments and emotions in post-Second World War party manifestos in Finland

    We contribute to the growing number of studies on emotions and politics by investigating how political parties strategically use sentiments and emotions in party manifestos. We use computational methods to examine changes in sentiments and emotions in Finnish party manifestos from 1945 to 2019. We use sentiment and emotion lexicons first translated from English into Finnish and then modified for the purposes of our study. We analyze how the use of emotions and sentiments differs between government and opposition parties depending on their left/right ideology and the specific type of party manifesto. In addition to traditional sentiment and emotion analysis, we use emotion intensity analysis. Our results indicate that in Finland, government and opposition parties do not differ substantially from each other in their use of emotional language. From a historical perspective, the individual emotions used in party manifestos have persisted, but changes have taken place in the intensity of using emotion words. We also find that in comparison with other parties, populist parties both appeal to different emotions and appeal to the same emotions with different intensities. Peer reviewed.

    Sentiment, Emotion, Purpose, and Style in Electoral Tweets

    Social media is playing a growing role in elections world-wide. Thus, automatically analyzing electoral tweets has applications in understanding how public sentiment is shaped, tracking public sentiment and polarization with respect to candidates and issues, understanding the impact of tweets from various entities, etc. Here, for the first time, we automatically annotate a set of 2012 US presidential election tweets for a number of attributes pertaining to sentiment, emotion, purpose, and style by crowdsourcing. Overall, more than 100,000 crowdsourced responses were obtained for 13 questions on emotions, style, and purpose. Additionally, we show through an analysis of these annotations that purpose, even though correlated with emotions, is significantly different. Finally, we describe how we developed automatic classifiers, using features from state-of-the-art sentiment analysis systems, to predict emotion and purpose labels, respectively, in new unseen tweets. These experiments establish baseline results for automatic systems on this new data.

    Online Gaming for Crowd-sourcing Phrase-equivalents

    We propose the use of a game with a purpose (GWAP) to facilitate crowd-sourcing of phrase-equivalents, as an alternative to expert or paid crowd-sourcing. Doodling is an online multiplayer game in which one player (the drawer) draws pictures on a shared board to get the other players (the guessers) to guess the meaning behind an assigned phrase. In this paper we describe the system and results from several experiments intended to improve the quality of information generated by the play. In addition, we describe the mechanism by which we take candidate phrases generated during the games and filter out true phrase equivalents. We expect that, at scale, this game will be more cost-efficient than paid mechanisms for a similar task, and demonstrate this by comparing the productivity of an hour of game play to an equivalent crowd-sourced Amazon Mechanical Turk task to produce phrase-equivalents over one week.

    Annotate-Sample-Average (ASA): A New Distant Supervision Approach for Twitter Sentiment Analysis

    The classification of tweets into polarity classes is a popular task in sentiment analysis. State-of-the-art solutions to this problem are based on supervised machine learning models trained from manually annotated examples. A drawback of these approaches is the high cost involved in data annotation. Two freely available resources that can be exploited to solve the problem are: 1) large amounts of unlabelled tweets obtained from the Twitter API and 2) prior lexical knowledge in the form of opinion lexicons. In this paper, we propose Annotate-Sample-Average (ASA), a distant supervision method that uses these two resources to generate synthetic training data for Twitter polarity classification. Positive and negative training instances are generated by sampling and averaging unlabelled tweets containing words with the corresponding polarity. Polarity of words is determined from a given polarity lexicon. Our experimental results show that the training data generated by ASA (after tuning its parameters) produces a classifier that performs significantly better than a classifier trained from tweets annotated with emoticons and a classifier trained, without any sampling and averaging, from tweets annotated according to the polarity of their words.
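
The annotate, sample, and average steps named in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy lexicon, the example tweets, the bag-of-words features, sampling with replacement, and every function and parameter name (`annotate`, `sample_and_average`, `generate_instances`, `per_class`, `sample_size`) are assumptions made for illustration only.

```python
import random
from collections import Counter

# Hypothetical toy polarity lexicon (assumption, not from the paper).
LEXICON = {"good": "positive", "great": "positive",
           "bad": "negative", "awful": "negative"}

# Hypothetical unlabelled tweets (assumption, not real data).
TWEETS = [
    "good morning everyone",
    "what a great game",
    "bad traffic again",
    "awful service today",
    "just had lunch",  # contains no lexicon word, so it is never used
]


def annotate(tweets, lexicon):
    """Group bag-of-words vectors by the polarity of the lexicon words they contain."""
    buckets = {"positive": [], "negative": []}
    for text in tweets:
        tokens = text.lower().split()
        vector = Counter(tokens)
        # A tweet joins every bucket whose polarity it mentions at least once.
        for polarity in {lexicon[t] for t in tokens if t in lexicon}:
            buckets[polarity].append(vector)
    return buckets


def sample_and_average(vectors, sample_size, rng):
    """Draw sample_size vectors with replacement and return their average."""
    total = Counter()
    for _ in range(sample_size):
        total.update(rng.choice(vectors))
    return {word: count / sample_size for word, count in total.items()}


def generate_instances(tweets, lexicon, per_class, sample_size, seed=0):
    """Build per_class synthetic (features, label) pairs for each polarity."""
    rng = random.Random(seed)
    instances = []
    for polarity, vectors in annotate(tweets, lexicon).items():
        if not vectors:
            continue  # no tweet matched this polarity
        for _ in range(per_class):
            features = sample_and_average(vectors, sample_size, rng)
            instances.append((features, polarity))
    return instances


instances = generate_instances(TWEETS, LEXICON, per_class=3, sample_size=2)
for features, label in instances:
    print(label, features)
```

Each synthetic instance averages several weakly labelled tweets, so it is a blend of them rather than a copy of any single one. The resulting `(features, label)` pairs could then be passed to any standard classifier.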

    Crowdsourcing Research Opportunities: Lessons from Natural Language Processing

    Although the field has led to promising early results, the use of crowdsourcing as an integral part of science projects is still regarded with skepticism by some, largely due to a lack of awareness of the opportunities and implications of utilizing these new techniques. We address this lack of awareness, firstly by highlighting the positive impacts that crowdsourcing has had on Natural Language Processing research. Secondly, we discuss the challenges of more complex methodologies, quality control, and the necessity to deal with ethical issues. We conclude with future trends and opportunities of crowdsourcing for science, including its potential for disseminating results, making science more accessible, and enriching educational programs.