29,953 research outputs found

    ConStance: Modeling Annotation Contexts to Improve Stance Classification

    Full text link
    Manual annotations are a prerequisite for many applications of machine learning. However, weaknesses in the annotation process itself are easy to overlook. In particular, scholars often choose what information to give to annotators without examining these decisions empirically. For subjective tasks such as sentiment analysis, sarcasm, and stance detection, such choices can impact results. Here, for the task of political stance detection on Twitter, we show that providing too little context can result in noisy and uncertain annotations, whereas providing too strong a context may cause it to outweigh other signals. To characterize and reduce these biases, we develop ConStance, a general model for reasoning about annotations across information conditions. Given conflicting labels produced by multiple annotators seeing the same instances with different contexts, ConStance simultaneously estimates gold standard labels and also learns a classifier for new instances. We show that the classifier learned by ConStance outperforms a variety of baselines at predicting political stance, while the model's interpretable parameters shed light on the effects of each context.Comment: To appear at EMNLP 201

    An emotional mess! Deciding on a framework for building a Dutch emotion-annotated corpus

    Get PDF
    Seeing the myriad of existing emotion models, with the categorical versus dimensional opposition the most important dividing line, building an emotion-annotated corpus requires some well thought-out strategies concerning framework choice. In our work on automatic emotion detection in Dutch texts, we investigate this problem by means of two case studies. We find that the labels joy, love, anger, sadness and fear are well-suited to annotate texts coming from various domains and topics, but that the connotation of the labels strongly depends on the origin of the texts. Moreover, it seems that information is lost when an emotional state is forcedly classified in a limited set of categories, indicating that a bi-representational format is desirable when creating an emotion corpus.Seeing the myriad of existing emotion models, with the categorical versus dimensional opposition the most important dividing line, building an emotion-annotated corpus requires some well thought-out strategies concerning framework choice. In our work on automatic emotion detection in Dutch texts, we investigate this problem by means of two case studies. We find that the labels joy, love, anger, sadness and fear are well-suited to annotate texts coming from various domains and topics, but that the connotation of the labels strongly depends on the origin of the texts. Moreover, it seems that information is lost when an emotional state is forcedly classified in a limited set of categories, indicating that a bi-representational format is desirable when creating an emotion corpus.P

    Towards reproducible research of event detection techniques for Twitter

    Get PDF
    © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

    To follow or not to follow? How Belgian health journalists use Twitter to monitor potential sources

    Get PDF
    Digital technology, the internet and mobile media are transforming the journalism and media landscape by influencing the news gathering and sourcing process. The empowering capacities of social media applications may constitute a key element for more balanced news access and “inclusive journalism”. We will build on two contrasting views that dominate the social media sourcing debate. On the one hand, literature shows that journalists of legacy media make use of social media sources to diversify their sourcing network including bottom-up sources such as ordinary citizens. On the other hand, various authors conclude that journalists stick with their old sourcing routines and continue to privilege top-down elite sources such as experts and government officials. In order to contribute to this academic debate we want to clarify the Twitter practices of professional Belgian health journalists in terms of how they use the platform to monitor potential sources. Therefore, we examined the 1146 Twitter “followings” of six Belgian health journalists by means of digital methods and social network analysis. Results show that top-down actors are overrepresented in the “following” networks and that Twitter’s “following” function is not used to reach out to bottom-up actors. In the overall network, we found that the health journalists mainly use Twitter as a “press club” (Rupar, 2015) to monitor media actors. If we zoom in specifically on the “following” network of the health-related actors, we found that media actors are still important, but experts become the most followed group. Our findings also underwrite the “power law” or “long tail” distribution of social network sites as very few actors take a central position in the “following” lists while the large majority of actors are not systematically monitored by the journalists

    Detecting and Tracking the Spread of Astroturf Memes in Microblog Streams

    Full text link
    Online social media are complementing and in some cases replacing person-to-person social interaction and redefining the diffusion of information. In particular, microblogs have become crucial grounds on which public relations, marketing, and political battles are fought. We introduce an extensible framework that will enable the real-time analysis of meme diffusion in social media by mining, visualizing, mapping, classifying, and modeling massive streams of public microblogging events. We describe a Web service that leverages this framework to track political memes in Twitter and help detect astroturfing, smear campaigns, and other misinformation in the context of U.S. political elections. We present some cases of abusive behaviors uncovered by our service. Finally, we discuss promising preliminary results on the detection of suspicious memes via supervised learning based on features extracted from the topology of the diffusion networks, sentiment analysis, and crowdsourced annotations

    Citizens and Institutions as Information Prosumers. The Case Study of Italian Municipalities on Twitter

    Get PDF
    The aim of this paper is to address changes in public communication following the advent of Internet social networking tools and the emerging web 2.0 technologies which are providing new ways of sharing information and knowledge. In particular public administrations are called upon to reinvent the governance of public affairs and to update the means for interacting with their communities. The paper develops an analysis of the distribution, diffusion and performance of the official profiles on Twitter adopted by the Italian municipalities (comuni) up to November 2013. It aims to identify the patterns of spatial distribution and the drivers of the diffusion of Twitter profiles; the performance of the profiles through an aggregated index, called the Twitter performance index (Twiperindex), which evaluates the profiles' activity with reference to the gravitational areas of the municipalities in order to enable comparisons of the activity of municipalities with different demographic sizes and functional roles. The results show that only a small portion of innovative municipalities have adopted Twitter to enhance e-participation and e-governance and that the drivers of the diffusion seem to be related either to past experiences and existing conditions (i.e. civic networks, digital infrastructures) developed over time or to strong local community awareness. The better performances are achieved mainly by small and medium-sized municipalities. Of course, the phenomenon is very new and fluid, therefore this analysis should be considered as a first step in ongoing research which aims to grasp the dynamics of these new means of public communication
    • …
    corecore