841 research outputs found

    Suspended accounts: A source of Tweets with disgust and anger emotions for augmenting hate speech data sample

    In this paper we present a proposal to address the problem of costly and unreliable human annotation, which is a bottleneck for detecting hate speech in web content. In particular, we propose to use the text produced by suspended accounts in the aftermath of a hateful event as a subtle and reliable source for hate speech prediction. The proposal was motivated by an emotion analysis of three sources of data sets: suspended, active, and neutral; the first two contain hateful tweets from suspended and active accounts, respectively, whereas the third contains only neutral tweets. The emotion analysis indicated that tweets from suspended accounts express more disgust, negativity, fear, and sadness than those from active accounts, even though tweets from both types of accounts might be annotated as hateful by human annotators. We train two Random Forest classifiers based on the semantic meaning of tweets, one on tweets from suspended accounts and one on tweets from active accounts, and evaluate the prediction accuracy of both classifiers on unseen data. The results show that the classifier trained on tweets from suspended accounts outperformed the one trained on tweets from active accounts by 16% in overall F-score.
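
    As a rough illustration of this experimental setup, the sketch below trains one Random Forest on tweets from suspended accounts and one on tweets from active accounts, then compares their F-scores on a shared unseen test set. TF-IDF stands in for the paper's semantic representation, and all texts and labels are toy placeholders, not the paper's data.

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics import f1_score
    from sklearn.pipeline import make_pipeline

    def train_and_score(train_texts, train_labels, test_texts, test_labels):
        """Fit a TF-IDF + Random Forest pipeline; return macro F-score on test data."""
        clf = make_pipeline(
            TfidfVectorizer(),
            RandomForestClassifier(n_estimators=100, random_state=0),
        )
        clf.fit(train_texts, train_labels)
        return f1_score(test_labels, clf.predict(test_texts), average="macro")

    # Toy placeholders; the paper uses real annotated tweet collections.
    suspended_train = (["offensive text A", "benign text A"], [1, 0])
    active_train = (["offensive text B", "benign text B"], [1, 0])
    test_tweets, test_labels = ["offensive text C", "benign text C"], [1, 0]

    f_susp = train_and_score(*suspended_train, test_tweets, test_labels)
    f_act = train_and_score(*active_train, test_tweets, test_labels)
    print(f"F (suspended-trained): {f_susp:.2f}  F (active-trained): {f_act:.2f}")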

    Understanding the Effect of Deplatforming on Social Networks

    Aiming to enhance the safety of their users, social media platforms enforce their terms of service through active moderation, including removing content and suspending users. Nevertheless, we lack a clear understanding of how effective it ultimately is to suspend users who engage in toxic behavior, as suspension might drive those users to alternative platforms where moderation is laxer. Moreover, these deplatforming efforts might end up nudging abusive users towards more extreme ideologies, with attendant radicalization risks. In this paper, we set out to understand what happens when users get suspended on a social platform and move to an alternative one. We focus on accounts active on Gab that were suspended from Twitter and Reddit. We develop a method to identify accounts belonging to the same person on these platforms, and observe whether there was a measurable difference in the activity and toxicity of these accounts after suspension. We find that users who get banned on Twitter/Reddit exhibit increased levels of activity and toxicity on Gab, although the audience they potentially reach decreases. Overall, we argue that moderation efforts should go beyond ensuring the safety of users on a single platform, taking into account the potential adverse effects of banning users from major platforms.
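
    A minimal sketch of the before/after comparison, assuming accounts have already been matched across platforms and that each Gab post carries a timestamp and a precomputed toxicity score (the matching method and scoring pipeline are the paper's, not shown here); all values below are hypothetical.

    from datetime import datetime
    from statistics import mean

    posts = [  # (timestamp, toxicity score in [0, 1]) -- hypothetical data
        (datetime(2018, 9, 1), 0.32),
        (datetime(2018, 9, 20), 0.41),
        (datetime(2018, 11, 2), 0.58),
        (datetime(2018, 11, 15), 0.63),
    ]
    suspension = datetime(2018, 10, 10)  # date the Twitter/Reddit account was banned

    before = [tox for ts, tox in posts if ts < suspension]
    after = [tox for ts, tox in posts if ts >= suspension]
    print(f"posts before: {len(before)}, after: {len(after)}")
    print(f"mean toxicity before: {mean(before):.2f}, after: {mean(after):.2f}")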

    Discovering and Mitigating Social Data Bias

    Exabytes of data are created online every day, and nowhere is this deluge more apparent than on social media. Naturally, finding ways to leverage this unprecedented source of human information is an active area of research. Social media platforms have become laboratories for conducting experiments about people at scales thought unimaginable only a few years ago. Researchers and practitioners use social media to extract actionable patterns, such as where aid should be distributed in a crisis. However, the validity of these patterns relies on having a representative dataset. As this dissertation shows, the data collected from social media is seldom representative of the activity of the site itself, and less so of human activity. This means that the results of many studies are limited by the quality of the data they collect. The finding that social media data is biased motivates the main challenge addressed by this thesis. I introduce three sets of methodologies to correct for bias. First, I address data collection bias: I offer a methodology that finds bias within a social media dataset by comparing the collected data with other sources, and I outline a crawling strategy that minimizes the amount of bias appearing in the resulting dataset. Second, I introduce a methodology to identify bots and shills within a social media dataset, directly addressing the concern that the users of a social media site are not representative; applying it allows the population under study to better match that of the real world. Finally, the dissertation discusses perceptual biases, explains how they affect analysis, and introduces computational approaches to mitigate them. Together, these results allow for the discovery and removal of different levels of bias within a social media dataset. This has important implications for social media mining, namely that the behavioral patterns and insights extracted from social media will be more representative of the populations under study.
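
    One plausible way to operationalize the "compare the collected data with other sources" idea (a sketch under my own assumptions, not the dissertation's actual method) is a goodness-of-fit test of, say, hashtag frequencies in the collected stream against a reference sample; the counts below are invented.

    from scipy.stats import chisquare

    hashtags = ["#news", "#sports", "#politics", "#music"]
    collected = [420, 80, 310, 90]     # counts in the dataset under study
    reference = [600, 300, 500, 400]   # counts in a reference sample

    # Scale the reference so expected frequencies sum to the observed total.
    scale = sum(collected) / sum(reference)
    expected = [c * scale for c in reference]

    stat, p = chisquare(collected, f_exp=expected)
    print(f"chi2={stat:.1f}, p={p:.4f}  (a small p suggests the stream is biased)")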

    Stance characterization and detection on social media

    Stance detection refers to the task of identifying a viewpoint as either supporting or opposing a given topic. Research on socio-political opinion mining on social media is still in its infancy, and most computational approaches in this field use the textual elements of a user's posts in isolation from social factors such as homophily and network structure. This thesis provides a thorough study of stance detection on social media and assesses various online signals to identify stance and understand its association with the analysed topic. We explore the task of detecting stance on Twitter, a well-known social media platform where people often express stance implicitly or explicitly. First, we examine the relation between sentiment and stance and analyse the interplay between sentiment polarity and expressed stance. For this purpose, we extend the current SemEval stance dataset by annotating tweets related to four new topics with sentiment and stance labels. We then evaluate the effectiveness of sentiment analysis methods for stance prediction using two stance datasets. Second, we examine the multi-modal representation of stance on social media by evaluating multiple stance detection models that use textual content and online interactions. The findings of this chapter suggest that using social interactions along with textual features can improve stance detection models; moreover, we show how an unconscious social interaction can reveal a user's stance. Next, we design an online framework to preserve users' privacy with respect to the stance implicitly inferred from their social media activity, evaluating the effectiveness of two stance obfuscation methods and using different stance detection models to measure the overall performance of the proposed framework. Finally, we study the dynamics of polarized stance to understand the factors that influence online stance. In particular, we extend the analysis of online stance signals and examine the interplay between stance and automated accounts (bots), and we pose the problem of gauging the bots' effect on polarized stance by focusing on their diffusion across the online social network.
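
    A minimal sketch of the multi-modal idea, assuming textual features can simply be concatenated with interaction features before classification; the logistic regression model, feature names, and toy data are illustrative stand-ins, not the thesis's models.

    import numpy as np
    from scipy.sparse import hstack, csr_matrix
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    texts = ["totally support this policy", "this policy must be stopped"]
    # Hypothetical interaction features: [retweets_of_pro, retweets_of_anti]
    interactions = np.array([[5, 0], [0, 7]])
    labels = [1, 0]  # 1 = favor, 0 = against

    # Concatenate text features with interaction features column-wise.
    text_features = TfidfVectorizer().fit_transform(texts)
    X = hstack([text_features, csr_matrix(interactions)])

    clf = LogisticRegression().fit(X, labels)
    print(clf.predict(X))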

    Predictive Analysis on Twitter: Techniques and Applications

    Predictive analysis of social media data has attracted considerable attention from the research community as well as the business world because of the essential and actionable information it can provide. Over the years, extensive experimentation and analysis for insights have been carried out using Twitter data in domains such as healthcare, public health, politics, social sciences, and demographics. In this chapter, we discuss techniques, approaches, and state-of-the-art applications of predictive analysis of Twitter data. Specifically, we present fine-grained analyses involving aspects such as sentiment and emotion, discuss the use of domain knowledge in coarse-grained analysis of Twitter data for making decisions and taking actions, and relate a few success stories.
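
    As a small taste of the fine-grained sentiment analysis this chapter surveys, here is a minimal example using NLTK's VADER analyzer, one of many possible tools; the tweets are made up.

    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)
    sia = SentimentIntensityAnalyzer()

    for tweet in ["The new vaccine rollout is going great!",
                  "Stuck in traffic again, this city is a mess."]:
        scores = sia.polarity_scores(tweet)  # neg/neu/pos plus a compound score
        print(f"{scores['compound']:+.2f}  {tweet}")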

    Message Deletion on Telegram: Affected Data Types and Implications for Computational Analysis

    Ephemeral digital trace data can decrease the completeness, reproducibility, and reliability of social media datasets; systematic post deletions thus potentially bias the results of computational methods used to map actors, content, and online information diffusion. The aim of this study was therefore to assess the extent and distribution of message deletion across different data types, using data from the hybrid messenger service Telegram, which has experienced an influx of deplatformed users from mainstream social media platforms. A repeatedly scraped sample of messages from public Telegram groups and channels was used to investigate the effect of message ephemerality on the consistency of Telegram datasets. The findings reveal that message deletion introduces biases into the computational collection and analysis of Telegram data: message ephemerality reduces dataset consistency, degrades the quality of social network analyses, and distorts the results of computational content analysis methods such as topic modeling or dictionaries. The article further discusses the implications of these findings for scholars aiming to use Telegram data for computational research, possible solutions, and contributions to the methodological advancement of studying online political communication.
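
    The repeated-scrape design boils down to a set difference over message IDs between two snapshots of the same group or channel; a minimal sketch with hypothetical IDs and content follows.

    # Messages present in an early snapshot but absent from a later one are
    # treated as deleted. IDs and texts below are hypothetical placeholders.
    snapshot_t1 = {101: "post a", 102: "post b", 103: "post c", 104: "post d"}
    snapshot_t2 = {101: "post a", 103: "post c"}  # same channel, scraped later

    deleted_ids = set(snapshot_t1) - set(snapshot_t2)
    deletion_rate = len(deleted_ids) / len(snapshot_t1)
    print(f"deleted: {sorted(deleted_ids)}  rate: {deletion_rate:.0%}")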

    Trajectories of Blocked Community Members: Redemption, Recidivism and Departure

    Community norm violations can impair constructive communication and collaboration online. As a defense mechanism, community moderators often address such transgressions by temporarily blocking the perpetrator. Such actions, however, come with the cost of potentially alienating community members. Given this tradeoff, it is essential to understand to what extent, and in which situations, this common moderation practice is effective in reinforcing community rules. In this work, we introduce a computational framework for studying the future behavior of blocked users on Wikipedia. After their block expires, they can take several distinct paths: they can reform and adhere to the rules, but they can also recidivate or outright abandon the community. We reveal that these trajectories are tied to factors rooted both in the characteristics of the blocked individual and in whether they perceived the block to be fair and justified. Based on these insights, we formulate a series of prediction tasks aiming to determine which of these paths a user is likely to take after being blocked for their first offense, and we demonstrate the feasibility of these new tasks. Overall, this work builds towards a more nuanced approach to moderation by highlighting the tradeoffs that are in play.
    Comment: To appear in Proceedings of the 2019 World Wide Web Conference (WWW '19), May 13-17, 2019, San Francisco, CA, USA. Code and data available as part of ConvoKit: convokit.cornell.edu
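
    A hedged sketch of how such a trajectory-prediction task might be framed; the model, feature names, and data below are hypothetical placeholders, not the authors' feature set.

    from sklearn.ensemble import GradientBoostingClassifier

    # Hypothetical per-user features: [edits_before_block, warnings_received,
    # appealed_block (0/1)], labeled with the observed post-block trajectory.
    X = [[120, 1, 0], [15, 4, 0], [300, 2, 1], [8, 5, 1]]
    y = ["redemption", "departure", "redemption", "recidivism"]

    clf = GradientBoostingClassifier(random_state=0).fit(X, y)
    print(clf.predict([[50, 3, 1]]))  # predicted trajectory for a new user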