Suspended accounts: A source of Tweets with disgust and anger emotions for augmenting hate speech data sample
In this paper we present a proposal to address the problem of costly and unreliable human annotation, which is important for detecting hate speech in web content. In particular, we propose to use text produced by suspended accounts in the aftermath of a hateful event as a subtle and reliable source for hate speech prediction. The proposal was motivated by an emotion analysis of three sources of data sets: suspended, active and neutral ones, i.e. the first two sources contain hateful tweets from suspended and active accounts, respectively, whereas the third contains neutral tweets only. The emotion analysis indicated that tweets from suspended accounts show more disgust, negative, fear and sadness emotions than those from active accounts, although tweets from both types of accounts might be annotated as hateful by human annotators. We train two Random Forest classifiers on the semantic meaning of tweets from suspended and active accounts, respectively, and evaluate the prediction accuracy of the two classifiers on unseen data. The results show that the classifier trained on tweets from suspended accounts outperformed the one trained on tweets from active accounts by 16% in overall F-score.
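The abstract reports the comparison in terms of overall F-score. As a minimal illustration (not the authors' code), the metric can be sketched in plain Python; the label name "hateful" is a hypothetical placeholder:

```python
def f1_score(y_true, y_pred, positive="hateful"):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```

Evaluating both classifiers with this metric on the same held-out tweets gives the head-to-head comparison the paper reports.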
Understanding the Effect of Deplatforming on Social Networks
Aiming to enhance the safety of their users, social media platforms enforce terms of service by performing active moderation, including removing content or suspending users. Nevertheless, we do not have a clear understanding of how effective it ultimately is to suspend users who engage in toxic behavior, as that might actually draw users to alternative platforms where moderation is laxer. Moreover, these deplatforming efforts might end up nudging abusive users towards more extreme ideologies and potential radicalization risks. In this paper, we set out to understand what happens when users get suspended on a social platform and move to an alternative one. We focus on accounts active on Gab that were suspended from Twitter and Reddit. We develop a method to identify accounts belonging to the same person on these platforms, and observe whether there was a measurable difference in the activity and toxicity of these accounts after suspension. We find that users who get banned on Twitter/Reddit exhibit an increased level of activity and toxicity on Gab, although the audience they potentially reach decreases. Overall, we argue that moderation efforts should go beyond ensuring the safety of users on a single platform, taking into account the potential adverse effects of banning users on major platforms.
Artificial Intelligence and Online Extremism: Challenges and Opportunities
Radicalisation is a process that historically used to be triggered mainly through social interactions in places of worship, religious schools, prisons, meeting venues, etc. Today, this process is often initiated on the Internet, where radicalisation content is easily shared, and potential candidates are reached more easily, rapidly, and at an unprecedented scale (Edwards and Gribbon, 2013; Von Behr et al., 2013).
In recent years, some terrorist organisations succeeded in leveraging the power of social media to recruit individuals to their cause and ideology (Farwell, 2014). It is often the case that such recruitment attempts are initiated on open social media platforms (e.g., Twitter, Facebook, Tumblr, YouTube) but then move onto private messages and/or encrypted platforms (e.g., WhatsApp, Telegram). Such encrypted communication channels have also been used by terrorist cells and networks to plan their operations (Gartenstein-Ross and Barr).
To counteract the activities of such organisations, and to halt the spread of radicalisation content, some governments, social media platforms, and counter-extremism agencies are investing in the creation of advanced information technologies to identify and counter extremism through the development of Artificial Intelligence (AI) solutions (Correa and Sureka, 2013; Agarwal and Sureka, 2015a; Scrivens and Davies, 2018).
These solutions have three main objectives: (i) understanding the phenomena behind online extremism (the communication flow, the use of propaganda, the different stages of the radicalisation process, the variety of radicalisation channels, etc.), (ii) automatically detecting radical users and content, and (iii) predicting the adoption and spreading of extremist ideas.
Despite current advancements in the area, multiple challenges still exist, including: (i) the lack of a common definition of prohibited radical and extremist internet activity, (ii) the lack of solid verification of the datasets collected to develop detection and prediction models, (iii) the lack of cooperation across research fields, since most of the developed technological solutions are neither based on, nor do they take advantage of, existing social theories and studies of radicalisation, (iv) the constant evolution of behaviours associated with online extremism in order to avoid being detected by the developed algorithms (changes in terminology, creation of new accounts, etc.) and, (v) the development of ethical guidelines and legislation to regulate the design and development of AI technology to counter radicalisation.
In this book chapter we provide an overview of the current technological advancements towards addressing the problem of online extremism (with a particular focus on Jihadism). We identify some of the limitations of current technologies, and highlight some of the potential opportunities. Our aim is to reflect on the current state of the art and to stimulate discussions on the future design and development of AI technology to target the problem of online extremism.
Discovering and Mitigating Social Data Bias
Exabytes of data are created online every day. This deluge of data is nowhere more apparent than on social media. Naturally, finding ways to leverage this unprecedented source of human information is an active area of research. Social media platforms have become laboratories for conducting experiments about people at scales thought unimaginable only a few years ago.
Researchers and practitioners use social media to extract actionable patterns such as where aid should be distributed in a crisis. However, the validity of these patterns relies on having a representative dataset. As this dissertation shows, the data collected from social media is seldom representative of the activity of the site itself, and less so of human activity. This means that the results of many studies are limited by the quality of data they collect.
The finding that social media data is biased inspires the main challenge addressed by this thesis. I introduce three sets of methodologies to correct for bias. First, I address data collection bias: I offer a methodology that finds bias within a social media dataset by comparing the collected data with other sources, and I outline a data collection and crawling strategy that minimizes the amount of bias in the resulting dataset. Second, I introduce a methodology to identify bots and shills within a social media dataset. This directly addresses the concern that the users of a social media site are not representative. Applying these methodologies allows the population under study on a social media site to better match that of the real world. Finally, the dissertation discusses perceptual biases, explains how they affect analysis, and introduces computational approaches to mitigate them.
The results of the dissertation allow for the discovery and removal of different levels of bias within a social media dataset. This has important implications for social media mining, namely that the behavioral patterns and insights extracted from social media will be more representative of the populations under study.
Doctoral Dissertation, Computer Science, 201
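The stream-comparison idea can be operationalized in many ways; one minimal sketch (not the dissertation's actual method) measures how far a collected stream's category distribution diverges from a reference sample using total variation distance:

```python
from collections import Counter

def total_variation(sample_a, sample_b):
    """Half the L1 distance between the empirical category distributions
    of two samples; 0.0 means identical distributions, 1.0 means disjoint."""
    ca, cb = Counter(sample_a), Counter(sample_b)
    na, nb = len(sample_a), len(sample_b)
    categories = set(ca) | set(cb)
    return 0.5 * sum(abs(ca[c] / na - cb[c] / nb) for c in categories)
```

A large distance between, say, the topic labels of a filtered Twitter stream and those of a random reference sample would flag collection bias of the kind the dissertation describes.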
Stance characterization and detection on social media
Stance detection refers to the task of identifying a viewpoint as either supporting or opposing a given topic. The current research on socio-political opinion mining on social media is still in its infancy. Most computational approaches in this field use the textual elements of a user's posts independently of social factors such as homophily and network structure. This thesis provides a thorough study of stance detection on social media and assesses various online signals to identify the stance and understand its association with the analysed topic. We explore the task of detecting stance on Twitter, a well-known social media platform where people often express stance implicitly or explicitly.
First, we examine the relation between sentiment and stance and analyse the interplay between sentiment polarity and expressed stance. For this purpose, we extend the current SemEval stance dataset by annotating tweets related to four new topics with sentiment and stance labels. Then, we evaluate the effectiveness of sentiment analysis methods on stance prediction using two stance datasets.
Second, we examine the multi-modal representation of stance on social media by evaluating multiple stance detection models using textual content and online interactions. The findings of this chapter suggest that using social interactions along with other textual features can improve stance detection models. Moreover, we show how an unconscious social interaction can reveal a user's stance.
Next, we design an online framework to preserve users' privacy with respect to the stance implicitly inferred from their social media activity. We evaluate the effectiveness of two stance obfuscation methods and use different stance detection models to measure the overall performance of the proposed framework.
Finally, we study the dynamics of polarized stance to understand the factors that influence online stance. In particular, we extend the analysis of online stance signals and examine the interplay between stance and automated accounts (bots). Furthermore, we pose the problem of gauging the effect of bots on polarized stance through a sole focus on their diffusion across the online social network.
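The thesis's observation that online interactions complement textual features can be sketched as a simple feature construction; the interaction keys below are hypothetical placeholders, not the thesis's actual feature set:

```python
def stance_features(tweet_text, interactions, vocab):
    """Concatenate bag-of-words counts with counts of a user's online
    interactions (e.g. retweets of topic-related accounts), so a downstream
    classifier can weigh textual and social signals together."""
    tokens = tweet_text.lower().split()
    text_part = [tokens.count(word) for word in vocab]
    social_part = [interactions.get(k, 0) for k in ("retweets", "replies", "mentions")]
    return text_part + social_part
```

Any standard classifier trained on such vectors sees both what a user says and whom they interact with, which is the combination the thesis reports as beneficial.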
Predictive Analysis on Twitter: Techniques and Applications
Predictive analysis of social media data has attracted considerable attention from the research community as well as the business world because of the essential and actionable information it can provide. Over the years, extensive experimentation and analysis for insights have been carried out using Twitter data in various domains such as healthcare, public health, politics, social sciences, and demographics. In this chapter, we discuss techniques, approaches, and state-of-the-art applications of predictive analysis of Twitter data. Specifically, we present fine-grained analysis involving aspects such as sentiment and emotion, discuss the use of domain knowledge in the coarse-grained analysis of Twitter data for making decisions and taking actions, and relate a few success stories.
Message Deletion on Telegram: Affected Data Types and Implications for Computational Analysis
Ephemeral digital trace data can decrease the completeness, reproducibility, and reliability of social media datasets. Systematic post deletions thus potentially bias the results of computational methods used to map actors, content, and online information diffusion. Therefore, the aim of this study was to assess the extent and distribution of message deletion across different data types using data from the hybrid messenger service Telegram, which has experienced an influx of deplatformed users from mainstream social media platforms. A repeatedly scraped sample of messages from public Telegram groups and channels was used to investigate the effect of message ephemerality on the consistency of Telegram datasets. The findings revealed that message deletion introduces biases to the computational collection and analysis of Telegram data. Further, message ephemerality reduces dataset consistency, the quality of social network analyses, and the results of computational content analysis methods, such as topic modeling or dictionaries. The implications of these findings for scholars aiming to use Telegram data for computational research, possible solutions, and contributions to the methodological advancement of studying online political communication are discussed further in this article.
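To see how deletions bias dictionary-based content analysis, consider a minimal sketch (the lexicon and messages below are hypothetical, not from the study): the same dictionary yields a different prevalence estimate once a message disappears between scrapes:

```python
def lexicon_hits(messages, lexicon):
    """Count tokens across all messages that appear in `lexicon`
    (a set of lowercase words), as in dictionary-based content analysis."""
    return sum(1 for m in messages for tok in m.lower().split() if tok in lexicon)

# Hypothetical example: one deleted message shifts the estimate.
lexicon = {"fraud", "hoax"}
first_scrape = ["the vote was a fraud", "nice weather today", "what a hoax"]
second_scrape = ["the vote was a fraud", "nice weather today"]  # one message deleted
```

Here `lexicon_hits(first_scrape, lexicon)` is 2 over three messages but `lexicon_hits(second_scrape, lexicon)` is 1 over two, so any prevalence statistic computed from the later scrape differs from the earlier one, which is the consistency problem the study quantifies.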
Trajectories of Blocked Community Members: Redemption, Recidivism and Departure
Community norm violations can impair constructive communication and collaboration online. As a defense mechanism, community moderators often address such transgressions by temporarily blocking the perpetrator. Such actions, however, come with the cost of potentially alienating community members. Given this tradeoff, it is essential to understand to what extent, and in which situations, this common moderation practice is effective in reinforcing community rules.
In this work, we introduce a computational framework for studying the future behavior of blocked users on Wikipedia. After their block expires, they can take several distinct paths: they can reform and adhere to the rules, but they can also recidivate, or abandon the community outright. We reveal that these trajectories are tied to factors rooted both in the characteristics of the blocked individual and in whether they perceived the block to be fair and justified. Based on these insights, we formulate a series of prediction tasks aiming to determine which of these paths a user is likely to take after being blocked for their first offense, and demonstrate the feasibility of these new tasks. Overall, this work builds towards a more nuanced approach to moderation by highlighting the tradeoffs that are in play.
Comment: To appear in Proceedings of the 2019 World Wide Web Conference (WWW '19), May 13-17, 2019, San Francisco, CA, USA. Code and data available as part of ConvoKit: convokit.cornell.ed
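The three post-block paths named in the abstract can be framed as a labeling task over a user's subsequent activity; a minimal rule-based sketch (the event schema is hypothetical, not the paper's framework):

```python
def trajectory_label(post_block_events):
    """Assign one of the three trajectories from a list of event strings
    (e.g. "edit", "violation") recorded after the block expires."""
    if not post_block_events:
        return "departure"      # no further activity: the user left the community
    if "violation" in post_block_events:
        return "recidivism"     # at least one repeated offense
    return "redemption"         # active again, with no further violations
```

Labels produced this way could serve as targets for the kind of prediction tasks the paper formulates, where features of the user and of the block itself are used to forecast the path ahead of time.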