2,713 research outputs found
Measuring and characterizing hate speech on news websites
This paper was presented at the 12th ACM Conference on Web Science, 5-9 October 2020. This is the accepted manuscript version of the paper. The final version is available online from the Association for Computing Machinery at: https://doi.org/10.1145/3394231.3397902.
The Web has become the main source for news acquisition. At the same time, news discussion has become more social: users can post comments on news articles or discuss news articles on other platforms like Reddit. These features empower and enable discussions among users; however, they also act as a medium for the dissemination of toxic discourse and hate speech. The research community lacks a general understanding of what type of content attracts hateful discourse and of the possible effects of social networks on the commenting activity on news articles.
In this work, we perform a large-scale quantitative analysis of 125M comments posted on 412K news articles over the course of 19 months. We analyze the content of the collected articles and their comments using temporal analysis, user-based analysis, and linguistic analysis, to shed light on what elements attract hateful comments on news articles. We also investigate commenting activity when an article is posted on either 4chan’s Politically Incorrect board (/pol/) or six selected subreddits. We find statistically significant increases in hateful commenting activity around real-world divisive events like the “Unite the Right” rally in Charlottesville and political events like the second and third 2016 US presidential debates. Also, we find that articles that attract a substantial number of hateful comments have different linguistic characteristics when compared to articles that do not attract hateful comments. Furthermore, we observe that the posting of a news article on either /pol/ or the six subreddits is correlated with an increase in (hateful) commenting activity on the article.
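The “statistically significant increases” around divisive events can be illustrated with a simple two-sample comparison of daily hateful-comment counts. A minimal sketch with made-up counts; the paper does not specify its exact test, so the use of Welch's t-statistic and all names and numbers here are assumptions for illustration:

```python
import statistics as st

def welch_t(a, b):
    """Welch's t-statistic for two independent samples
    (an illustrative choice, not the paper's stated test)."""
    va, vb = st.variance(a), st.variance(b)
    return (st.mean(a) - st.mean(b)) / ((va / len(a) + vb / len(b)) ** 0.5)

# Hypothetical daily hateful-comment counts: a quiet baseline week
# vs. the week around a divisive real-world event
baseline = [100, 110, 95, 105, 98, 102]
event_week = [150, 160, 155, 148, 162]

t = welch_t(event_week, baseline)  # large positive t -> counts rose sharply
```

A large t-statistic relative to the appropriate t-distribution would indicate a significant rise in hateful commenting during the event window.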
Reading In-Between the Lines: An Analysis of Dissenter
Efforts by content creators and social networks to enforce legal and
policy-based norms, e.g. blocking hate speech and users, have driven the rise of
unrestricted communication platforms. One such recent effort is Dissenter, a
browser and web application that provides a conversational overlay for any web
page. These conversations hide in plain sight: users of Dissenter can see and
participate in them, whereas visitors using other browsers are oblivious to
their existence. Further, website and content owners have no power over the
conversation, as it resides in an overlay outside their control.
In this work, we obtain a history of Dissenter comments, users, and the
websites being discussed, from the initial release of Dissenter in Feb. 2019
through Apr. 2020 (14 months). Our corpus consists of approximately 1.68M
comments made by 101k users commenting on 588k distinct URLs. We first analyze
macro characteristics of the network, including the user-base, comment
distribution, and growth. We then use toxicity dictionaries, Perspective API,
and a Natural Language Processing model to understand the nature of the
comments and measure the propensity of particular websites and content to
elicit hateful and offensive Dissenter comments. Using curated rankings of
media bias, we examine the conditional probability of hateful comments given
left and right-leaning content. Finally, we study Dissenter as a social
network, and identify a core group of users with high comment toxicity.
Comment: Accepted at IMC 202
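The media-bias analysis above reduces to estimating the conditional probability of a hateful comment given a site's political leaning. A minimal sketch with a hypothetical function name and a made-up toy sample, not the paper's data:

```python
from collections import Counter

def p_hateful_given_bias(comments, bias):
    """Estimate P(hateful | bias) from (bias_label, is_hateful) pairs."""
    totals, hateful = Counter(), Counter()
    for label, is_hateful in comments:
        totals[label] += 1
        hateful[label] += is_hateful
    return hateful[bias] / totals[bias] if totals[bias] else 0.0

# Toy sample: (media-bias label of the discussed site,
#              1 if the comment was flagged as hateful)
sample = [("left", 0), ("left", 1), ("right", 1), ("right", 1), ("right", 0)]

p_left = p_hateful_given_bias(sample, "left")    # 1 of 2 -> 0.5
p_right = p_hateful_given_bias(sample, "right")  # 2 of 3
```

With real data, the bias labels would come from a curated media-bias ranking and the hateful flags from toxicity scoring of each comment.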
Characterizing and Detecting Hateful Users on Twitter
Most current approaches to characterize and detect hate speech focus on
\textit{content} posted in Online Social Networks. They face shortcomings in
collecting and annotating hateful speech due to the incompleteness and noisiness
of OSN text and the subjectivity of hate speech. These limitations are often
sidestepped with constraints that oversimplify the problem, such as considering
only tweets containing hate-related words. In this work we partially address
these issues by shifting the focus towards \textit{users}. We develop and employ
a robust methodology to collect and annotate hateful users which does not depend
directly on a lexicon and in which users are annotated given their entire
profile. This results in a sample of Twitter's retweet graph containing
users, out of which were annotated. We also collect the users
who were banned in the three months that followed the data collection. We show
that hateful users differ from normal ones in terms of their activity patterns,
word usage, and network structure. We obtain similar results
comparing the neighbors of hateful vs. neighbors of normal users and also
suspended users vs. active users, increasing the robustness of our analysis. We
observe that hateful users are densely connected, and thus formulate the hate
speech detection problem as a task of semi-supervised learning over a graph,
exploiting the network of connections on Twitter. We find that a node embedding
algorithm, which exploits the graph structure, outperforms content-based
approaches for the detection of both hateful ( AUC vs AUC) and
suspended users ( AUC vs AUC). Altogether, we present a
user-centric view of hate speech, paving the way for better detection and
understanding of this relevant and challenging issue.Comment: This is an extended version of the homonymous short paper to be
presented at ICWSM-18. arXiv admin note: text overlap with arXiv:1801.0031
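The semi-supervised-over-a-graph formulation can be sketched with simple label propagation over a toy retweet graph. Note that the paper itself uses a node embedding algorithm, so this is only an illustration of the general idea, with hypothetical names and data:

```python
def propagate_labels(adj, seed_labels, iters=10):
    """Semi-supervised label propagation: each unlabeled node takes the
    mean hatefulness score of its neighbors; annotated seeds stay fixed.
    (Illustrative; not the paper's node embedding method.)"""
    scores = {u: seed_labels.get(u, 0.5) for u in adj}
    for _ in range(iters):
        new = {}
        for u, nbrs in adj.items():
            if u in seed_labels:
                new[u] = seed_labels[u]  # annotated users keep their label
            elif nbrs:
                new[u] = sum(scores[v] for v in nbrs) / len(nbrs)
            else:
                new[u] = scores[u]
        scores = new
    return scores

# Tiny retweet graph: 'a' annotated hateful (1.0), 'd' annotated normal (0.0);
# 'b' and 'c' are unlabeled and inherit scores from their neighborhoods
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores = propagate_labels(adj, {"a": 1.0, "d": 0.0})
```

Because hateful users are densely connected, scores diffuse through the graph: unlabeled users near the hateful seed end up with higher scores than those near the normal seed.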
Violence Detection in Social Media-Review
Social media has become a vital part of people's day-to-day lives, and different users engage with it differently. With its increased usage, many researchers have investigated different aspects of social media. Many recent examples show that content on social media can generate violence in the user community. Violence in social media can be categorised into aggression in comments, cyber-bullying, and incidents such as protests and murders. Identifying violent content in social media is a challenging task: posts contain both visual and textual elements, and may carry hidden meaning that depends on the user's context and other background information. This paper summarizes the different categories of social media violence and the existing methods to detect violent content.
Keywords: machine learning, natural language processing, violence, social media, convolutional neural networks
Impact of Stricter Content Moderation on Parler's Users' Discourse
Social media platforms employ various content moderation techniques to remove
harmful, offensive, and hate speech content. The moderation level varies across
platforms, and can even evolve over time within a platform. For example, Parler,
a fringe social media platform popular among conservative users, was known to
have the least restrictive moderation policies, claiming to offer open
discussion spaces for its users. However, after the activity of some groups on
Parler, such as QAnon and the Proud Boys, was linked to the 2021 US Capitol
Riots, Parler was removed from the Apple and Google app stores on January 12,
2021, and suspended from Amazon's cloud hosting service. Parler would have to
modify its moderation policies to return to these online stores. After a month
of downtime, Parler was back online with a new set of user guidelines, which
reflected stricter content moderation, especially regarding the \emph{hate
speech} policy.
In this paper, we studied the moderation changes performed by Parler and
their effect on the toxicity of its content. We collected a large longitudinal
Parler dataset with 17M parleys from 432K active users from February 2021 to
January 2022, after its return to the Internet and App Store. To the best of
our knowledge, this is the first study investigating the effectiveness of
content moderation techniques using data-driven approaches and also the first
Parler dataset after its brief hiatus. Our quasi-experimental time series
analysis indicates that after the change in Parler's moderation, the severe
forms of toxicity (scores above a threshold of 0.5) immediately decreased, and
the decrease was sustained. In contrast, the trend did not change for less
severe threats and insults (scores between 0.5 and 0.7). Finally, we found an
increase in the factuality of the news sites being shared, as well as a decrease
in the number of conspiracy or pseudoscience sources being shared.
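Bucketing comments by toxicity score and comparing the shares before and after a policy change can be sketched as follows. The scores, the exact thresholds as applied here, and the function name are illustrative assumptions, not the study's quasi-experimental model:

```python
def severity_share(scores, low, high=1.0):
    """Fraction of comments whose toxicity score falls in [low, high)."""
    return sum(low <= s < high for s in scores) / len(scores) if scores else 0.0

# Hypothetical per-comment toxicity scores (Perspective-style, in [0, 1]),
# sampled before and after the moderation change
pre  = [0.1, 0.2, 0.55, 0.6, 0.75, 0.9]
post = [0.1, 0.2, 0.55, 0.6, 0.3, 0.4]

# Severe toxicity (here >= 0.7) vs. milder threats/insults (0.5 - 0.7)
severe_pre, severe_post = severity_share(pre, 0.7), severity_share(post, 0.7)
mild_pre, mild_post = severity_share(pre, 0.5, 0.7), severity_share(post, 0.5, 0.7)
```

In this toy sample the severe share drops after the change while the milder band stays flat, mirroring the pattern the abstract describes.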
Towards Measuring Adversarial Twitter Interactions against Candidates in the US Midterm Elections
Adversarial interactions against politicians on social media such as Twitter
have a significant impact on society. In particular, they disrupt substantive
political discussions online, and may discourage people from seeking public
office. In this study, we measure the adversarial interactions against
candidates for the US House of Representatives during the run-up to the 2018 US
general election. We gather a new dataset consisting of 1.7 million tweets
involving candidates, one of the largest corpora focusing on political
discourse. We then develop a new technique for detecting tweets with toxic
content that are directed at any specific candidate. This technique allows us to
more accurately quantify adversarial interactions towards political candidates.
Further, we introduce an algorithm to induce candidate-specific adversarial
terms to capture more nuanced adversarial interactions that previous techniques
may not consider toxic. Finally, we use these techniques to outline the breadth
of adversarial interactions seen in the election, including offensive
name-calling, threats of violence, posting discrediting information, attacks on
identity, and adversarial message repetition.
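Inducing candidate-specific adversarial terms could, for example, rank words by how much more often they appear in tweets directed at one candidate than in a background corpus. This smoothed log-odds heuristic is an illustrative assumption, not the paper's induction algorithm, and all names and tweets below are made up:

```python
import math
from collections import Counter

def scored_terms(target_tweets, background_tweets, min_count=2):
    """Rank terms by smoothed log-odds of appearing in tweets directed at
    a candidate vs. a background corpus (an illustrative heuristic)."""
    t = Counter(w for tw in target_tweets for w in tw.lower().split())
    b = Counter(w for tw in background_tweets for w in tw.lower().split())
    nt, nb = sum(t.values()), sum(b.values())
    scores = {}
    for w, c in t.items():
        if c < min_count:
            continue  # ignore rare terms to reduce noise
        scores[w] = math.log((c + 1) / (nt + 1)) - math.log((b[w] + 1) / (nb + 1))
    return sorted(scores, key=scores.get, reverse=True)

target = ["crook candidate", "crook liar", "crook go home"]
background = ["great debate tonight", "watch the debate", "vote tomorrow"]
top = scored_terms(target, background)  # candidate-specific terms rank first
```

Terms like the repeated epithet above surface at the top even though they would not appear in a generic toxicity lexicon, which is the gap candidate-specific induction aims to fill.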