230 research outputs found
Incivility on Popular Politics and News Subreddits: An Analysis of In-groups, Community Guidelines and Relationships with Social Media Engagement
Political and news subreddits vary in the incivility we might expect them to exhibit: some have clear in-group members, and all have content moderation policies of varying complexity. We sample submissions (n = 127,870) and comments (n = 2,576,049) from 20 of the most popular news and politics subreddits from June 4th, 2021, to June 4th, 2022. All subreddits appear to be mostly civil, with incivility most commonly occurring in comments. When incivility occurs, it tends to take less-severe forms, including insults, profanity, and general toxicity. Subreddits with clear political in-groups did exhibit more insults, toxicity, profanity, and identity-based attacks. The more complex a subreddit’s moderation policies, the less incivility was observed. Finally, uncivil submissions do produce a mild increase in engagement, but given the overall low prevalence of incivility observed, it appears not to be integral to a subreddit’s overall engagement.
A Look Through a Broken Window: The Relationship Between Disorder and Toxicity on Social Networking Sites
Toxicity has increased on social networking sites (SNSs), sparking a debate on its underlying causes. While research has readily explored plausible social factors, disorder induced by the very design of SNSs has so far been neglected. In the offline sphere, a relationship between disorder and deviant behavior is well established. Adopting the theoretical lens of the Broken Windows Theory, we propose that a similar mechanism operates in the online context. To test the hypothesis that perceived disorder increases toxicity on SNSs, the study compares two subcommunities on Reddit dedicated to the same topic that differ in their perceived disorder. Collecting comments and scoring their toxicity via natural language processing yields first evidence for our hypothesis. We further outline subsequent studies that aim to investigate in more depth how disorder-related factors contribute to toxic online environments.
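The two-community comparison described above can be sketched minimally: compute the mean toxicity of a "high-disorder" versus a "low-disorder" subreddit. The per-comment scores below are made-up placeholders; a real study would obtain them from a toxicity classifier, and would follow the comparison with a significance test.

```python
# Sketch of the two-community comparison: mean toxicity of an orderly
# vs. a disordered subreddit. Scores are made-up placeholders standing
# in for classifier output, not data from the study.
from statistics import mean

orderly = [0.05, 0.10, 0.08, 0.12]      # hypothetical per-comment scores
disordered = [0.20, 0.35, 0.25, 0.30]   # hypothetical per-comment scores

# A positive gap is consistent with the broken-windows hypothesis.
gap = mean(disordered) - mean(orderly)
print(f"mean toxicity gap: {gap:.3f}")
```

A real analysis would of course need far larger samples and a test for whether the gap is statistically distinguishable from zero.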
Watch Your Language: Large Language Models and Content Moderation
Large language models (LLMs) have exploded in popularity due to their ability
to perform a wide array of natural language tasks. Text-based content
moderation is one LLM use case that has received recent enthusiasm; however,
there is little research investigating how LLMs perform in content moderation
settings. In this work, we evaluate a suite of modern, commercial LLMs (GPT-3,
GPT-3.5, GPT-4) on two common content moderation tasks: rule-based community
moderation and toxic content detection. For rule-based community moderation, we
construct 95 LLM moderation-engines prompted with rules from 95 Reddit
subcommunities and find that LLMs can be effective at rule-based moderation for
many communities, achieving a median accuracy of 64% and a median precision of
83%. For toxicity detection, we find that LLMs significantly outperform
existing commercially available toxicity classifiers. However, we also find
that recent increases in model size add only marginal benefit to toxicity
detection, suggesting a potential performance plateau for LLMs on toxicity
detection tasks. We conclude by outlining avenues for future work in studying
LLMs and content moderation.
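The rule-based moderation setup described above amounts to prompting an LLM with a community's rules plus the comment to judge. The template below is a minimal sketch of that idea; the wording, rule text, and function name are illustrative assumptions, not the paper's actual prompts.

```python
# Minimal sketch of a rule-based moderation prompt in the spirit of the
# setup above. The prompt wording and example rules are illustrative
# assumptions, not the prompts used in the paper.

def build_moderation_prompt(rules: list[str], comment: str) -> str:
    """Assemble a binary-classification prompt from a community's rules."""
    rule_lines = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, 1))
    return (
        "You are a moderator for an online community with these rules:\n"
        f"{rule_lines}\n\n"
        f"Comment: {comment!r}\n"
        "Does this comment violate any rule? Answer YES or NO."
    )

prompt = build_moderation_prompt(
    ["Be civil", "No personal attacks"],
    "You're an idiot.",
)
print(prompt)
```

In the paper's setup, one such engine would be built per subcommunity from that community's own rules, and the model's YES/NO answer compared against moderator decisions.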
PROVOKE: Toxicity trigger detection in conversations from the top 100 subreddits
Promoting healthy discourse on community-based online platforms like Reddit can be challenging, especially when conversations show ominous signs of toxicity. Therefore, in this study, we find the turning points (i.e., toxicity triggers) making conversations toxic. Before finding toxicity triggers, we built and evaluated various machine learning models to detect toxicity from Reddit comments.
Subsequently, we used our best-performing model, a fine-tuned Bidirectional Encoder Representations from Transformers (BERT) model that achieved an area under the receiver operating characteristic curve (AUC) score of 0.983, to detect toxicity. Next, we constructed conversation threads and used the toxicity prediction results to build a training set for detecting toxicity triggers. This procedure entailed using our large-scale dataset to refine the definition of toxicity triggers and to build a trigger detection dataset from 991,806 conversation threads drawn from the top 100 communities on Reddit. Then, we extracted a set of sentiment-shift, topical-shift, and context-based features from the trigger detection dataset, using them to build a dual-embedding biLSTM neural network that achieved an AUC score of 0.789. Our analysis of the trigger detection dataset showed that some triggering keywords, such as ‘racist’ and ‘women’, are common across all communities, while others are specific to certain communities, such as ‘overwatch’ in r/Games. The implication is that toxicity trigger detection algorithms can leverage generic approaches but must also tailor detections to specific communities. © 2022 Wuhan University. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
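One of the feature families mentioned above, sentiment shift between consecutive turns of a thread, can be approximated with a toy lexicon. The tiny word lists and scoring scheme below are illustrative assumptions, not the paper's feature implementation.

```python
# Toy sentiment-shift feature between consecutive comments of a thread,
# approximating one feature family described above. The tiny lexicon and
# the scoring scheme are illustrative assumptions, not the paper's method.

POSITIVE = {"great", "thanks", "agree", "good"}
NEGATIVE = {"stupid", "hate", "racist", "awful"}

def sentiment(text: str) -> int:
    """Crude lexicon score: positive hits minus negative hits."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def sentiment_shifts(thread: list[str]) -> list[int]:
    """Signed sentiment change from each comment to the next."""
    scores = [sentiment(c) for c in thread]
    return [b - a for a, b in zip(scores, scores[1:])]

thread = ["great point, thanks", "I agree", "that is a stupid take, I hate it"]
# A sharp negative shift flags the turn as a candidate toxicity trigger.
print(sentiment_shifts(thread))
```

A real pipeline would replace the lexicon with classifier scores and combine this feature with the topical-shift and context-based features before feeding the biLSTM.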
Comparing Toxicity Across Social Media Platforms for COVID-19 Discourse
The emergence of toxic information on social networking sites, such as
Twitter, Parler, and Reddit, has become a growing concern. Consequently, this
study aims to assess the level of toxicity in COVID-19 discussions on Twitter,
Parler, and Reddit. Using data analysis from January 1 through December 31,
2020, we examine the development of toxicity over time and compare the findings
across the three platforms. The results indicate that Parler had lower toxicity
levels than both Twitter and Reddit in discussions related to COVID-19. In
contrast, Reddit showed the highest levels of toxicity, largely due to various
anti-vaccine forums that spread misinformation about COVID-19 vaccines.
Notably, our analysis of COVID-19 vaccination conversations on Twitter also
revealed a significant presence of conspiracy theories among individuals with
highly toxic attitudes. Our computational approach provides decision-makers
with useful information about reducing the spread of toxicity within online
communities. The study's findings highlight the importance of taking action to
encourage more uplifting and productive online discourse across all platforms.
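The over-time comparison described above reduces to grouping per-post toxicity scores by platform and month. The sketch below shows that aggregation step; the records are made-up placeholders, not data from the study.

```python
# Sketch of tracking toxicity over time per platform, assuming each
# record is (platform, "YYYY-MM", toxicity_score). Records are made-up
# placeholders, not the study's data.
from collections import defaultdict
from statistics import mean

def monthly_toxicity(records):
    """Mean toxicity score per (platform, month) bucket."""
    buckets = defaultdict(list)
    for platform, month, score in records:
        buckets[(platform, month)].append(score)
    return {key: mean(scores) for key, scores in buckets.items()}

records = [
    ("reddit", "2020-03", 0.40), ("reddit", "2020-03", 0.60),
    ("twitter", "2020-03", 0.30), ("parler", "2020-03", 0.20),
]
print(monthly_toxicity(records))
```

Plotting these bucket means month by month for each platform yields the kind of over-time comparison the study reports.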
SoK: Content Moderation in Social Media, from Guidelines to Enforcement, and Research to Practice
To counter online abuse and misinformation, social media platforms have been
establishing content moderation guidelines and employing various moderation
policies. The goal of this paper is to study these community guidelines and
moderation practices, as well as the relevant research publications to identify
the research gaps, differences in moderation techniques, and challenges that
should be tackled by the social media platforms and the research community at
large. In this regard, we study, analyze, and consolidate the content
moderation guidelines and practices of the fourteen most popular social media
platforms in the US jurisdiction. We then introduce three taxonomies drawn from this analysis
as well as covering over one hundred interdisciplinary research papers about
moderation strategies. We identified the differences between the content
moderation employed in mainstream social media platforms compared to fringe
platforms. We also highlight the implications of Section 230, the need for
transparency and opacity in content moderation, why platforms should shift from
a one-size-fits-all model to a more inclusive model, and lastly, we highlight
why there is a need for a collaborative human-AI system.
Like trainer, like bot? Inheritance of bias in algorithmic content moderation
The internet has become a central medium through which `networked publics'
express their opinions and engage in debate. Offensive comments and personal
attacks can inhibit participation in these spaces. Automated content moderation
aims to overcome this problem using machine learning classifiers trained on
large corpora of texts manually annotated for offence. While such systems could
help encourage more civil debate, they must navigate inherently normatively
contestable boundaries, and are subject to the idiosyncratic norms of the human
raters who provide the training data. An important objective for platforms
implementing such measures might be to ensure that they are not unduly biased
towards or against particular norms of offence. This paper provides some
exploratory methods by which the normative biases of algorithmic content
moderation systems can be measured, by way of a case study using an existing
dataset of comments labelled for offence. We train classifiers on comments
labelled by different demographic subsets (men and women) to understand how
differences in conceptions of offence between these groups might affect the
performance of the resulting models on various test sets. We conclude by
discussing some of the ethical choices facing the implementers of algorithmic
moderation systems, given various desired levels of diversity of viewpoints
amongst discussion participants. (12 pages, 3 figures; 9th International
Conference on Social Informatics (SocInfo 2017), Oxford, UK, 13--15 September
2017; forthcoming in Springer Lecture Notes in Computer Science.)
Prominence Reduction versus Banning: An Empirical Investigation of Content Moderation Strategies in Online Platforms
Online platforms have adopted various content moderation strategies to combat antisocial behaviors such as verbal aggression. This study focuses on two such strategies, group prominence reduction and banning, and aims to provide a holistic picture of all their downstream effects. Additionally, we assess the differential effects of content moderation on multihoming versus non-multihoming users. Preliminary findings indicate that prominence reduction applied to a problematic group has the adverse effect of increasing verbal aggression in outside spaces, and that banning strategies impact multihoming and non-multihoming users differently. These findings have important implications, as they show that group prominence reduction produces negative spillover effects; open questions remain regarding the behavior of multihoming users on multiple external platforms and whether these results generalize across multiple contexts.
CPL-NoViD: Context-Aware Prompt-based Learning for Norm Violation Detection in Online Communities
Detecting norm violations in online communities is critical to maintaining
healthy and safe spaces for online discussions. Existing machine learning
approaches often struggle to adapt to the diverse rules and interpretations
across different communities due to the inherent challenges of fine-tuning
models for such context-specific tasks. In this paper, we introduce
Context-aware Prompt-based Learning for Norm Violation Detection (CPL-NoViD), a
novel method that employs prompt-based learning to detect norm violations
across various types of rules. CPL-NoViD outperforms the baseline by
incorporating context through natural language prompts and demonstrates
improved performance across different rule types. Significantly, it not only
excels in cross-rule-type and cross-community norm violation detection but also
exhibits adaptability in few-shot learning scenarios. Most notably, it
establishes a new state-of-the-art in norm violation detection, surpassing
existing benchmarks. Our work highlights the potential of prompt-based learning
for context-sensitive norm violation detection and paves the way for future
research on more adaptable, context-aware models to better support online
community moderators.
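The context-aware prompting idea above, folding a conversation's preceding comments into the prompt alongside the rule, can be sketched as a template. The wording, labels, and function name are illustrative assumptions, not CPL-NoViD's actual templates.

```python
# Sketch of a context-aware norm-violation prompt in the spirit of the
# method above: earlier comments are folded into the prompt as natural-
# language context. Wording and names are illustrative assumptions.

def build_context_prompt(rule: str, context: list[str], comment: str) -> str:
    """Assemble a prompt that judges a comment against a rule in context."""
    ctx = "\n".join(f"[earlier comment] {c}" for c in context)
    return (
        f"Community rule: {rule}\n"
        f"Conversation so far:\n{ctx}\n"
        f"New comment: {comment}\n"
        "Does the new comment violate the rule, given the conversation? "
        "Answer YES or NO."
    )

prompt = build_context_prompt(
    "No personal attacks",
    ["I think the patch is fine.", "Did you even read it?"],
    "Clearly you didn't, genius.",
)
print(prompt)
```

The key difference from a context-free prompt is that the same comment can be judged differently depending on the thread it appears in, which is what lets a prompt-based model adapt across rule types and communities.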