149 research outputs found
CPL-NoViD: Context-Aware Prompt-based Learning for Norm Violation Detection in Online Communities
Detecting norm violations in online communities is critical to maintaining
healthy and safe spaces for online discussions. Existing machine learning
approaches often struggle to adapt to the diverse rules and interpretations
across different communities due to the inherent challenges of fine-tuning
models for such context-specific tasks. In this paper, we introduce
Context-aware Prompt-based Learning for Norm Violation Detection (CPL-NoViD), a
novel method that employs prompt-based learning to detect norm violations
across various types of rules. CPL-NoViD outperforms the baseline by
incorporating context through natural language prompts and demonstrates
improved performance across different rule types. Significantly, it not only
excels in cross-rule-type and cross-community norm violation detection but also
exhibits adaptability in few-shot learning scenarios. Most notably, it
establishes a new state-of-the-art in norm violation detection, surpassing
existing benchmarks. Our work highlights the potential of prompt-based learning
for context-sensitive norm violation detection and paves the way for future
research on more adaptable, context-aware models to better support online
community moderators.
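The abstract does not give CPL-NoViD's exact prompt template; as a hedged illustration only, a context-aware natural-language prompt that folds the community rule and the preceding conversation into the model input might be assembled like this (the `build_prompt` helper, the template wording, and the field names are all hypothetical):

```python
def build_prompt(rule: str, context: list[str], comment: str) -> str:
    """Assemble a natural-language prompt that incorporates the community
    rule and the conversational context, in the spirit of the context-aware
    prompting the abstract describes. The template is illustrative, not the
    paper's actual prompt."""
    history = " ".join(f"[Comment {i + 1}] {c}" for i, c in enumerate(context))
    return (
        f'In a community with the rule: "{rule}". '
        f"Conversation so far: {history} "
        f'Does the reply "{comment}" violate the rule? Answer yes or no:'
    )

prompt = build_prompt(
    rule="No personal attacks",
    context=["I disagree with the article.", "Why do you say that?"],
    comment="You are an idiot.",
)
```

The cloze-style "Answer yes or no:" ending reflects how prompt-based learning typically reduces classification to next-token prediction; the actual verbalizer used by CPL-NoViD may differ.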
Detection of Hate-Speech Tweets Based on Deep Learning: A Review
Cybercrime, cyberbullying, and hate speech have all increased alongside the use of the internet and social media. Hate speech knows no organizational or individual boundaries. It affects many people in diverse ways and can be harsh, offensive, or discriminatory depending on the target's gender, race, political opinions, religious beliefs, nationality, skin color, disability, ethnicity, sexual orientation, or status as an immigrant. Authorities and academics are investigating new methods for identifying hate speech on social media platforms such as Facebook and Twitter. This study adds to the ongoing discussion about creating safer digital spaces while balancing the limiting of hate speech against the protection of freedom of speech. Partnerships between researchers, platform developers, and communities are crucial to creating efficient and ethical content moderation systems on Twitter and other social media sites. To this end, multiple methodologies, models, and algorithms are employed. This study presents a thorough analysis of hate speech across numerous research publications. Each article has been examined in detail, including an evaluation of the algorithms or methodologies used, the databases, the classification techniques, and the findings achieved. In addition, comprehensive discussions were held on all the examined papers, focusing explicitly on the use of deep learning techniques to detect hate speech.
Machine Generated Text: A Comprehensive Survey of Threat Models and Detection Methods
Machine generated text is increasingly difficult to distinguish from human
authored text. Powerful open-source models are freely available, and
user-friendly tools that democratize access to generative models are
proliferating. ChatGPT, which was released shortly after the first preprint of
this survey, epitomizes these trends. The great potential of state-of-the-art
natural language generation (NLG) systems is tempered by the multitude of
avenues for abuse. Detection of machine generated text is a key countermeasure
for reducing abuse of NLG models, with significant technical challenges and
numerous open problems. We provide a survey that includes both 1) an extensive
analysis of threat models posed by contemporary NLG systems, and 2) the most
complete review of machine generated text detection methods to date. This
survey places machine generated text within its cybersecurity and social
context, and provides strong guidance for future work addressing the most
critical threat models, and ensuring detection systems themselves demonstrate
trustworthiness through fairness, robustness, and accountability. (Comment: Manuscript submitted to ACM Special Session on Trustworthy AI; 2022/11/19 - updated reference.)
"It ain't all good:" Machinic abuse detection and marginalisation in machine learning
Online abusive language has been given increasing prominence as a societal problem over the past few years as people are increasingly communicating on online platforms.
This increase in prominence has resulted in an increase in academic attention to the issue, particularly within the field of Natural Language Processing (NLP), which has proposed multiple datasets and machine learning methods for the detection of text-based abuse.
Recently, the issue of disparate impacts of machine learning has been given attention, showing that marginalised groups in society are disproportionately negatively affected by automated content moderation systems.
Moreover, a number of challenges have been identified for abusive language detection technologies, including poor model performance across datasets and models' limited ability to situate potentially abusive speech within the context of speaker intentions. This dissertation asks how NLP models for online abuse detection can address issues of generalisation and context.
Through critically examining the task of online abuse detection, I highlight how content moderation acts as a protective filter that seeks to maintain a sanitised environment.
I find that when considering automated content moderation systems through this lens, it is made clear that such systems are centred around experiences of some bodies at the expense of others, often those who are already marginalised.
In an effort to address this, I propose two different modelling processes that a) centre the mental and emotional states of the speaker by representing documents through the Linguistic Inquiry and Word Count (LIWC) categories that they invoke, and b) use Multi-Task Learning (MTL) to model abuse, such that the model aims to take into account the intentions of the speaker.
I find that through the use of LIWC for representing documents, machine learning models for online abuse detection can see improvements in classification scores on in-domain and out-of-domain datasets.
Similarly, I show that through the use of MTL, machine learning models can gain improvements by drawing on a variety of auxiliary tasks that combine data for content moderation systems with data for related tasks such as sarcasm detection.
Finally, I critique the machine learning pipeline in an effort to identify paths forward that can bring into focus the people who are excluded and are likely to experience harms from machine learning models for content moderation.
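The real LIWC dictionary is proprietary and the dissertation's exact feature pipeline is not given here; as a minimal sketch under those assumptions, representing a document by normalised counts of the psycholinguistic categories its tokens invoke might look like this (the toy category lexicon and `liwc_vector` helper are illustrative stand-ins):

```python
from collections import Counter

# Toy stand-in for LIWC category lexicons; the real LIWC dictionary is
# proprietary and far larger. These word lists are illustrative only.
CATEGORIES = {
    "anger": {"hate", "angry", "stupid"},
    "posemo": {"love", "nice", "great"},
    "negate": {"no", "not", "never"},
}

def liwc_vector(text: str) -> dict[str, float]:
    """Represent a document as the fraction of its tokens that fall into
    each LIWC-style category, the kind of speaker-state representation
    the dissertation proposes."""
    tokens = text.lower().split()
    counts = Counter()
    for tok in tokens:
        for cat, words in CATEGORIES.items():
            if tok in words:
                counts[cat] += 1
    n = max(len(tokens), 1)
    return {cat: counts[cat] / n for cat in CATEGORIES}

vec = liwc_vector("I hate this, it is stupid")
```

A downstream classifier would then consume these category proportions instead of (or alongside) raw token features, which is what allows the representation to transfer across domains.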
Towards Automated Moderation: Enabling Toxic Language Detection with Transfer Learning and Attention-Based Models
Our world is more connected than ever before. Sadly, however, this highly connected world has made it easier to bully, insult, and propagate hate speech in cyberspace. Even though researchers and companies alike have started investigating this real-world problem, the question remains as to why users are increasingly being exposed to hate and discrimination online. In fact, the noticeable and persistent increase in harmful language on social media platforms indicates that the situation is actually only getting worse. Hence, in this work, we show that contemporary ML methods can help tackle this challenge in an accurate and cost-effective manner. Our experiments demonstrate that a universal approach combining transfer learning methods and state-of-the-art Transformer architectures enables the efficient development of toxic language detection models. Consequently, with this universal approach, we provide platform providers with a simple means of enabling the automated moderation of user-generated content, and as a result, we hope to contribute to making the web a safer place.
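The abstract names attention-based Transformer architectures without detailing them; for orientation, the scaled dot-product attention at the core of such models can be sketched in plain NumPy (this is the generic textbook operation, not the paper's specific architecture; shapes and names are generic):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V -- the core Transformer operation.
    Q: (n_q, d_k) queries, K: (n_k, d_k) keys, V: (n_k, d_v) values."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (n_q, n_k) similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights

# One query attending over two key/value pairs: the query matches the
# first key more strongly, so the output leans toward the first value.
Q = np.array([[1.0, 0.0]])
K = np.array([[1.0, 0.0], [0.0, 1.0]])
V = np.array([[10.0, 0.0], [0.0, 10.0]])
out, w = scaled_dot_product_attention(Q, K, V)
```

In the transfer-learning setting the abstract describes, such attention layers come pretrained (e.g. in a BERT-style encoder) and only a classification head is fine-tuned on toxic-language labels.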
SoK: Content Moderation in Social Media, from Guidelines to Enforcement, and Research to Practice
To counter online abuse and misinformation, social media platforms have been
establishing content moderation guidelines and employing various moderation
policies. The goal of this paper is to study these community guidelines and
moderation practices, as well as the relevant research publications to identify
the research gaps, differences in moderation techniques, and challenges that
should be tackled by the social media platforms and the research community at
large. In this regard, we study, analyze, and consolidate the fourteen most
popular social media content moderation guidelines and practices in the US
jurisdiction. We then introduce three taxonomies drawn from this analysis and
from a review of over one hundred interdisciplinary research papers about
moderation strategies. We identify the differences between the content
moderation employed on mainstream social media platforms and on fringe
platforms. We also highlight the implications of Section 230, the need for
transparency and opacity in content moderation, why platforms should shift from
a one-size-fits-all model to a more inclusive model, and lastly, we highlight
why there is a need for a collaborative human-AI system.
Cyberbullying detection: Hybrid models based on machine learning and natural language processing techniques
The rise in web and social media interactions has resulted in the effortless proliferation of offensive language and hate speech. Such online harassment, insults, and attacks are commonly termed cyberbullying. The sheer volume of user-generated content has made it challenging to identify such illicit content. Machine learning has wide applications in text classification, and researchers are shifting towards using deep neural networks to detect cyberbullying due to the several advantages they have over traditional machine learning algorithms. This paper proposes a novel neural network framework with parameter optimization and an algorithmic comparative study of eleven classification methods: four traditional machine learning methods and seven shallow neural networks, on two real-world cyberbullying datasets. In addition, this paper examines the effect of feature extraction and word-embedding-based natural language processing techniques on algorithmic performance. Key observations from this study show that bidirectional neural networks and attention models provide high classification results. Logistic Regression was observed to be the best among the traditional machine learning classifiers used. Term Frequency-Inverse Document Frequency (TF-IDF) demonstrates consistently high accuracies with traditional machine learning techniques. Global Vectors (GloVe) perform better with neural network models. Bi-GRU and Bi-LSTM worked best amongst the neural networks used. The extensive experiments performed on the two datasets establish the importance of this work by comparing eleven classification methods and seven feature extraction techniques. Our proposed shallow neural networks outperform existing state-of-the-art approaches for cyberbullying detection, with accuracy and F1-scores as high as ~95% and ~98%, respectively.
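The abstract reports TF-IDF as the strongest feature representation for the traditional classifiers; as a minimal sketch, one textbook TF-IDF variant (raw term frequency times log inverse document frequency; the paper does not state which weighting scheme it used) can be computed like this:

```python
import math
from collections import Counter

def tfidf(docs):
    """Plain TF-IDF over tokenised documents: tf(t, d) * log(N / df(t)).
    One common textbook variant; real toolkits offer smoothed and
    normalised alternatives."""
    n = len(docs)
    df = Counter()                      # document frequency per term
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            t: (tf[t] / len(doc)) * math.log(n / df[t])
            for t in tf
        })
    return vectors

docs = [["you", "are", "great"], ["you", "are", "awful"], ["great", "game"]]
vecs = tfidf(docs)
```

Terms that occur in every document get weight zero, while rarer, more discriminative terms (such as "awful" above) are up-weighted, which is why this representation pairs well with linear classifiers like the Logistic Regression the study favours.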
- …