348 research outputs found
HateBERT: Retraining BERT for Abusive Language Detection in English
In this paper, we introduce HateBERT, a re-trained BERT model for abusive
language detection in English. The model was trained on RAL-E, a large-scale
dataset of Reddit comments in English from communities banned for being
offensive, abusive, or hateful that we have collected and made available to the
public. We present the results of a detailed comparison between a general
pre-trained language model and the abuse-inclined version obtained by
retraining with posts from the banned communities on three English datasets for
offensive language, abusive language, and hate speech detection tasks. On all datasets,
HateBERT outperforms the corresponding general BERT model. We also discuss a
battery of experiments comparing the portability of the generic pre-trained
language model and its corresponding abusive language-inclined counterpart
across the datasets, indicating that portability is affected by the
compatibility of the annotated phenomena.
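Retraining here means continuing BERT's masked-language-model (MLM) objective on the RAL-E comments. As an illustrative sketch (not the authors' released code), the standard BERT corruption scheme — select ~15% of tokens, then apply the 80/10/10 mask/random/keep rule — can be written as follows; `MASK_ID` and `VOCAB_SIZE` follow bert-base-uncased conventions and are assumptions here:

```python
import random

MASK_ID, VOCAB_SIZE = 103, 30522  # bert-base-uncased conventions (assumed)

def mlm_mask(token_ids, mask_prob=0.15, seed=0):
    """Apply BERT-style MLM corruption to a token-id sequence.

    Each selected position is replaced by [MASK] 80% of the time,
    by a random vocabulary token 10% of the time, and left unchanged
    10% of the time. Returns (corrupted_ids, labels), where labels
    is -100 (the usual 'ignore' index) for unselected positions.
    """
    rng = random.Random(seed)
    corrupted, labels = list(token_ids), [-100] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() < mask_prob:
            labels[i] = tok          # predict the original token here
            roll = rng.random()
            if roll < 0.8:
                corrupted[i] = MASK_ID
            elif roll < 0.9:
                corrupted[i] = rng.randrange(VOCAB_SIZE)
            # else: keep the original token (10% of selections)
    return corrupted, labels
```

Feeding such (corrupted, labels) pairs from domain-specific text back through the MLM loss is what shifts a general checkpoint toward an "abuse-inclined" one.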
DoDo Learning: DOmain-DemOgraphic Transfer in Language Models for Detecting Abuse Targeted at Public Figures
Public figures receive a disproportionate amount of abuse on social media,
impacting their active participation in public life. Automated systems can
identify abuse at scale but labelling training data is expensive, complex and
potentially harmful. So, it is desirable that systems are efficient and
generalisable, handling both shared and specific aspects of online abuse. We
explore the dynamics of cross-group text classification in order to understand
how well classifiers trained on one domain or demographic can transfer to
others, with a view to building more generalisable abuse classifiers. We
fine-tune language models to classify tweets targeted at public figures across
DOmains (sport and politics) and DemOgraphics (women and men) using our novel
DODO dataset, containing 28,000 labelled entries, split equally across four
domain-demographic pairs. We find that (i) small amounts of diverse data are
hugely beneficial to generalisation and model adaptation; (ii) models transfer
more easily across demographics but models trained on cross-domain data are
more generalisable; (iii) some groups contribute more to generalisability than
others; and (iv) dataset similarity is a signal of transferability.
Comment: 15 pages, 7 figures, 4 tables
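Finding (iv) — dataset similarity as a signal of transferability — can be illustrated with a crude proxy. The sketch below uses vocabulary Jaccard overlap between two corpora; this is an illustrative assumption, not necessarily the similarity measure used in the paper:

```python
def vocab_jaccard(corpus_a, corpus_b):
    """Jaccard overlap of word vocabularies between two corpora.

    A rough stand-in for 'dataset similarity': higher overlap between
    a source and target domain-demographic pair would suggest easier
    classifier transfer between them.
    """
    vocab_a = {w.lower() for doc in corpus_a for w in doc.split()}
    vocab_b = {w.lower() for doc in corpus_b for w in doc.split()}
    if not vocab_a and not vocab_b:
        return 1.0  # two empty corpora are trivially identical
    return len(vocab_a & vocab_b) / len(vocab_a | vocab_b)
```

In a transfer study, one would compute this for every source/target pair (e.g. sport-women vs. politics-men) and check whether it correlates with cross-group classification performance.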
Studying Socially Unacceptable Discourse Classification (SUD) through different eyes: "Are we on the same page?"
We study Socially Unacceptable Discourse (SUD) characterization and detection
in online text. We first build and present a novel corpus that contains a large
variety of manually annotated texts from different online sources used so far
in state-of-the-art machine learning (ML) SUD detection solutions. This global
context allows us to test the generalization ability of SUD classifiers that
acquire knowledge around the same SUD categories, but from different contexts.
From this perspective, we can analyze how (possibly) different annotation
modalities influence SUD learning, and we discuss open challenges and open
research directions. We also provide several data insights that can support
domain experts in the annotation task.
LLMs and Finetuning: Benchmarking cross-domain performance for hate speech detection
This paper compares different pre-trained and fine-tuned large language
models (LLMs) for hate speech detection. Our research underscores challenges in
LLMs' cross-domain validity and overfitting risks. Through evaluations, we
highlight the need for fine-tuned models that grasp the nuances of hate speech
through greater label heterogeneity. We conclude with a vision for the future
of hate speech detection, emphasizing cross-domain generalizability and
appropriate benchmarking practices.
Comment: 9 pages, 3 figures, 4 tables
GruPaTo at SemEval-2020 Task 12: Retraining mBERT on Social Media and Fine-tuned Offensive Language Models
We introduce an approach to multilingual offensive language detection based on the mBERT transformer model. We download extra training data from Twitter in English, Danish, and Turkish, and use it to re-train the model. We then fine-tune the model on the provided training data and, in some configurations, implement a transfer learning approach that exploits the typological relatedness between English and Danish. Our systems obtained good results across the three languages (.9036 for EN, .7619 for DA, and .7789 for TR).
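Scores of this kind at SemEval-2020 Task 12 are macro-averaged F1 over the class labels. As a self-contained reference (an illustrative implementation, not the task organizers' scorer), macro F1 can be computed as:

```python
def macro_f1(y_true, y_pred, labels=None):
    """Macro-averaged F1: per-class F1 scores, averaged with equal
    weight per class regardless of class frequency."""
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)
```

Macro averaging matters for offensive-language tasks because the offensive class is usually the minority: a classifier that predicts "not offensive" everywhere scores high accuracy but poor macro F1.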