CReHate: Cross-cultural Re-annotation of English Hate Speech Dataset

Jin, Jiho; Jung, Chani; Kim, Juho; Lee, Nayeon; Myung, Junho; Oh, Alice

Publication date: 31 August 2023

Abstract

English datasets predominantly reflect the perspectives of certain nationalities, which can lead to cultural biases in models and datasets. This is particularly problematic in tasks heavily influenced by subjectivity, such as hate speech detection. To examine how individuals from different countries perceive hate speech, we introduce CReHate, a cross-cultural re-annotation of a sample of the SBIC dataset. The dataset includes annotations from five countries: Australia, Singapore, South Africa, the United Kingdom, and the United States. Our statistical analysis shows significant differences based on nationality, with only 59.4% of the samples achieving consensus among all countries. We also introduce a culturally sensitive hate speech classifier, trained via transfer learning, that captures the perspectives of different nationalities. These findings underscore the need to re-evaluate certain aspects of NLP research, especially with regard to the nuanced nature of hate speech in the English language.

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2308.16705

Last time updated on 10/09/2023