60 research outputs found
Hate Speech and Offensive Language Detection in Bengali
Social media often serves as a breeding ground for hateful and
offensive content. Identifying such content on social media is crucial because
of its impact on individuals targeted for their race, gender, or religion, and
on society at large.
However, while there is extensive research in hate speech detection in English,
there is a gap in hateful content detection in low-resource languages like
Bengali. Besides, a current trend on social media is the use of Romanized
Bengali for everyday interactions. To address these limitations, in this study
we develop an annotated dataset of 10K Bengali
posts consisting of 5K actual and 5K Romanized Bengali tweets. We implement
several baseline models for the classification of such hateful posts. We
further explore the interlingual transfer mechanism to boost classification
performance. Finally, we perform an in-depth error analysis by looking into the
posts misclassified by the models. When trained on the actual and Romanized
datasets separately, XLM-RoBERTa performs best. Further, under joint training
and few-shot training, MuRIL outperforms the other models by interpreting
semantic expressions better. We make our code and dataset publicly available.
Comment: Accepted at AACL-IJCNLP 202
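The joint and few-shot training regimes compared above amount to different ways of mixing the actual and Romanized subsets before fine-tuning. A minimal sketch of that data-mixing step (the function name, field names, and few-shot sample size are illustrative assumptions, not from the paper):

```python
import random

def make_training_splits(actual, romanized, few_shot_k=100, seed=42):
    """Build the training regimes compared in the study: separate
    (actual-only / Romanized-only), joint, and few-shot.
    `actual` and `romanized` are lists of (text, label) pairs."""
    rng = random.Random(seed)
    joint = actual + romanized
    rng.shuffle(joint)
    # Few-shot: all actual posts plus a small sample of Romanized ones.
    few_shot = actual + rng.sample(romanized, min(few_shot_k, len(romanized)))
    return {"actual": actual, "romanized": romanized,
            "joint": joint, "few_shot": few_shot}

# Toy stand-ins for the 5K actual and 5K Romanized annotated posts.
splits = make_training_splits([("post", 1)] * 5000, [("post", 0)] * 5000)
print(len(splits["joint"]), len(splits["few_shot"]))  # 10000 5100
```

Each split would then be fed to the same fine-tuning loop, so the comparison isolates the effect of the data mixture rather than the model.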
Multimodal Hate Speech Detection from Bengali Memes and Texts
Numerous works have been proposed to employ machine learning (ML) and deep
learning (DL) techniques to utilize textual data from social media for
analysis of anti-social behaviors such as cyberbullying, fake news propagation,
and hate speech, mainly for highly resourced languages like English. However,
despite their rich diversity and millions of native speakers, some languages
such as Bengali remain under-resourced owing to a lack of computational
resources for natural language processing (NLP). Like English,
Bengali social media content also includes images along with text (e.g.,
multimodal content posted by embedding short texts into images on Facebook),
and the textual data alone is often not enough to judge such posts (e.g., to
determine whether they constitute hate speech). In those cases, images can
provide the extra context needed for a proper judgment. This paper addresses
hate speech detection from
multimodal Bengali memes and texts. We prepared the only multimodal hate speech
detection dataset of its kind for Bengali. We train several neural
architectures (i.e., neural networks like Bi-LSTM/Conv-LSTM with word
embeddings, EfficientNet + transformer architectures such as monolingual Bangla
BERT, multilingual BERT-cased/uncased, and XLM-RoBERTa) to jointly analyze
textual
and visual information for hate speech detection. The Conv-LSTM and XLM-RoBERTa
models performed best for texts, yielding F1 scores of 0.78 and 0.82,
respectively. As for memes, the ResNet152 and DenseNet201 models yielded F1
scores of 0.78 and 0.70, respectively. The multimodal fusion of mBERT-uncased +
EfficientNet-B1 performed the best, yielding an F1 score of 0.80. Our study
suggests that memes are moderately useful for hate speech detection in Bengali,
but none of the multimodal models outperform unimodal models analyzing only
textual data.
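The multimodal fusion described above (mBERT-uncased + EfficientNet-B1) can be pictured as late fusion: the text and image encoders each produce a feature vector, the vectors are concatenated, and a classifier scores the result. A stdlib-only toy sketch with made-up feature values and weights standing in for the real encoder outputs (all numbers here are illustrative assumptions):

```python
def fuse_and_score(text_feats, image_feats, weights, bias=0.0):
    """Late fusion: concatenate the unimodal feature vectors, then apply
    a linear classifier. Returns the hate-speech logit."""
    fused = text_feats + image_feats  # list concatenation = feature concat
    assert len(fused) == len(weights)
    return sum(f * w for f, w in zip(fused, weights)) + bias

# Toy 3-dim "text" and 2-dim "image" features; weights are illustrative only.
logit = fuse_and_score([0.2, -0.1, 0.4], [0.7, 0.3],
                       weights=[1.0, 0.5, -0.2, 0.8, 0.1], bias=-0.3)
print(logit > 0)  # treat a positive logit as the "hateful" class
```

In the actual systems, the concatenated vector would feed a trained dense layer rather than fixed weights; the sketch only shows the shape of the fusion step.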
BanglaAbuseMeme: A Dataset for Bengali Abusive Meme Classification
The dramatic increase in the use of social media platforms for information
sharing has also fueled a steep growth in online abuse. A simple yet effective
way of abusing individuals or communities is by creating memes, which often
integrate an image with a short piece of text layered on top of it. Such
harmful elements are in rampant use and are a threat to online safety. Hence it
is necessary to develop efficient models to detect and flag abusive memes. The
problem becomes more challenging in a low-resource setting (e.g., Bengali
memes, i.e., images with Bengali text embedded in them) because of the absence
of benchmark datasets on which AI models could be trained. In this paper we
bridge this gap by building a Bengali meme dataset. To set up an effective
benchmark, we
implement several baseline models for classifying abusive memes using this
dataset. We observe that multimodal models that use both textual and visual
information outperform unimodal models. Our best-performing model achieves a
macro F1 score of 70.51. Finally, we perform a qualitative error analysis of
the misclassified memes of the best-performing text-based, image-based and
multimodal models.
Comment: EMNLP 2023 (main conference)
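The macro F1 reported above is the unweighted mean of per-class F1 scores, so minority classes (e.g., the abusive class) count equally with the majority class. A short stdlib sketch of the metric:

```python
def macro_f1(y_true, y_pred):
    """Macro F1: unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# Tiny worked example: one abusive post missed, one non-abusive over-flagged.
print(round(macro_f1([1, 1, 0, 0, 1], [1, 0, 0, 0, 1]), 4))
```

This is why a macro F1 of 70.51 is informative even under class imbalance: a model that simply predicts the majority class scores poorly on it.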
Tackling Hate Speech in Low-resource Languages with Context Experts
Given Myanmar's historical and socio-political context, hate speech spread on
social media has escalated into offline unrest and violence. This paper
presents findings from our remote study on the automatic detection of hate
speech online in Myanmar. We argue that effectively addressing this problem
will require community-based approaches that combine the knowledge of context
experts with machine learning tools that can analyze the vast amount of data
produced. To this end, we develop a systematic process to facilitate this
collaboration covering key aspects of data collection, annotation, and model
validation strategies. We highlight challenges in this area stemming from small
and imbalanced datasets, the need to balance non-glamorous data work and
stakeholder priorities, and closed data-sharing practices. Building on these
findings, we discuss avenues for further work in developing and deploying hate
speech detection systems for low-resource languages.
Comment: ICTD 2022 Conference paper
An Empirical Study of Offensive Language in Online Interactions
In the past decade, usage of social media platforms has increased significantly. People use these platforms to connect with friends and family and to share information, news, and opinions. Platforms such as Facebook and Twitter are often used to propagate offensive and hateful content online. The open nature and anonymity of the internet fuel aggressive and inflamed conversations. Companies and federal institutions are striving to make social media cleaner, more welcoming, and unbiased. In this study, we first explore the underlying topics in popular offensive language datasets using statistical and neural topic modeling. The current state-of-the-art models for aggression detection only present a toxicity score based on the entire post. Content moderators often have to deal with lengthy texts without any word-level indicators. We propose a neural transformer approach for detecting the tokens that make a particular post aggressive. The pre-trained BERT model has achieved state-of-the-art results in various natural language processing tasks. However, the model is trained on general-purpose corpora and lacks aggressive social media linguistic features. We propose fBERT, a BERT model retrained on over a million offensive tweets from the SOLID dataset. We demonstrate the effectiveness and portability of fBERT over BERT in various shared offensive language detection tasks. We further propose a new multi-task aggression detection (MAD) framework for post- and token-level aggression detection using neural transformers. The experiments confirm the effectiveness of the multi-task learning model over individual models, particularly when the amount of training data is limited
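Token-level detection, as in the MAD framework above, attaches a label to every token rather than a single toxicity score per post. The paper does this with a fine-tuned transformer; as a hedged stand-in that only shows the input/output shape, here is a toy lexicon-based tagger (the lexicon, example sentence, and labels are illustrative assumptions, not the paper's method):

```python
def tag_aggressive_tokens(post, lexicon):
    """Return (token, label) pairs; label 1 flags a token as aggressive.
    A transformer token classifier would emit the same output shape,
    with learned per-token predictions instead of lexicon lookups."""
    return [(tok, int(tok.lower().strip(".,!?") in lexicon))
            for tok in post.split()]

toy_lexicon = {"idiot", "stupid"}  # illustrative only
tags = tag_aggressive_tokens("You are such an idiot!", toy_lexicon)
print(tags)  # only the token carrying "idiot" receives label 1
```

Per-token output like this is what gives moderators word-level indicators in lengthy posts, which a single post-level toxicity score cannot provide.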
HATE CRIMES IN SOCIAL MEDIA: A CRIMINOLOGICAL REVIEW
Hate crime on social media is a common phenomenon around the world. Hate crime against different races, minorities, and ethnic groups now spreads via social media platforms in the form of hate speech. The anonymity of internet users and the easy availability of the internet make this crime very easy for offenders to commit. This paper aims to identify the targets of hate crime on social media and its effects on the victims. People are victimized by hate crime on social media because of their race, their gender (especially women), and their religious-minority status. The study was conducted through secondary data analysis. Hate crime on social media has a devastating effect on victims, both physically and above all psychologically: it induces feelings of inferiority, degrades self-esteem, and creates a fear of violence. The existing laws against hate crime on social media should be implemented more rigorously, and detection methods should be applied to identify offenders. This paper can help increase awareness of hate crime on social media, which has gone unnoticed by many researchers in our country, and can pave the way to stopping victimization on social media platforms