60 research outputs found
Hate Speech and Offensive Language Detection in Bengali
Social media often serves as a breeding ground for hateful and
offensive content. Identifying such content on social media is crucial because
of its impact on individuals targeted for their race, gender, or religion, and
on society at large.
However, while there is extensive research in hate speech detection in English,
there is a gap in hateful content detection in low-resource languages like
Bengali. Besides, a current trend on social media is the use of Romanized
Bengali for everyday interactions. To address these limitations, in this study
we develop an annotated dataset of 10K Bengali
posts consisting of 5K actual and 5K Romanized Bengali tweets. We implement
several baseline models for the classification of such hateful posts. We
further explore the interlingual transfer mechanism to boost classification
performance. Finally, we perform an in-depth error analysis by looking into the
posts misclassified by the models. When trained on the actual and Romanized
datasets separately, XLM-RoBERTa performs best. Further, under joint training
and few-shot training, MuRIL outperforms the other models by interpreting
semantic expressions better. We make our code and dataset publicly available.
Comment: Accepted at AACL-IJCNLP 202
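The joint and few-shot training regimes compared above amount to different ways of mixing the actual and Romanized subsets before fine-tuning. A minimal sketch of that data-mixing step (the function name, field names, and few-shot sample size are illustrative assumptions, not from the paper):

```python
import random

def make_training_splits(actual, romanized, few_shot_k=100, seed=42):
    """Build the training regimes compared in the study: separate
    (actual-only / Romanized-only), joint, and few-shot.
    `actual` and `romanized` are lists of (text, label) pairs."""
    rng = random.Random(seed)
    joint = actual + romanized
    rng.shuffle(joint)
    # Few-shot: all actual posts plus a small sample of Romanized ones.
    few_shot = actual + rng.sample(romanized, min(few_shot_k, len(romanized)))
    return {"actual": actual, "romanized": romanized,
            "joint": joint, "few_shot": few_shot}

# Toy stand-ins for the 5K actual and 5K Romanized annotated posts.
splits = make_training_splits([("post", 1)] * 5000, [("post", 0)] * 5000)
print(len(splits["joint"]), len(splits["few_shot"]))  # 10000 5100
```

Each split would then be fed to the same fine-tuning loop, so the comparison isolates the effect of the data mixture rather than the model.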
Multimodal Hate Speech Detection from Bengali Memes and Texts
Numerous works have been proposed to employ machine learning (ML) and deep
learning (DL) techniques to utilize textual data from social media for
analysis of anti-social behaviors such as cyberbullying, fake news propagation,
and hate speech, mainly for highly resourced languages like English. However,
despite their rich diversity and millions of native speakers, some languages
such as Bengali remain under-resourced owing to a lack of computational
resources for natural language processing (NLP). Like English,
Bengali social media content also includes images along with text (e.g.,
multimodal content posted by embedding short texts into images on Facebook),
and the textual data alone is often not enough to judge such posts (e.g., to
determine whether they constitute hate speech). In those cases, images can
provide the extra context needed for a proper judgment. This paper addresses
hate speech detection from
multimodal Bengali memes and texts. We prepared the only multimodal hate speech
detection dataset of its kind for Bengali. We train several neural
architectures (i.e., neural networks like Bi-LSTM/Conv-LSTM with word
embeddings, EfficientNet + transformer architectures such as monolingual Bangla
BERT, multilingual BERT-cased/uncased, and XLM-RoBERTa) to jointly analyze
textual
and visual information for hate speech detection. The Conv-LSTM and XLM-RoBERTa
models performed best for texts, yielding F1 scores of 0.78 and 0.82,
respectively. As for memes, the ResNet152 and DenseNet201 models yielded F1
scores of 0.78 and 0.70, respectively. The multimodal fusion of mBERT-uncased +
EfficientNet-B1 performed the best, yielding an F1 score of 0.80. Our study
suggests that memes are moderately useful for hate speech detection in Bengali,
but none of the multimodal models outperform unimodal models analyzing only
textual data.
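The multimodal fusion described above (mBERT-uncased + EfficientNet-B1) can be pictured as late fusion: the text and image encoders each produce a feature vector, the vectors are concatenated, and a classifier scores the result. A stdlib-only toy sketch with made-up feature values and weights standing in for the real encoder outputs (all numbers here are illustrative assumptions):

```python
def fuse_and_score(text_feats, image_feats, weights, bias=0.0):
    """Late fusion: concatenate the unimodal feature vectors, then apply
    a linear classifier. Returns the hate-speech logit."""
    fused = text_feats + image_feats  # list concatenation = feature concat
    assert len(fused) == len(weights)
    return sum(f * w for f, w in zip(fused, weights)) + bias

# Toy 3-dim "text" and 2-dim "image" features; weights are illustrative only.
logit = fuse_and_score([0.2, -0.1, 0.4], [0.7, 0.3],
                       weights=[1.0, 0.5, -0.2, 0.8, 0.1], bias=-0.3)
print(logit > 0)  # treat a positive logit as the "hateful" class
```

In the actual systems, the concatenated vector would feed a trained dense layer rather than fixed weights; the sketch only shows the shape of the fusion step.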
BanglaAbuseMeme: A Dataset for Bengali Abusive Meme Classification
The dramatic increase in the use of social media platforms for information
sharing has also fueled a steep growth in online abuse. A simple yet effective
way of abusing individuals or communities is by creating memes, which often
integrate an image with a short piece of text layered on top of it. Such
harmful elements are in rampant use and are a threat to online safety. Hence it
is necessary to develop efficient models to detect and flag abusive memes. The
problem becomes more challenging in a low-resource setting (e.g., Bengali
memes, i.e., images with Bengali text embedded in them) because of the absence
of benchmark datasets on which AI models could be trained. In this paper we
bridge this gap by building a Bengali meme dataset. To set up an effective
benchmark, we
implement several baseline models for classifying abusive memes using this
dataset. We observe that multimodal models that use both textual and visual
information outperform unimodal models. Our best-performing model achieves a
macro F1 score of 70.51. Finally, we perform a qualitative error analysis of
the misclassified memes of the best-performing text-based, image-based and
multimodal models.
Comment: EMNLP 2023 (main conference)
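The macro F1 reported above is the unweighted mean of per-class F1 scores, so minority classes (e.g., the abusive class) count equally with the majority class. A short stdlib sketch of the metric:

```python
def macro_f1(y_true, y_pred):
    """Macro F1: unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# Tiny worked example: one abusive post missed, one non-abusive over-flagged.
print(round(macro_f1([1, 1, 0, 0, 1], [1, 0, 0, 0, 1]), 4))
```

This is why a macro F1 of 70.51 is informative even under class imbalance: a model that simply predicts the majority class scores poorly on it.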
Tackling Hate Speech in Low-resource Languages with Context Experts
Given Myanmar's historical and socio-political context, hate speech spread on
social media has escalated into offline unrest and violence. This paper
presents findings from our remote study on the automatic detection of hate
speech online in Myanmar. We argue that effectively addressing this problem
will require community-based approaches that combine the knowledge of context
experts with machine learning tools that can analyze the vast amount of data
produced. To this end, we develop a systematic process to facilitate this
collaboration covering key aspects of data collection, annotation, and model
validation strategies. We highlight challenges in this area stemming from small
and imbalanced datasets, the need to balance non-glamorous data work and
stakeholder priorities, and closed data-sharing practices. Building on these
findings, we discuss avenues for further work in developing and deploying hate
speech detection systems for low-resource languages.
Comment: ICTD 2022 Conference paper
An Empirical Study of Offensive Language in Online Interactions
In the past decade, usage of social media platforms has increased significantly. People use these platforms to connect with friends and family and to share information, news, and opinions. Platforms such as Facebook and Twitter are often used to propagate offensive and hateful content online. The open nature and anonymity of the internet fuel aggressive and inflamed conversations. Companies and federal institutions are striving to make social media cleaner, more welcoming, and unbiased. In this study, we first explore the underlying topics in popular offensive language datasets using statistical and neural topic modeling. The current state-of-the-art models for aggression detection only present a toxicity score based on the entire post. Content moderators often have to deal with lengthy texts without any word-level indicators. We propose a neural transformer approach for detecting the tokens that make a particular post aggressive. The pre-trained BERT model has achieved state-of-the-art results in various natural language processing tasks. However, the model is trained on general-purpose corpora and lacks aggressive social media linguistic features. We propose fBERT, a BERT model retrained on over a million offensive tweets from the SOLID dataset. We demonstrate the effectiveness and portability of fBERT over BERT in various shared offensive language detection tasks. We further propose a new multi-task aggression detection (MAD) framework for post- and token-level aggression detection using neural transformers. The experiments confirm the effectiveness of the multi-task learning model over individual models, particularly when the amount of training data is limited
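Token-level detection, as in the MAD framework above, attaches a label to every token rather than a single toxicity score per post. The paper does this with a fine-tuned transformer; as a hedged stand-in that only shows the input/output shape, here is a toy lexicon-based tagger (the lexicon, example sentence, and labels are illustrative assumptions, not the paper's method):

```python
def tag_aggressive_tokens(post, lexicon):
    """Return (token, label) pairs; label 1 flags a token as aggressive.
    A transformer token classifier would emit the same output shape,
    with learned per-token predictions instead of lexicon lookups."""
    return [(tok, int(tok.lower().strip(".,!?") in lexicon))
            for tok in post.split()]

toy_lexicon = {"idiot", "stupid"}  # illustrative only
tags = tag_aggressive_tokens("You are such an idiot!", toy_lexicon)
print(tags)  # only the token carrying "idiot" receives label 1
```

Per-token output like this is what gives moderators word-level indicators in lengthy posts, which a single post-level toxicity score cannot provide.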
HATE CRIMES IN SOCIAL MEDIA: A CRIMINOLOGICAL REVIEW
Hate crime on social media is a common phenomenon around the world. Hate crime against different races, minorities, and ethnic groups now spreads via social media platforms in the form of hate speech. The anonymity of internet users and the easy availability of the internet make this crime very easy for offenders to commit. This paper aims to identify the targets of hate crime on social media and its effects on the victims. People are victimized by hate crime on social media because of their race, their gender (especially women), and their religious-minority status. The study was conducted through secondary data analysis. Hate crime on social media has a devastating effect on victims, both physically and above all psychologically: it induces feelings of inferiority, degrades self-esteem, and creates a fear of violence. The existing laws against hate crime on social media should be implemented more rigorously, and detection methods should be applied to identify offenders. This paper can help increase awareness of hate crime on social media, which has gone unnoticed by many researchers in our country, and can pave the way to stopping victimization on social media platforms