2 research outputs found
Interpretable Multi Labeled Bengali Toxic Comments Classification using Deep Learning
This paper presents a deep learning-based pipeline for categorizing Bengali
toxic comments. A binary classification model first determines whether a
comment is toxic, and a multi-label classifier then determines which toxicity
types the comment belongs to. For this
purpose, we have prepared a manually labeled dataset of 16,073 instances, of
which 8,488 are toxic. Any toxic comment may belong to one or more of six
toxicity categories simultaneously: vulgar, hate, religious, threat, troll,
and insult. A Long Short-Term Memory (LSTM) network with BERT embeddings
achieved 89.42% accuracy on the binary classification task, while for the
multi-label task a combination of Convolutional Neural Network and
Bi-directional Long Short-Term Memory (CNN-BiLSTM) with an attention mechanism
achieved 78.92% accuracy and a weighted F1-score of 0.86. To explain the
predictions and interpret word-level feature importance during classification
by the proposed models, we used the Local Interpretable Model-Agnostic
Explanations (LIME) framework. Our dataset is publicly available at
https://github.com/deepu099cse/Multi-Labeled-Bengali-Toxic-Comments-Classificatio
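The two-stage design described in this abstract can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: `classify_comment`, `toy_is_toxic`, `toy_category_scores`, and the 0.5 threshold are all hypothetical, and the keyword rules merely stand in for the trained LSTM and CNN-BiLSTM models.

```python
from typing import Callable, Dict, List

# The six toxicity categories named in the abstract.
CATEGORIES = ["vulgar", "hate", "religious", "threat", "troll", "insult"]


def classify_comment(
    comment: str,
    is_toxic: Callable[[str], bool],
    category_scores: Callable[[str], Dict[str, float]],
    threshold: float = 0.5,  # assumed decision threshold, not from the paper
) -> List[str]:
    """Stage 1 gates stage 2: only toxic comments receive category labels."""
    if not is_toxic(comment):
        return []
    scores = category_scores(comment)
    # Multi-label: every category at or above the threshold is kept.
    return [c for c in CATEGORIES if scores.get(c, 0.0) >= threshold]


# Hypothetical keyword-based stand-ins for the two trained models.
def toy_is_toxic(comment: str) -> bool:
    return "bad" in comment.lower()


def toy_category_scores(comment: str) -> Dict[str, float]:
    text = comment.lower()
    return {
        "insult": 0.9 if "bad" in text else 0.0,
        "threat": 0.8 if "hurt" in text else 0.0,
    }


if __name__ == "__main__":
    print(classify_comment("nice day", toy_is_toxic, toy_category_scores))
    print(classify_comment("bad, will hurt you", toy_is_toxic, toy_category_scores))
```

Passing the two models in as callables keeps the gating logic separate from whichever binary and multi-label classifiers are actually used.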
Tackling Fake News in Bengali: Unraveling the Impact of Summarization vs. Augmentation on Pre-trained Language Models
With the rise of social media and online news sources, fake news has become a
significant issue globally. However, the detection of fake news in low-resource
languages like Bengali has received limited attention in research. In this
paper, we propose a methodology consisting of four distinct approaches to
classify fake news articles in Bengali using summarization and augmentation
techniques with five pre-trained language models. Our approach includes
translating English news articles and using augmentation techniques to curb the
deficit of fake news articles. Our research also focused on summarizing the
news to tackle the token length limitation of BERT-based models. Through
extensive experimentation and rigorous evaluation, we show the effectiveness of
summarization and augmentation in the case of Bengali fake news detection. We
evaluated our models using three separate test datasets. The BanglaBERT Base
model, when combined with augmentation techniques, achieved an impressive
accuracy of 96% on the first test dataset. On the second test dataset, the
BanglaBERT model, trained with summarized augmented news articles, achieved 97%
accuracy. Lastly, the mBERT Base model achieved an accuracy of 86% on the third
test dataset which was reserved for generalization performance evaluation. The
datasets and implementations are available at
https://github.com/arman-sakif/Bengali-Fake-News-Detection
Comment: Under Revie
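The idea of summarizing articles to stay within a BERT-style token limit, mentioned in the abstract above, might look roughly like the sketch below. Everything here is an assumption for illustration: the 512-token limit is the usual maximum for BERT-style encoders but is not stated in the abstract, whitespace splitting is only a rough proxy for subword tokenization, and `fit_to_limit` and `lead_sentences` are hypothetical helpers rather than the authors' summarization method.

```python
from typing import Callable

MAX_TOKENS = 512  # assumed limit; typical for BERT-style encoders


def fit_to_limit(
    article: str,
    summarize: Callable[[str], str],
    max_tokens: int = MAX_TOKENS,
) -> str:
    """Return the article unchanged if it fits, otherwise a summary of it."""
    # Whitespace tokens are a rough proxy; real models count subword tokens.
    if len(article.split()) <= max_tokens:
        return article
    summary = summarize(article)
    # Fallback: hard-truncate if the summary is still too long.
    return " ".join(summary.split()[:max_tokens])


def lead_sentences(text: str, n: int = 3, delimiter: str = "।") -> str:
    """Hypothetical extractive summarizer: keep the first n sentences.

    The default delimiter is the Bengali danda; pass "." for English text.
    """
    sentences = [s.strip() for s in text.split(delimiter) if s.strip()]
    return (delimiter + " ").join(sentences[:n]) + delimiter


if __name__ == "__main__":
    long_article = ".".join(["word " * 100] * 10)  # about 1000 whitespace tokens
    shortened = fit_to_limit(long_article, lambda t: lead_sentences(t, 3, "."))
    print(len(long_article.split()), "->", len(shortened.split()))
```

Plain truncation would keep only the opening of a long article, whereas a summarizer can in principle retain content from the whole text, which is the trade-off the abstract alludes to.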