Tackling Fake News in Bengali: Unraveling the Impact of Summarization vs. Augmentation on Pre-trained Language Models
With the rise of social media and online news sources, fake news has become a
significant issue globally. However, the detection of fake news in low-resource
languages like Bengali has received limited attention in research. In this
paper, we propose a methodology consisting of four distinct approaches to
classify fake news articles in Bengali using summarization and augmentation
techniques with five pre-trained language models. Our approach includes
translating English news articles and using augmentation techniques to curb the
deficit of fake news articles. We also summarize the news articles to tackle
the token-length limitation of BERT-based models. Through
extensive experimentation and rigorous evaluation, we show the effectiveness of
summarization and augmentation in the case of Bengali fake news detection. We
evaluated our models using three separate test datasets. The BanglaBERT Base
model, when combined with augmentation techniques, achieved an impressive
accuracy of 96% on the first test dataset. On the second test dataset, the
BanglaBERT model, trained on summarized and augmented news articles, achieved
97% accuracy. Lastly, the mBERT Base model achieved an accuracy of 86% on the third
test dataset which was reserved for generalization performance evaluation. The
datasets and implementations are available at
https://github.com/arman-sakif/Bengali-Fake-News-Detection

Comment: Under Review
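The token-length constraint the abstract mentions can be illustrated with a minimal sketch. This is not the paper's actual pipeline: it approximates subword tokens with whitespace tokens and uses a naive lead-based summary, purely to show why long articles must be shortened before being fed to a BERT-style encoder (which typically caps input at 512 tokens). The function names and the 512 figure here are illustrative assumptions, not taken from the paper's code.

```python
# Illustrative sketch (not the authors' method): BERT-style encoders accept
# at most ~512 subword tokens, so long news articles need summarization or
# truncation first. Token counts are approximated by whitespace splitting.

MAX_TOKENS = 512  # typical input limit for BERT-family models


def fits_limit(text: str, limit: int = MAX_TOKENS) -> bool:
    """Check whether the approximate token count fits the model's input limit."""
    return len(text.split()) <= limit


def lead_summary(text: str, limit: int = MAX_TOKENS) -> str:
    """Keep leading sentences until adding another would exceed the token budget.

    A crude stand-in for the abstractive/extractive summarizers the paper
    evaluates; it simply favors the start of the article.
    """
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    kept, used = [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if used + n > limit:
            break
        kept.append(sentence)
        used += n
    return ". ".join(kept) + ("." if kept else "")


if __name__ == "__main__":
    article = "Fake news spreads fast on social media. " * 200  # long stand-in
    short = lead_summary(article)
    print(fits_limit(article), fits_limit(short))  # long article fails, summary fits
```

In a real pipeline one would use the model's own subword tokenizer to count tokens, since whitespace counts undershoot the subword count, especially for morphologically rich languages like Bengali.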