UrduFake@FIRE2021: Shared Track on Fake News Identification in Urdu
This study reports on UrduFake@FIRE2021, the second shared task on fake news
detection in the Urdu language. The task is a binary classification problem:
classify a given news article as either (i) real news or (ii) fake news. A
total of 34 teams from 7 countries (China, Egypt, Israel, India, Mexico,
Pakistan, and UAE) registered to participate; 18 teams submitted experimental
results and 11 teams submitted technical reports. The proposed systems were
based on various count-based features and used different classifiers as well
as neural network architectures. The stochastic gradient descent (SGD)
algorithm outperformed the other classifiers and achieved an F-score of 0.679.
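
To make the combination of count-based features and an SGD-trained classifier concrete, the following is a minimal sketch, not the winning team's system. It assumes whitespace tokenization, a logistic loss, and a toy English corpus standing in for Urdu articles; the function names `bow`, `train_sgd`, and `predict` are hypothetical, and the abstract does not specify the loss or features actually used.

```python
import math
import random
from collections import Counter


def bow(text):
    """Count-based bag-of-words features: token -> frequency.
    Assumption: whitespace tokenization (real Urdu text needs more care)."""
    return Counter(text.split())


def train_sgd(docs, labels, epochs=50, lr=0.1, seed=0):
    """Fit a logistic-regression classifier with stochastic gradient descent.
    labels: 1 = fake, 0 = real (an encoding chosen for this sketch)."""
    rng = random.Random(seed)
    weights, bias = {}, 0.0
    data = [(bow(d), y) for d, y in zip(docs, labels)]
    for _ in range(epochs):
        rng.shuffle(data)  # visit examples in random order each epoch
        for feats, y in data:
            z = bias + sum(weights.get(t, 0.0) * c for t, c in feats.items())
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of "fake"
            g = p - y  # gradient of the log loss with respect to z
            bias -= lr * g
            for t, c in feats.items():
                weights[t] = weights.get(t, 0.0) - lr * g * c
    return weights, bias


def predict(model, text):
    """Return 1 (fake) if the linear score is positive, else 0 (real)."""
    weights, bias = model
    z = bias + sum(weights.get(t, 0.0) * c for t, c in bow(text).items())
    return 1 if z > 0 else 0


# Toy placeholder corpus (invented for illustration, not shared-task data).
docs = [
    "miracle cure shocks doctors",
    "team wins final match",
    "secret miracle trick revealed",
    "company reports quarterly earnings",
]
labels = [1, 0, 1, 0]
model = train_sgd(docs, labels)
train_predictions = [predict(model, d) for d in docs]
```

On this separable toy corpus the model reproduces the training labels. On real data, the reported F-score of 0.679 suggests the task is far harder than this sketch implies.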
Overview of the Shared Task on Fake News Detection in Urdu at FIRE 2021
Automatic detection of fake news is a highly important task in the
contemporary world. This study reports on UrduFake@FIRE2021, the second shared
task on fake news detection in Urdu. The goal of the
shared task is to motivate the community to come up with efficient methods for
solving this vital problem, particularly for the Urdu language. The task is
posed as a binary classification problem to label a given news article as a
real or a fake news article. The organizers provide a dataset comprising news
in five domains: (i) Health, (ii) Sports, (iii) Showbiz, (iv) Technology, and
(v) Business, split into training and testing sets. The training set contains
1300 annotated news articles (750 real, 550 fake), while the testing
set contains 300 news articles (200 real, 100 fake). A total of 34 teams from 7
different countries (China, Egypt, Israel, India, Mexico, Pakistan, and UAE)
registered to participate in the UrduFake@FIRE2021 shared task. Out of those,
18 teams submitted their experimental results, and 11 of those submitted their
technical reports, substantially more than in the UrduFake
shared task in 2020, when only 6 teams submitted technical reports. The
technical reports submitted by the participants demonstrated different data
representation techniques ranging from count-based BoW features to word vector
embeddings as well as the use of numerous machine learning algorithms ranging
from traditional SVM to various neural network architectures including
Transformers such as BERT and RoBERTa. In this year's competition, the best
performing system obtained an F1-macro score of 0.679, which is lower than the
previous year's best result of 0.907 F1-macro. Admittedly, while the training sets
from the previous and current years overlap to a large extent, the testing set
provided this year is completely different.
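
Since the results are reported as F1-macro on an imbalanced testing set (200 real, 100 fake), it may help to see how the metric is computed. The sketch below assumes the usual definition (the unweighted mean of per-class F1 scores) and uses the hypothetical string labels "real" and "fake". The majority-class baseline is an illustration only, not a result reported by the organizers.

```python
def f1_macro(y_true, y_pred, labels=("real", "fake")):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Testing-set class balance as stated in the overview: 200 real, 100 fake.
y_true = ["real"] * 200 + ["fake"] * 100

# Hypothetical baseline that labels every article as "real".
majority = ["real"] * 300
baseline = f1_macro(y_true, majority)
```

Under these assumptions the always-"real" baseline scores 0.8 F1 on the real class and 0 on the fake class, giving an F1-macro of 0.4. Macro averaging therefore penalizes a system that ignores the minority class, even though that system's accuracy would be about 0.667.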