15 research outputs found
Exploiting context for rumour detection in social media
Tools that can detect unverified information posted on social media during a news event can help to avoid the spread of rumours that turn out to be false. In this paper we compare a novel approach using Conditional Random Fields, which learns from the sequential dynamics of social media posts, with the current state-of-the-art rumour detection system, as well as other baselines. In contrast to existing work, our classifier does not need to observe tweets querying the stance of a post to deem it a rumour but, instead, exploits context learned during the event. Our classifier has improved precision and recall over the state-of-the-art classifier that relies on querying tweets, as well as outperforming our best baseline. Moreover, the results provide evidence for the generalisability of our classifier.
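The sequential idea behind a linear-chain model such as a CRF can be illustrated with Viterbi decoding over a stream of posts. This is only a minimal sketch, not the paper's classifier: the cue words, emission scores and transition probabilities below are invented for illustration.

```python
# Minimal linear-chain sequence-labelling sketch (Viterbi decoding) over posts.
# Labels, cue words and all scores are illustrative, not from the paper.
import math

LABELS = ["non-rumour", "rumour"]

# Hypothetical log transition scores between consecutive post labels.
TRANSITION = {
    "non-rumour": {"non-rumour": math.log(0.8), "rumour": math.log(0.2)},
    "rumour":     {"non-rumour": math.log(0.3), "rumour": math.log(0.7)},
}

def emission(post, label):
    # Toy contextual feature: unverified-sounding words raise the rumour score.
    cues = sum(w in post.lower() for w in ("unconfirmed", "reportedly", "claims"))
    p = 0.5 + 0.15 * cues if label == "rumour" else 0.5 - 0.15 * cues
    return math.log(min(max(p, 0.05), 0.95))

def viterbi(posts):
    """Best label sequence for a stream of posts, given emissions and transitions."""
    best = {l: emission(posts[0], l) for l in LABELS}
    back = []
    for post in posts[1:]:
        prev_best = best
        best, ptr = {}, {}
        for l in LABELS:
            prev, score = max(
                ((k, prev_best[k] + TRANSITION[k][l]) for k in LABELS),
                key=lambda kv: kv[1],
            )
            best[l], ptr[l] = score + emission(post, l), prev
        back.append(ptr)
    # Trace the highest-scoring path backwards.
    label = max(best, key=best.get)
    path = [label]
    for ptr in reversed(back):
        label = ptr[label]
        path.append(label)
    return list(reversed(path))
```

Because the transition scores favour label continuity, a run of rumour-sounding posts reinforces the rumour label for its neighbours, which is the "context learned during the event" intuition in miniature.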
QMUL-SDS at CheckThat! 2020: Determining COVID-19 Tweet Check-Worthiness Using an Enhanced CT-BERT with Numeric Expressions
This paper describes the participation of the QMUL-SDS team for Task 1 of the
CLEF 2020 CheckThat! shared task. The purpose of this task is to determine the
check-worthiness of tweets about COVID-19 to identify and prioritise tweets
that need fact-checking. The overarching aim is to further support ongoing
efforts to protect the public from fake news and help people find reliable
information. We describe and analyse the results of our submissions. We show
that a CNN using COVID-Twitter-BERT (CT-BERT) enhanced with numeric expressions
can effectively boost performance from baseline results. We also show results
of training data augmentation with rumours on other topics. Our best system
ranked fourth in the task with encouraging outcomes showing potential for
improved results in the future.
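One plausible reading of "enhanced with numeric expressions" is extracting numeric spans from each tweet as additional input features. The regex and pattern names below are assumptions for illustration, not the paper's actual feature extractor.

```python
# Hedged sketch: pull numeric expressions (counts, percentages, quantities)
# out of a tweet. The pattern is illustrative, not the paper's.
import re

NUMERIC_RE = re.compile(
    r"\b\d[\d,\.]*\s*(?:%|percent|million|billion|cases|deaths)?",
    re.IGNORECASE,
)

def numeric_expressions(tweet: str):
    """Return numeric expressions found in a tweet, stripped of whitespace."""
    return [m.group().strip() for m in NUMERIC_RE.finditer(tweet)]
```

Such spans could then be fed to the model alongside the CT-BERT token representation, e.g. as extra tokens or a parallel feature vector.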
Overview of the Shared Task on Fake News Detection in Urdu at FIRE 2021
Automatic detection of fake news is a highly important task in the
contemporary world. This study reports on the second shared task,
UrduFake@FIRE2021, on fake news detection in Urdu. The goal of the
shared task is to motivate the community to come up with efficient methods for
solving this vital problem, particularly for the Urdu language. The task is
posed as a binary classification problem to label a given news article as a
real or a fake news article. The organizers provide a dataset comprising news
in five domains: (i) Health, (ii) Sports, (iii) Showbiz, (iv) Technology, and
(v) Business, split into training and testing sets. The training set contains
1300 annotated news articles -- 750 real news, 550 fake news, while the testing
set contains 300 news articles -- 200 real, 100 fake news. 34 teams from 7
different countries (China, Egypt, Israel, India, Mexico, Pakistan, and UAE)
registered to participate in the UrduFake@FIRE2021 shared task. Out of those,
18 teams submitted their experimental results, and 11 of those submitted
technical reports, substantially more than in the 2020 edition of the UrduFake
shared task, when only 6 teams submitted technical reports. The
technical reports submitted by the participants demonstrated different data
representation techniques ranging from count-based BoW features to word vector
embeddings as well as the use of numerous machine learning algorithms ranging
from traditional SVM to various neural network architectures including
Transformers such as BERT and RoBERTa. In this year's competition, the best
performing system obtained an F1-macro score of 0.679, which is lower than the
past year's best result of 0.907 F1-macro. Notably, while the training sets from
the past and current years overlap to a large extent, the testing set provided
this year is completely different.
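The scores quoted above (0.679 vs 0.907) are macro-averaged F1. For reference, a minimal implementation of that metric for the binary real/fake setting:

```python
# Macro-averaged F1: compute F1 per label, then average with equal weight,
# so the minority "fake" class counts as much as the majority "real" class.
def f1_macro(gold, pred):
    labels = sorted(set(gold) | set(pred))
    scores = []
    for lab in labels:
        tp = sum(g == lab and p == lab for g, p in zip(gold, pred))
        fp = sum(g != lab and p == lab for g, p in zip(gold, pred))
        fn = sum(g == lab and p != lab for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)
```

With the 200 real / 100 fake test split above, equal per-class weighting is what makes this metric stricter than plain accuracy.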
Transformer Based Multi-Source Domain Adaptation
In practical machine learning settings, the data on which a model must make
predictions often come from a different distribution than the data it was
trained on. Here, we investigate the problem of unsupervised multi-source
domain adaptation, where a model is trained on labelled data from multiple
source domains and must make predictions on a domain for which no labelled data
has been seen. Prior work with CNNs and RNNs has demonstrated the benefit of
mixture of experts, where the predictions of multiple domain expert classifiers
are combined; as well as domain adversarial training, to induce a domain
agnostic representation space. Inspired by this, we investigate how such
methods can be effectively applied to large pretrained transformer models. We
find that domain adversarial training has an effect on the learned
representations of these models while having little effect on their
performance, suggesting that large transformer-based models are already
relatively robust across domains. Additionally, we show that mixture of experts
leads to significant performance improvements by comparing several variants of
mixing functions, including one novel mixture based on attention. Finally, we
demonstrate that the predictions of large pretrained transformer based domain
experts are highly homogeneous, making it challenging to learn effective
functions for mixing their predictions.
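The attention-based mixing described above can be sketched as weighting each domain expert's class distribution by the attention of the instance representation over a per-domain key. The dot-product form and all names here are assumptions, not the paper's exact formulation.

```python
# Hedged sketch of an attention-based mixture of domain experts:
# attend from the instance representation over learned domain keys,
# then mix the experts' class distributions with the attention weights.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_mix(instance_repr, domain_keys, expert_probs):
    """Mix expert class distributions, one per source domain.

    instance_repr: vector for the input instance (e.g. a pooled encoding).
    domain_keys:   one key vector per domain expert (learned in practice).
    expert_probs:  one class-probability vector per expert.
    """
    scores = [sum(q * k for q, k in zip(instance_repr, key)) for key in domain_keys]
    weights = softmax(scores)  # attention over experts
    n_classes = len(expert_probs[0])
    return [sum(w * p[c] for w, p in zip(weights, expert_probs))
            for c in range(n_classes)]
```

When the experts' distributions are nearly identical, as the abstract reports for large pretrained transformers, the weights barely matter, which is exactly why learning a good mixing function becomes hard.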
Exposing and explaining fake news on-the-fly
Social media platforms enable the rapid dissemination and consumption of information. However, users instantly consume such content regardless of the reliability of the shared data. Consequently, this crowdsourcing model is exposed to manipulation. This work contributes an explainable, online classification method to recognise fake news in real time. The proposed method combines unsupervised and supervised Machine Learning approaches with lexica created online. The profiling is built from creator-, content- and context-based features using Natural Language Processing techniques. The explainable classification mechanism displays in a dashboard the features selected for classification and the prediction confidence. The performance of the proposed solution has been validated with real data sets from Twitter, and the results attain 80% in both accuracy and macro F-measure. This proposal is the first to jointly provide data stream processing, profiling, classification and explainability. Ultimately, the proposed early detection, isolation and explanation of fake news contribute to increasing the quality and trustworthiness of social media content.
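A minimal sketch of what creator-, content- and context-based profiling features might look like. The feature names are illustrative assumptions; the paper's actual feature set is not specified here.

```python
# Hypothetical profiling features in the three groups named in the abstract.
# All keys and the input schema are assumptions for illustration.
def profile(post):
    text = post["text"]
    words = text.split()
    return {
        # creator-based
        "followers": post["author"]["followers"],
        "verified": int(post["author"]["verified"]),
        # content-based
        "length": len(words),
        "exclamations": text.count("!"),
        "all_caps_words": sum(w.isupper() and len(w) > 1 for w in words),
        # context-based
        "retweets": post["retweets"],
    }
```

In a streaming setting such a profile would be computed per incoming post and fed to the online classifier, with the same feature names surfaced in the explanation dashboard.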
A Model to Measure the Spread Power of Rumors
Nowadays, a significant portion of the posts shared daily on social media are
infected by rumors. This study approaches rumor analysis from a different angle
than previous research: it tackles the previously unaddressed problem of
calculating the Spread Power of Rumor (SPR) and examines spread power as a
function of multi-contextual features. For this purpose, the theory of Allport
and Postman is adopted, which holds that two key factors determine the spread
power of a rumor: importance and ambiguity. The proposed Rumor Spread Power
Measurement Model (RSPMM) computes SPR using a text-based approach that draws
on contextual features to compute the spread power of rumors in two categories:
False Rumor (FR) and True Rumor (TR). In total, 51 contextual features are
introduced to measure SPR and their impact on classification is investigated;
42 of these, in the two categories "importance" (28 features) and "ambiguity"
(14 features), are then selected to compute SPR. The proposed RSPMM is
verified on two labelled datasets, which are collected from Twitter and
Telegram. The results show that (i) the proposed new features are effective and
efficient at discriminating between FRs and TRs; (ii) although the proposed
RSPMM relies only on contextual features, while existing techniques are based on
structure and content features, it achieves considerably strong results
(F-measure = 83%); and (iii) a t-test shows that the SPR criterion can
significantly distinguish between FRs and TRs, and can also be useful as a new
method to verify the veracity of rumors.
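Allport and Postman's classic formulation holds that rumor strength varies with importance times ambiguity (R ~ i × a). A minimal sketch under the assumption that each category score is an average of normalised feature values; the paper's actual RSPMM weighting over its 28 importance and 14 ambiguity features may differ.

```python
# Toy SPR combination following Allport and Postman's R ~ i * a law.
# Averaging normalised features per category is an assumption, not the
# paper's exact model.
def spread_power(importance_feats, ambiguity_feats):
    """Combine normalised feature scores (each in [0, 1]) into an SPR value."""
    i = sum(importance_feats) / len(importance_feats)
    a = sum(ambiguity_feats) / len(ambiguity_feats)
    return i * a
```

The multiplicative form captures the theory's key property: a rumor with zero importance or zero ambiguity has no spread power, regardless of the other factor.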