Detecting Misinformation with LLM-Predicted Credibility Signals and Weak Supervision
Credibility signals represent a wide range of heuristics that are typically
used by journalists and fact-checkers to assess the veracity of online content.
Automating the task of credibility signal extraction, however, is very
challenging as it requires high-accuracy signal-specific extractors to be
trained, while there are currently no sufficiently large datasets annotated
with all credibility signals. This paper investigates whether large language
models (LLMs) can be prompted effectively with a set of 18 credibility signals
to produce weak labels for each signal. We then aggregate these potentially
noisy labels using weak supervision in order to predict content veracity. We
demonstrate that our approach, which combines zero-shot LLM credibility signal
labeling and weak supervision, outperforms state-of-the-art classifiers on two
misinformation datasets without using any ground-truth labels for training. We
also analyse the contribution of the individual credibility signals towards
predicting content veracity, which provides valuable new insights into their
role in misinformation detection.
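As a rough illustration of the aggregation step described in this abstract, the sketch below combines noisy per-signal labels by simple majority vote. This is a simplified stand-in and not the paper's weak supervision model; the label encoding (1 = misinformation, 0 = credible, -1 = abstain), the article names, and the label values are all hypothetical.

```python
from collections import Counter

ABSTAIN = -1  # assumed encoding: a signal labeller gives no opinion


def majority_vote(weak_labels):
    """Aggregate per-signal weak labels (1 = misinformation, 0 = credible).

    Abstentions are ignored; ties and all-abstain inputs return ABSTAIN.
    """
    votes = Counter(label for label in weak_labels if label != ABSTAIN)
    if not votes:
        return ABSTAIN
    ranked = votes.most_common()
    # A tie between the two most frequent labels gives no clear decision.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


# Hypothetical weak labels for three articles, one entry per signal prompt.
articles = {
    "a1": [1, 1, ABSTAIN, 0],
    "a2": [0, 0, 0, ABSTAIN],
    "a3": [1, 0, ABSTAIN, ABSTAIN],
}
predictions = {name: majority_vote(labels) for name, labels in articles.items()}
print(predictions)
```

Majority vote treats every signal as equally reliable; a learned label model would instead weight signals by their estimated accuracy.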
Detecting Harmful Agendas in News Articles
Manipulated news online is a growing problem which necessitates the use of
automated systems to curtail its spread. We argue that while misinformation and
disinformation detection have been studied, there has been a lack of investment
in the important open challenge of detecting harmful agendas in news articles;
identifying harmful agendas is critical to flag news campaigns with the
greatest potential for real world harm. Moreover, due to real concerns around
censorship, harmful agenda detectors must be interpretable to be effective. In
this work, we propose this new task and release a dataset, NewsAgendas, of
annotated news articles for agenda identification. We show how interpretable
systems can be effective on this task and demonstrate that they can perform
comparably to black-box models. Comment: Camera-ready for ACL-WASSA 202
Stance detection on social media: State of the art and trends
Stance detection on social media is an emerging opinion mining paradigm for
various social and political applications in which sentiment analysis may be
sub-optimal. There has been growing research interest in developing effective
stance detection methods across multiple communities, including natural
language processing, web science, and social
computing. This paper surveys the work on stance detection within those
communities and situates its usage within current opinion mining techniques in
social media. It presents an exhaustive review of stance detection techniques
on social media, including the task definition, different types of targets in
stance detection, feature sets used, and various machine learning approaches
applied. The survey reports state-of-the-art results on the existing benchmark
datasets on stance detection, and discusses the most effective approaches. In
addition, this study explores the emerging trends and different applications of
stance detection on social media. The study concludes by discussing gaps in
existing research and highlighting possible future directions for
stance detection on social media. Comment: We request withdrawal of this article sincerely. We will re-edit this
paper. Please withdraw this article before we finish the new versio
Context-Aware Message-Level Rumour Detection with Weak Supervision
Social media has become more than a communication medium: it is now the main source of all sorts of information. Its intrinsic nature can allow a continuous and massive flow of misinformation to make a severe impact worldwide. In particular, rumours emerge unexpectedly and spread quickly, and it is challenging to track down their origins and stop their propagation. One of the best solutions is to identify rumour-mongering messages as early as possible, which is commonly referred to as "Early Rumour Detection (ERD)". This dissertation focuses on researching ERD on social media by exploiting weak supervision and contextual information. Weak supervision is a branch of ML where noisy and less precise sources (e.g. data patterns) are leveraged to learn from limited high-quality labelled data (Ratner et al., 2017). This is intended to reduce the cost and increase the efficiency of hand-labelling large-scale data. This thesis aims to study whether identifying rumours before they go viral is possible and to develop an architecture for ERD at the individual post level. To this end, it first explores major bottlenecks of current ERD. It also uncovers a research gap between system design and its applications in the real world, which has received less attention from the ERD research community. One bottleneck is limited labelled data. Weakly supervised methods to augment limited labelled training data for ERD are introduced. The other bottleneck is enormous amounts of noisy data. A framework unifying burst detection based on temporal signals and burst summarisation is investigated to identify potential rumours (i.e. input to rumour detection models) by filtering out uninformative messages. Finally, a novel method which jointly learns rumour sources and their contexts (i.e. conversational threads) for ERD is proposed. An extensive evaluation setting for ERD systems is also introduced.
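To make the idea of burst detection on temporal signals more concrete, here is a minimal sketch that flags time windows whose message count rises well above the recent average. It is an assumed, generic trailing z-score rule and not the framework from the thesis; the function name, the `history` and `threshold` parameters, and the counts are hypothetical.

```python
from statistics import mean, pstdev


def detect_bursts(counts, history=5, threshold=3.0):
    """Return indices of windows whose count exceeds the trailing mean
    by more than `threshold` trailing standard deviations."""
    bursts = []
    for i in range(history, len(counts)):
        past = counts[i - history:i]
        mu = mean(past)
        sigma = pstdev(past)
        # Floor sigma at 1 so a perfectly flat history does not flag tiny rises.
        if counts[i] > mu + threshold * max(sigma, 1.0):
            bursts.append(i)
    return bursts


# Hypothetical message counts per time window for one keyword.
counts = [10, 12, 11, 9, 10, 11, 48, 52, 12, 10]
burst_windows = detect_bursts(counts)
print(burst_windows)
```

Because the trailing statistics absorb the first spike, this rule reports only the onset of a burst; messages from the flagged window would then be passed on for summarisation and rumour detection.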
A Full-Image Full-Resolution End-to-End-Trainable CNN Framework for Image Forgery Detection
Due to limited computational and memory resources, current deep learning
models accept only rather small images in input, calling for preliminary image
resizing. This is not a problem for high-level vision problems, where
discriminative features are barely affected by resizing. On the contrary, in
image forensics, resizing tends to destroy precious high-frequency details,
impacting heavily on performance. One can avoid resizing by means of patch-wise
processing, at the cost of renouncing whole-image analysis. In this work, we
propose a CNN-based image forgery detection framework which makes decisions
based on full-resolution information gathered from the whole image. Thanks to
gradient checkpointing, the framework is trainable end-to-end with limited
memory resources and weak (image-level) supervision, allowing for the joint
optimization of all parameters. Experiments on widespread image forensics
datasets prove the good performance of the proposed approach, which largely
outperforms all baselines and all reference methods. Comment: 13 pages, 12 figures, journa