Weak Supervision for Fake News Detection via Reinforcement Learning
Today, social media has become a primary source of news. Via social media
platforms, fake news travels at unprecedented speed, reaches global audiences,
and puts users and communities at great risk. It is therefore extremely
important to detect fake news as early as possible. Recently, deep
learning-based approaches have shown improved performance in fake news
detection. However, training such models requires a large amount of labeled
data, and manual annotation is time-consuming and expensive. Moreover, due to
the dynamic nature of news, annotated samples can become outdated quickly and
fail to represent articles on newly emerging events. How to obtain fresh,
high-quality labeled samples is therefore the major challenge in employing
deep learning models for fake news detection. To tackle this challenge, we propose a
reinforced weakly-supervised fake news detection framework, i.e., WeFEND, which
can leverage users' reports as weak supervision to enlarge the amount of
training data for fake news detection. The proposed framework consists of three
main components: the annotator, the reinforced selector and the fake news
detector. The annotator can automatically assign weak labels for unlabeled news
based on users' reports. The reinforced selector uses reinforcement learning
to choose high-quality samples from the weakly labeled data and to filter out
low-quality ones that may degrade the detector's prediction performance. The
fake news detector identifies fake news based on the news content. We tested
the proposed framework on a large collection of news articles published via
WeChat official accounts and the associated user reports. Extensive
experiments on this dataset show that the proposed WeFEND model achieves the
best performance compared with state-of-the-art methods.
Comment: AAAI 202
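As a rough illustration of this three-component design, the sketch below wires together a keyword-based annotator over user reports, a confidence-based selector, and a TF-IDF logistic-regression detector. Everything here is an assumption for illustration: the report cues, data, and models are toys, and the simple agreement-confidence filter stands in for the paper's learned reinforcement-learning selection policy.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

REPORT_CUES = {"fake", "rumor", "misleading"}  # hypothetical report keywords

def weak_annotate(reports):
    """Annotator: derive a weak label from user reports (1 = fake, 0 = real)."""
    text = " ".join(reports).lower()
    return int(any(cue in text for cue in REPORT_CUES))

# Toy data: (news text, user reports)
corpus = [
    ("miracle cure discovered overnight", ["this is fake", "total rumor"]),
    ("city council passes annual budget", ["useful update"]),
    ("celebrity secretly replaced by clone", ["misleading nonsense"]),
    ("rain expected over the weekend", ["thanks for sharing"]),
]
texts = [text for text, _ in corpus]
weak_y = np.array([weak_annotate(reports) for _, reports in corpus])

vec = TfidfVectorizer()
X = vec.fit_transform(texts)

# Warm-start the detector on all weakly labeled samples.
detector = LogisticRegression().fit(X, weak_y)

# Selector stand-in: keep samples whose weak label the detector agrees with
# confidently (the paper instead trains this selection policy with RL).
probs = detector.predict_proba(X)[:, 1]
conf = np.where(weak_y == 1, probs, 1.0 - probs)
keep = conf >= 0.5
if keep.sum() >= 2 and len(set(weak_y[keep])) == 2:
    detector.fit(X[keep], weak_y[keep])  # retrain on the selected subset

print(detector.predict(vec.transform(["shocking miracle cure exposed"])))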
Combating Misinformation in the Age of LLMs: Opportunities and Challenges
Misinformation such as fake news and rumors poses a serious threat to
information ecosystems and public trust. The emergence of Large Language Models
(LLMs) has great potential to reshape the landscape of combating
misinformation. Generally, LLMs can be a double-edged sword in the fight. On
the one hand, LLMs bring promising opportunities for combating misinformation
due to their profound world knowledge and strong reasoning abilities. Thus, one
emerging question is: how to utilize LLMs to combat misinformation? On the
other hand, the critical challenge is that LLMs can be easily leveraged to
generate deceptive misinformation at scale. Then, another important question
is: how to combat LLM-generated misinformation? In this paper, we first
systematically review the history of combating misinformation before the advent
of LLMs. Then we illustrate the current efforts and present an outlook for
these two fundamental questions respectively. The goal of this survey paper is
to facilitate the progress of utilizing LLMs for fighting misinformation and
call for interdisciplinary efforts from different stakeholders for combating
LLM-generated misinformation.
Comment: 9 pages for the main paper, 35 pages including 656 references; more
resources on "LLMs Meet Misinformation" are on the website:
https://llm-misinformation.github.io
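As a minimal illustration of the first question (using LLMs to combat misinformation), the stub below shows a zero-shot claim-assessment prompt. The prompt wording is an assumption, not taken from the survey, and call_llm is a placeholder for whatever chat-completion client you use.

PROMPT = (
    "You are a fact-checking assistant.\n"
    'Claim: "{claim}"\n'
    "Answer TRUE, FALSE, or UNVERIFIABLE, then give a one-sentence rationale\n"
    "citing the evidence you relied on."
)

def call_llm(prompt: str) -> str:
    # Placeholder: plug in your preferred chat-completion client here.
    raise NotImplementedError

def check_claim(claim: str) -> str:
    """Zero-shot veracity assessment of a single claim."""
    return call_llm(PROMPT.format(claim=claim))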
Misinformation Containment Using NLP and Machine Learning: Why the Problem Is Still Unsolved
Despite increased attention and substantial research claiming notable successes, the problem of misinformation containment has only grown in recent years, with few signs of respite. Misinformation is rapidly changing its latent characteristics and spreading vigorously in a multi-modal fashion, sometimes in a more damaging manner than viruses and other malicious programs on the internet. This chapter examines the existing research in natural language processing and machine learning aimed at stopping the spread of misinformation, analyzes why that research has not been practical enough to be incorporated into social media platforms, and provides future research directions. The state-of-the-art feature engineering, approaches, and algorithms used for the problem are expounded in the process.
All the wiser: Fake news intervention using user reading preferences
National Research Foundation (NRF) Singapore under International Research Centres in Singapore Funding Initiative
Detecting Misinformation with LLM-Predicted Credibility Signals and Weak Supervision
Credibility signals represent a wide range of heuristics that are typically
used by journalists and fact-checkers to assess the veracity of online content.
Automating the task of credibility signal extraction, however, is very
challenging as it requires high-accuracy signal-specific extractors to be
trained, while there are currently no sufficiently large datasets annotated
with all credibility signals. This paper investigates whether large language
models (LLMs) can be prompted effectively with a set of 18 credibility signals
to produce weak labels for each signal. We then aggregate these potentially
noisy labels using weak supervision in order to predict content veracity. We
demonstrate that our approach, which combines zero-shot LLM credibility signal
labeling and weak supervision, outperforms state-of-the-art classifiers on two
misinformation datasets without using any ground-truth labels for training. We
also analyse the contribution of the individual credibility signals towards
predicting content veracity, which provides valuable new insights into their
role in misinformation detection.
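A minimal sketch of the aggregation step, under simplifying assumptions: a handful of hypothetical signals stand in for the paper's 18, and plain majority voting over abstain-aware votes stands in for the learned weak-supervision label model; the signal values are assumed to come from zero-shot LLM prompts upstream.

import numpy as np

SIGNALS = ["clickbait_title", "lack_of_sources", "emotional_language"]  # illustrative subset

# Weak labels per document: +1 = signal indicates misinformation,
# -1 = signal indicates credible content, 0 = abstain.
weak_votes = np.array([
    [+1, +1,  0],   # doc 0
    [-1, -1, -1],   # doc 1
    [+1,  0, -1],   # doc 2
])

def aggregate(votes):
    """Majority vote ignoring abstentions; ties default to 'credible'."""
    totals = votes.sum(axis=1)
    return (totals > 0).astype(int)   # 1 = predicted misinformation

print(aggregate(weak_votes))  # -> [1 0 0]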
Machine Learning-based Automatic Annotation and Detection of COVID-19 Fake News
COVID-19 impacted every part of the world, and misinformation about the
outbreak traveled faster than the virus. Misinformation spread through
online social networks (OSNs) often misled people away from correct medical
practices. In particular, OSN bots have been a primary source of disseminating
false information and initiating cyber propaganda. Existing work neglects the
presence of bots, which act as a catalyst in the spread, and focuses on
detecting fake news in 'articles shared in posts' rather than in the textual
content of the posts themselves. Most work on misinformation detection also
relies on manually labeled datasets, which are hard to scale when building
predictive models. In this research, we
overcome this challenge of data scarcity by proposing an automated approach for
labeling data using verified fact-checked statements on a Twitter dataset. In
addition, we combine textual features with user-level features (such as
followers count and friends count) and tweet-level features (such as the number
of mentions, hashtags, and URLs in a tweet) as additional indicators to
detect misinformation. Moreover, we analyzed the presence of bots in tweets and
show that bots change their behavior over time and are most active during
misinformation campaigns. We collected 10.22 million COVID-19-related tweets and
used our annotation model to build an extensive and original ground truth
dataset for classification purposes. We utilize various machine learning models
to accurately detect misinformation; our best classification model achieves
82% precision, 96% recall, and a 3.58% false positive rate. Our bot
analysis indicates that bots generated approximately 10% of the misinformation
tweets. Our methodology exposes a substantial amount of false information,
thus improving the trustworthiness of information disseminated through social
media platforms.
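A minimal sketch of the described feature combination: TF-IDF text features concatenated with user-level and tweet-level counts before classification. The toy tweets, labels, and column choices are illustrative assumptions, not the paper's dataset or pipeline.

import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

tweets = [
    "miracle herb cures covid overnight #cure",
    "official guidance: wash hands and wear a mask",
    "5g towers spread the virus, share now!",
    "vaccination centre opens downtown monday",
]
# Columns: [followers_count, friends_count, n_mentions, n_hashtags, n_urls]
meta = np.array([
    [120,   900, 0, 1, 1],
    [50000, 300, 0, 0, 1],
    [80,   1500, 2, 0, 0],
    [9000,  120, 0, 0, 1],
], dtype=float)
y = np.array([1, 0, 1, 0])  # 1 = misinformation (toy labels)

X_text = TfidfVectorizer().fit_transform(tweets)
# Concatenate textual features with log-scaled user/tweet count features.
X = hstack([X_text, csr_matrix(np.log1p(meta))])

clf = LogisticRegression().fit(X, y)
print(clf.predict(X))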
DEAP-FAKED: Knowledge Graph based Approach for Fake News Detection
Fake News on social media platforms has attracted a lot of attention in
recent times, primarily around events related to politics (the 2016 US
Presidential elections) and healthcare (the infodemic during COVID-19), to
name a few. Various methods have been proposed for detecting Fake News,
ranging from network analysis to Natural Language Processing (NLP) and
Graph Neural Networks (GNNs). In this work, we propose
DEAP-FAKED, a knowleDgE grAPh FAKe nEws Detection framework for identifying
Fake News. Our approach combines NLP, which we use to encode the news
content, with a GNN, which we use to encode the Knowledge Graph (KG). The two
encodings provide complementary advantages to our detector.
We evaluate our framework using two publicly available datasets containing
articles from domains such as politics, business, technology, and healthcare.
As part of dataset pre-processing, we also remove biases, such as the source
of the articles, that could impact the performance of the models. DEAP-FAKED
obtains F1-scores of 88% and 78% on the two datasets, improvements of 21% and
3% respectively, which shows the effectiveness of the approach.
Comment: Accepted at IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) 202
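A minimal sketch of the two-encoder idea under simplifying assumptions: TF-IDF stands in for the paper's NLP content encoder, and a toy table of pretrained entity embeddings stands in for the GNN-encoded Knowledge Graph; the fused vectors feed a linear classifier.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Toy stand-in for KG entity embeddings (the paper learns these with a GNN).
KG_EMB = {e: rng.normal(size=8) for e in ["covid-19", "vaccine", "senate"]}

def kg_encode(entities):
    """Average the embeddings of the KG entities linked to an article."""
    vecs = [KG_EMB[e] for e in entities if e in KG_EMB]
    return np.mean(vecs, axis=0) if vecs else np.zeros(8)

# (article text, linked entities, toy label: 1 = fake)
docs = [
    ("senate secretly bans the vaccine", ["senate", "vaccine"], 1),
    ("covid-19 vaccine trial results published", ["covid-19", "vaccine"], 0),
    ("vaccine turns people magnetic", ["vaccine"], 1),
    ("senate votes on covid-19 relief bill", ["senate", "covid-19"], 0),
]

vec = TfidfVectorizer().fit([t for t, _, _ in docs])
# Fuse the content encoding with the KG encoding by concatenation.
X = np.hstack([
    vec.transform([t for t, _, _ in docs]).toarray(),
    np.array([kg_encode(e) for _, e, _ in docs]),
])
y = np.array([label for *_, label in docs])

clf = LogisticRegression().fit(X, y)
print(clf.predict(X))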
No Place to Hide: Dual Deep Interaction Channel Network for Fake News Detection based on Data Augmentation
Online Social Networks (OSNs) have become a hotbed of fake news due to the low
cost of information dissemination. Although existing methods have explored
both news content and propagation structure, fake news detection still faces
two challenges: how to mine the unique key features and evolution patterns,
and how to tackle the small-sample problem when building a high-performance
model. Unlike popular methods that rely heavily on the propagation topology,
in this paper we propose a novel framework for fake news detection from the
perspectives of semantics, emotion, and data enhancement. The framework
excavates the emotional evolution patterns of news participants during the
propagation process, and a dual deep interaction channel network over
semantics and emotion is designed to obtain a more comprehensive and
fine-grained news representation that takes comments into account. The
framework also introduces a data enhancement module that uses confidence
scores to obtain more high-quality labeled data, further improving the
performance of the classification model. Experiments show that the proposed
approach outperforms the state-of-the-art methods.
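A minimal sketch of the two ingredients named in this abstract, under simplifying assumptions: a TF-IDF semantic channel fused with a toy lexicon-based emotion channel computed over comments, plus a confidence-based pseudo-labeling step standing in for the data enhancement module. The paper uses a deep dual-channel network; a linear model stands in here, and the lexicon and threshold are assumed.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

ANGER = {"outrageous", "disgusting", "shocking"}  # toy emotion lexicon

def emotion_features(comments):
    """Emotion channel: fraction of anger-lexicon words in the comments."""
    words = " ".join(comments).lower().split()
    return [sum(w in ANGER for w in words) / max(len(words), 1)]

labeled = [
    ("miracle cure banned by doctors", ["outrageous", "shocking if true"], 1),
    ("library extends its opening hours", ["nice", "useful to know"], 0),
]
unlabeled = [
    ("shocking cover-up finally revealed", ["disgusting", "outrageous"]),
    ("bus schedule updated for the holidays", ["thanks"]),
]

vec = TfidfVectorizer().fit([t for t, _, _ in labeled] + [t for t, _ in unlabeled])

def featurize(text, comments):
    """Fuse the semantic channel (TF-IDF) with the emotion channel."""
    return np.hstack([vec.transform([text]).toarray()[0], emotion_features(comments)])

X = np.array([featurize(t, c) for t, c, _ in labeled])
y = np.array([label for *_, label in labeled])
clf = LogisticRegression().fit(X, y)

# Data enhancement: adopt confident predictions on unlabeled news as pseudo-labels.
Xu = np.array([featurize(t, c) for t, c in unlabeled])
probs = clf.predict_proba(Xu)
keep = probs.max(axis=1) >= 0.6  # confidence threshold (assumed)
X2 = np.vstack([X, Xu[keep]])
y2 = np.hstack([y, probs.argmax(axis=1)[keep]])
clf = LogisticRegression().fit(X2, y2)  # retrain on the enlarged set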