594 research outputs found
Evolving to the Future: Unseen Event Adaptive Fake News Detection on Social Media
With the rapid development of social media, the wide dissemination of fake
news on social media is increasingly threatening both individuals and society.
In the dynamic landscape of social media, fake news detection aims to develop a
model trained on news reporting past events. The objective is to predict and
identify fake news about future events, which often relate to subjects entirely
different from those in the past. However, existing fake news detection methods
lack robustness and cannot generalize to unseen events. To address
this, we introduce the Future ADaptive Event-based Fake news Detection (FADE)
framework. Specifically, we train a target predictor through an adaptive
augmentation strategy and graph contrastive learning to make more robust
overall predictions. Simultaneously, we independently train an event-only
predictor to obtain biased predictions. We then mitigate event bias by
subtracting the output of the event-only predictor from the output of the
target predictor to obtain the final prediction. Encouraging results from
experiments designed to emulate real-world social media conditions validate the
effectiveness of our method in comparison to existing state-of-the-art
approaches.
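The debiasing step described in this abstract, subtracting the event-only predictor's output from the target predictor's output, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the helper names `debias_logits` and `predict`, the scaling factor `alpha`, and the example logits are all assumptions made for the sketch.

```python
import numpy as np

def debias_logits(target_logits, event_logits, alpha=1.0):
    """Subtract event-only logits from target logits (hypothetical helper).

    alpha is an assumed scaling factor; alpha=1.0 is plain subtraction.
    """
    target_logits = np.asarray(target_logits, dtype=float)
    event_logits = np.asarray(event_logits, dtype=float)
    return target_logits - alpha * event_logits

def predict(target_logits, event_logits, alpha=1.0):
    """Return the predicted class index for each sample after debiasing."""
    return debias_logits(target_logits, event_logits, alpha).argmax(axis=1)

# Made-up logits for two posts over the classes [real, fake].
target = [[2.0, 1.5], [0.2, 1.0]]
event_only = [[1.5, 0.2], [0.1, 0.1]]
labels = predict(target, event_only)
```

In this toy input the first post is classified as real by the target predictor alone, but the event-only predictor accounts for most of that score, so the debiased prediction flips to fake.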
Predicting Viral Rumors and Vulnerable Users for Infodemic Surveillance
In the age of the infodemic, it is crucial to have tools for effectively
monitoring the spread of rampant rumors that can quickly go viral, as well as
identifying vulnerable users who may be more susceptible to spreading such
misinformation. This proactive approach allows for timely preventive measures
to be taken, mitigating the negative impact of false information on society. We
propose a novel approach to predict viral rumors and vulnerable users using a
unified graph neural network model. We pre-train network-based user embeddings
and leverage a cross-attention mechanism between users and posts, together with
a community-enhanced vulnerability propagation (CVP) method to improve user and
propagation graph representations. Furthermore, we employ two multi-task
training strategies to mitigate negative transfer effects among tasks in
different settings, enhancing the overall performance of our approach. We also
construct two datasets with ground-truth annotations on information virality
and user vulnerability in rumor and non-rumor events, which are automatically
derived from existing rumor detection datasets. Extensive evaluation results of
our joint learning model confirm its superiority over strong baselines in all
three tasks: rumor detection, virality prediction, and user vulnerability
scoring. For instance, compared to the best baselines based on the Weibo
dataset, our model makes 3.8% and 3.0% improvements on Accuracy and MacF1 for
rumor detection, and reduces mean squared error (MSE) by 23.9% and 16.5% for
virality prediction and user vulnerability scoring, respectively. Our findings
suggest that our approach effectively captures the correlation between rumor
virality and user vulnerability, leveraging this information to improve
prediction performance and provide a valuable tool for infodemic surveillance.
Comment: Accepted by IP&
Detecting the Influence of Spreading in Social Networks with Excitable Sensor Networks
Detecting spreading outbreaks in social networks with sensors is of great
significance in applications. Inspired by the formation mechanism of human
physical sensations to external stimuli, we propose a new method to detect the
influence of spreading by constructing excitable sensor networks. Exploiting
the amplifying effect of excitable sensor networks, our method can better
detect small-scale spreading processes. At the same time, it can also
distinguish large-scale diffusion instances due to the self-inhibition effect
of excitable elements. Through simulations of diverse spreading dynamics on
typical real-world social networks (Facebook, coauthor and email social
networks), we find that the excitable sensor networks are capable of detecting
and ranking spreading processes in a much wider range of influence than other
commonly used sensor placement methods, such as random, targeted, acquaintance
and distance strategies. In addition, we validate the efficacy of our method
with diffusion data from a real-world online social system, Twitter. We find
that our method can detect more spreading topics in practice. Our approach
provides a new direction in spreading detection and should be useful for
designing effective detection methods.
Probing Spurious Correlations in Popular Event-Based Rumor Detection Benchmarks
As social media becomes a hotbed for the spread of misinformation, the
crucial task of rumor detection has witnessed promising advances fostered by
open-source benchmark datasets. Despite being widely used, we find that these
datasets suffer from spurious correlations, which are ignored by existing
studies and lead to severe overestimation of existing rumor detection
performance. The spurious correlations stem from three causes: (1) event-based
data collection and labeling schemes assign the same veracity label to multiple
highly similar posts from the same underlying event; (2) merging multiple data
sources spuriously relates source identities to veracity labels; and (3)
labeling bias. In this paper, we closely investigate three of the most popular
rumor detection benchmark datasets (i.e., Twitter15, Twitter16 and PHEME), and
propose event-separated rumor detection as a solution to eliminate spurious
cues. Under the event-separated setting, we observe that the accuracy of
existing state-of-the-art models drops significantly by over 40%, becoming only
comparable to a simple neural classifier. To better address this task, we
propose Publisher Style Aggregation (PSA), a generalizable approach that
aggregates publisher posting records to learn writing style and veracity
stance. Extensive experiments demonstrate that our method outperforms existing
baselines in terms of effectiveness, efficiency and generalizability.
Comment: Accepted to ECML-PKDD 202
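The event-separated setting described in this abstract keeps every post from a given event on one side of the train/test boundary. A minimal sketch of such a split is shown below; the function name `event_separated_split`, the `test_fraction` and `seed` parameters, and the toy samples are assumptions for illustration and do not reproduce the authors' exact protocol.

```python
import random
from collections import defaultdict

def event_separated_split(samples, test_fraction=0.25, seed=0):
    """Split (event_id, item) pairs so that no event appears on both sides."""
    by_event = defaultdict(list)
    for event_id, item in samples:
        by_event[event_id].append(item)

    # Shuffle whole events, not individual posts.
    events = sorted(by_event)
    random.Random(seed).shuffle(events)
    n_test = max(1, round(len(events) * test_fraction))

    test = [(e, x) for e in events[:n_test] for x in by_event[e]]
    train = [(e, x) for e in events[n_test:] for x in by_event[e]]
    return train, test

# Toy data: six posts drawn from four events.
samples = [("e1", "post a"), ("e1", "post b"), ("e2", "post c"),
           ("e3", "post d"), ("e4", "post e"), ("e4", "post f")]
train, test = event_separated_split(samples)
```

Because the shuffle operates on event identifiers, near-duplicate posts from one event cannot leak from training into testing, which is the source of the spurious cue the abstract describes.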
Temporal models for mining, ranking and recommendation in the Web
Due to their first-hand, diverse and evolution-aware reflection of nearly all areas of life, heterogeneous temporal datasets, i.e., the Web, collaborative knowledge bases and social networks, have emerged as gold mines for content analytics of many sorts. In those collections, time plays an essential role in many crucial information retrieval and data mining tasks, ranging from user intent understanding and document ranking to advanced recommendations. There are two semantically close and important constituents when modeling along the time dimension, i.e., entity and event. Time crucially serves as the context for changes driven by happenings and phenomena (events) that relate to people, organizations or places (so-called entities) in our social lives. Thus, determining what users expect, or in other words, resolving the uncertainty confounded by temporal changes, is a compelling task to support consistent user satisfaction.
In this thesis, we address the aforementioned issues and propose temporal models that capture the temporal dynamics of such entities and events to serve the end tasks. Specifically, we make the following contributions in this thesis:
(1) Query recommendation and document ranking in the Web - we address the issues for suggesting entity-centric queries and ranking effectiveness surrounding the happening time period of an associated event. In particular, we propose a multi-criteria optimization framework that facilitates the combination of multiple temporal models to smooth out the abrupt changes when transitioning between event phases for the former and a probabilistic approach for search result diversification of temporally ambiguous queries for the latter.
(2) Entity relatedness in Wikipedia - we study the long-term dynamics of Wikipedia as a global memory place for high-impact events, specifically the reviving memories of past events. Additionally, we propose a neural network-based approach to measure the temporal relatedness of entities and events. The model engages different latent representations of an entity (i.e., from time, link-based graph and content) and uses the collective attention from user navigation as the supervision.
(3) Graph-based ranking and temporal anchor-text mining in Web Archives - we tackle the problem of discovering important documents along the time-span of Web Archives, leveraging the link graph. Specifically, we combine the problems of relevance, temporal authority, diversity and time in a unified framework. The model accounts for the incomplete link structure and natural time lagging in Web Archives in mining the temporal authority.
(4) Methods for enhancing predictive models at an early stage in social media and the clinical domain - we investigate several methods to control model instability and enrich contexts of predictive models in the “cold-start” period. We demonstrate their effectiveness for the rumor detection and blood glucose prediction cases respectively.
Overall, the findings presented in this thesis demonstrate the importance of tracking the temporal dynamics surrounding salient events and entities for IR applications. We show that determining such changes in time-based patterns and trends in prevalent temporal collections can better satisfy user expectations, and boost ranking and recommendation effectiveness over time.
A Unified Contrastive Transfer Framework with Propagation Structure for Boosting Low-Resource Rumor Detection
The truth is significantly hampered by massive rumors that spread along with
breaking news or popular topics. Since there is sufficient corpus gathered from
the same domain for model training, existing rumor detection algorithms show
promising performance on yesterday's news. However, due to a lack of training
data and prior expert knowledge, they are poor at spotting rumors concerning
unforeseen events, especially those propagated in different languages (i.e.,
low-resource regimes). In this paper, we propose a unified contrastive transfer
framework to detect rumors by adapting the features learned from well-resourced
rumor data to that of the low-resourced. More specifically, we first represent
rumor circulated on social media as an undirected topology, and then train a
Multi-scale Graph Convolutional Network via a unified contrastive paradigm. Our
model explicitly breaks the barriers of the domain and/or language issues, via
language alignment and a novel domain-adaptive contrastive learning mechanism.
To enhance the representation learning from a small set of target events, we
reveal that rumor-indicative signal is closely correlated with the uniformity
of the distribution of these events. We design a target-wise contrastive
training mechanism with three data augmentation strategies, capable of unifying
the representations by distinguishing target events. Extensive experiments
conducted on four low-resource datasets collected from real-world microblog
platforms demonstrate that our framework achieves much better performance than
state-of-the-art methods and exhibits a superior capacity for detecting rumors
at early stages.
Comment: A significant extension of the first contrastive approach for
low-resource rumor detection (arXiv:2204.08143
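To illustrate the general idea of contrastive training that this abstract builds on, the sketch below computes a generic supervised contrastive loss: embeddings sharing a label are pulled together and all others pushed apart. It is a textbook-style formulation under assumed names (`supervised_contrastive_loss`, `temperature`) and toy data, not the paper's domain-adaptive or target-wise mechanism.

```python
import numpy as np

def supervised_contrastive_loss(embeddings, labels, temperature=0.5):
    """Generic supervised contrastive loss over a batch (illustrative only)."""
    z = np.asarray(embeddings, dtype=float)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-normalise rows
    labels = np.asarray(labels)
    n = len(labels)

    sim = z @ z.T / temperature
    not_self = ~np.eye(n, dtype=bool)

    # Log-softmax of each pair over all *other* samples in the batch.
    masked = np.where(not_self, sim, -np.inf)
    log_denom = np.log(np.exp(masked).sum(axis=1, keepdims=True))
    log_prob = sim - log_denom

    # Positives: same label, excluding the anchor itself.
    positives = (labels[:, None] == labels[None, :]) & not_self
    pos_counts = positives.sum(axis=1)
    valid = pos_counts > 0  # anchors with at least one positive
    pos_sum = np.where(positives, log_prob, 0.0).sum(axis=1)
    return float(-(pos_sum[valid] / pos_counts[valid]).mean())

# Toy 2-D embeddings forming two tight clusters.
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_matching = supervised_contrastive_loss(emb, [0, 0, 1, 1])
loss_mismatched = supervised_contrastive_loss(emb, [0, 1, 0, 1])
```

The loss is lower when labels agree with the cluster structure of the embeddings, which is the signal a contrastive objective optimises. The function assumes at least one anchor in the batch has a same-label partner.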
Graph Learning for Anomaly Analytics: Algorithms, Applications, and Challenges
Anomaly analytics is a popular and vital task in various research contexts,
which has been studied for several decades. At the same time, deep learning has
shown its capacity in solving many graph-based tasks like node classification,
link prediction, and graph classification. Recently, many studies are extending
graph learning models for solving anomaly analytics problems, resulting in
beneficial advances in graph-based anomaly analytics techniques. In this
survey, we provide a comprehensive overview of graph learning methods for
anomaly analytics tasks. We classify them into four categories based on their
model architectures, namely graph convolutional network (GCN), graph attention
network (GAT), graph autoencoder (GAE), and other graph learning models. The
differences between these methods are also compared in a systematic manner.
Furthermore, we outline several graph-based anomaly analytics applications
across various domains in the real world. Finally, we discuss five potential
future research directions in this rapidly growing field.
Graph Mining for Cybersecurity: A Survey
The explosive growth of cyber attacks nowadays, such as malware, spam, and
intrusions, has caused severe consequences for society. Securing cyberspace has
become an utmost concern for organizations and governments. Traditional Machine
Learning (ML) based methods are extensively used in detecting cyber threats,
but they hardly model the correlations between real-world cyber entities. In
recent years, with the proliferation of graph mining techniques, many
researchers investigated these techniques for capturing correlations between
cyber entities and achieving high performance. It is imperative to summarize
existing graph-based cybersecurity solutions to provide a guide for future
studies. Therefore, as a key contribution of this paper, we provide a
comprehensive review of graph mining for cybersecurity, including an overview
of cybersecurity tasks, the typical graph mining techniques, and the general
process of applying them to cybersecurity, as well as various solutions for
different cybersecurity tasks. For each task, we probe into relevant methods
and highlight the graph types, graph approaches, and task levels in their
modeling. Furthermore, we collect open datasets and toolkits for graph-based
cybersecurity. Finally, we outline potential directions of this field for
future research.
Graph learning for anomaly analytics : algorithms, applications, and challenges
Anomaly analytics is a popular and vital task in various research contexts that has been studied for several decades. At the same time, deep learning has shown its capacity in solving many graph-based tasks, like node classification, link prediction, and graph classification. Recently, many studies are extending graph learning models for solving anomaly analytics problems, resulting in beneficial advances in graph-based anomaly analytics techniques. In this survey, we provide a comprehensive overview of graph learning methods for anomaly analytics tasks. We classify them into four categories based on their model architectures, namely graph convolutional network, graph attention network, graph autoencoder, and other graph learning models. The differences between these methods are also compared in a systematic manner. Furthermore, we outline several graph-based anomaly analytics applications across various domains in the real world. Finally, we discuss five potential future research directions in this rapidly growing field. © 2023 Association for Computing Machinery