Probing Spurious Correlations in Popular Event-Based Rumor Detection Benchmarks
As social media becomes a hotbed for the spread of misinformation, the
crucial task of rumor detection has witnessed promising advances fostered by
open-source benchmark datasets. Despite being widely used, we find that these
datasets suffer from spurious correlations, which are ignored by existing
studies and lead to severe overestimation of existing rumor detection
performance. The spurious correlations stem from three causes: (1) event-based
data collection and labeling schemes assign the same veracity label to multiple
highly similar posts from the same underlying event; (2) merging multiple data
sources spuriously relates source identities to veracity labels; and (3)
labeling bias. In this paper, we closely investigate three of the most popular
rumor detection benchmark datasets (i.e., Twitter15, Twitter16 and PHEME), and
propose event-separated rumor detection as a solution to eliminate spurious
cues. Under the event-separated setting, we observe that the accuracy of
existing state-of-the-art models drops significantly by over 40%, becoming only
comparable to a simple neural classifier. To better address this task, we
propose Publisher Style Aggregation (PSA), a generalizable approach that
aggregates publisher posting records to learn writing style and veracity
stance. Extensive experiments demonstrate that our method outperforms existing
baselines in terms of effectiveness, efficiency and generalizability.
Comment: Accepted to ECML-PKDD 202
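The event-separated setting described in this abstract can be illustrated with a short sketch. It assumes each post carries an event identifier and is stored as an `(event_id, text, label)` tuple; that layout, the function name `event_separated_split` and the `test_fraction` parameter are hypothetical and not taken from the paper or the benchmark files. The point is only that whole events, not individual posts, are assigned to one side of the split.

```python
import random
from collections import defaultdict


def event_separated_split(posts, test_fraction=0.25, seed=0):
    """Split posts so that no event contributes to both train and test.

    `posts` is a list of (event_id, text, label) tuples. This layout is an
    assumption for illustration, not the benchmarks' actual format.
    """
    # Group posts by the event they belong to.
    by_event = defaultdict(list)
    for post in posts:
        by_event[post[0]].append(post)

    # Shuffle events (not posts) deterministically, then hold out whole events.
    events = sorted(by_event)
    random.Random(seed).shuffle(events)
    n_test = max(1, round(len(events) * test_fraction))
    held_out = set(events[:n_test])

    train = [p for e in events if e not in held_out for p in by_event[e]]
    test = [p for e in events if e in held_out for p in by_event[e]]
    return train, test


# Toy data: four events, some with several near-duplicate posts.
posts = [
    ("e1", "post a", 1), ("e1", "post b", 1),
    ("e2", "post c", 0), ("e2", "post d", 0),
    ("e3", "post e", 1), ("e4", "post f", 0),
]
train, test = event_separated_split(posts)
train_events = {p[0] for p in train}
test_events = {p[0] for p in test}
```

A random post-level split of the same toy data could place "post a" in training and "post b" in testing, which is the kind of event-level leakage the abstract attributes to the existing benchmarks.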
Interactive Search and Exploration in Online Discussion Forums Using Multimodal Embeddings
In this paper we present a novel interactive multimodal learning system,
which facilitates search and exploration in large networks of social multimedia
users. It allows the analyst to identify and select users of interest, and to
find similar users in an interactive learning setting. Our approach is based on
novel multimodal representations of users, words and concepts, which we
simultaneously learn by deploying a general-purpose neural embedding model. We
show these representations to be useful not only for categorizing users, but
also for automatically generating user and community profiles. Inspired by
traditional summarization approaches, we create the profiles by selecting
diverse and representative content from all available modalities, i.e. the
text, image and user modality. The usefulness of the approach is evaluated
using artificial actors, which simulate user behavior in a relevance feedback
scenario. Multiple experiments were conducted in order to evaluate the quality
of our multimodal representations, to compare different embedding strategies,
and to determine the importance of different modalities. We demonstrate the
capabilities of the proposed approach on two different multimedia collections
originating from the violent online extremism forum Stormfront and the
microblogging platform Twitter, which are particularly interesting due to the
high semantic level of the discussions they feature.
Hypergraph Neural Networks
In this paper, we present a hypergraph neural networks (HGNN) framework for
data representation learning, which can encode high-order data correlation in a
hypergraph structure. Confronting the challenges of learning representations for
complex data in real practice, we propose to incorporate such data structure in
a hypergraph, which is more flexible on data modeling, especially when dealing
with complex data. In this method, a hyperedge convolution operation is
designed to handle the data correlation during representation learning. In this
way, the traditional hypergraph learning procedure can be conducted using hyperedge
convolution operations efficiently. HGNN is able to learn the hidden layer
representation considering the high-order data structure, which is a general
framework considering the complex data correlations. We have conducted
experiments on citation network classification and visual object recognition
tasks and compared HGNN with graph convolutional networks and other traditional
methods. Experimental results demonstrate that the proposed HGNN method
outperforms recent state-of-the-art methods. We can also reveal from the
results that the proposed HGNN is superior when dealing with multi-modal data
compared with existing methods.
Comment: Accepted in AAAI'201
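The abstract does not spell out the hyperedge convolution, so the following is a minimal sketch under stated assumptions. It implements the normalized propagation commonly associated with HGNN, $D_v^{-1/2} H W D_e^{-1} H^\top D_v^{-1/2} X$, in plain Python, and omits the learnable projection and the nonlinearity. The function name `hyperedge_conv` and the list-of-lists layout are hypothetical. It also assumes every node belongs to at least one hyperedge and every hyperedge is non-empty, so no degree is zero.

```python
import math


def hyperedge_conv(H, X, w=None):
    """One parameter-free hyperedge convolution step.

    H: n x m incidence matrix (H[i][e] == 1 if node i is in hyperedge e).
    X: n x d node feature matrix.
    w: optional list of m hyperedge weights (defaults to all ones).
    """
    n, m = len(H), len(H[0])
    d = len(X[0])
    w = w or [1.0] * m

    # Weighted node degrees and hyperedge degrees.
    dv = [sum(w[e] * H[i][e] for e in range(m)) for i in range(n)]
    de = [sum(H[i][e] for i in range(n)) for e in range(m)]

    # Step 1: gather normalized node features into each hyperedge.
    edge_feat = [
        [sum(H[i][e] * X[i][k] / math.sqrt(dv[i]) for i in range(n)) / de[e]
         for k in range(d)]
        for e in range(m)
    ]

    # Step 2: scatter hyperedge features back to their member nodes.
    return [
        [sum(H[i][e] * w[e] * edge_feat[e][k] for e in range(m)) / math.sqrt(dv[i])
         for k in range(d)]
        for i in range(n)
    ]


# Toy hypergraph: 3 nodes, hyperedges {0, 1} and {1, 2}.
H = [[1, 0],
     [1, 1],
     [0, 1]]
X = [[1.0], [0.0], [0.0]]
Y = hyperedge_conv(H, X)
```

In the toy example, node 2 shares no hyperedge with node 0, so it receives none of node 0's feature after one step, while node 1 receives a share through the first hyperedge.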
Improving Distributed Representations of Tweets - Present and Future
Unsupervised representation learning for tweets is an important research
field which helps in solving several business applications such as sentiment
analysis, hashtag prediction, paraphrase detection and microblog ranking. A
good tweet representation learning model must handle the idiosyncratic nature
of tweets which poses several challenges such as short length, informal words,
unusual grammar and misspellings. However, there is a lack of prior work which
surveys the representation learning models with a focus on tweets. In this
work, we organize the models based on their objective functions, which aids the
understanding of the literature. We also provide interesting future directions,
which we believe are fruitful in advancing this field by building high-quality
tweet representation learning models.
Comment: To be presented in Student Research Workshop (SRW) at ACL 201
A Unified Contrastive Transfer Framework with Propagation Structure for Boosting Low-Resource Rumor Detection
The truth is significantly hampered by massive rumors that spread along with
breaking news or popular topics. Since there is sufficient corpus gathered from
the same domain for model training, existing rumor detection algorithms show
promising performance on yesterday's news. However, due to a lack of training
data and prior expert knowledge, they are poor at spotting rumors concerning
unforeseen events, especially those propagated in different languages (i.e.,
low-resource regimes). In this paper, we propose a unified contrastive transfer
framework to detect rumors by adapting the features learned from well-resourced
rumor data to low-resourced rumor data. More specifically, we first represent
rumors circulated on social media as an undirected topology, and then train a
Multi-scale Graph Convolutional Network via a unified contrastive paradigm. Our
model explicitly breaks the barriers of the domain and/or language issues, via
language alignment and a novel domain-adaptive contrastive learning mechanism.
To enhance the representation learning from a small set of target events, we
reveal that rumor-indicative signal is closely correlated with the uniformity
of the distribution of these events. We design a target-wise contrastive
training mechanism with three data augmentation strategies, capable of unifying
the representations by distinguishing target events. Extensive experiments
conducted on four low-resource datasets collected from real-world microblog
platforms demonstrate that our framework achieves much better performance than
state-of-the-art methods and exhibits a superior capacity for detecting rumors
at early stages.
Comment: A significant extension of the first contrastive approach for
low-resource rumor detection (arXiv:2204.08143)
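The abstract does not give the exact contrastive objective, so the sketch below shows a generic supervised contrastive loss rather than the paper's formulation. It pulls together embeddings that share a label and pushes apart the rest. The function name `sup_con_loss`, the cosine similarity and the temperature value are all assumptions made for illustration.

```python
import math


def sup_con_loss(embeddings, labels, temperature=0.5):
    """Generic supervised contrastive loss over a small batch.

    For each anchor, samples with the same label are positives and every
    other sample in the batch acts as a contrast term.
    """
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    n = len(embeddings)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # anchors without a positive pair contribute nothing
        logits = {j: cos(embeddings[i], embeddings[j]) / temperature
                  for j in range(n) if j != i}
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


# Two tight clusters; labels either agree with the clusters or cut across them.
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = sup_con_loss(emb, [0, 0, 1, 1])
mixed = sup_con_loss(emb, [0, 1, 0, 1])
```

The loss is lower when the labels agree with the geometry of the embeddings, which is the property a contrastive training signal relies on.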