SEISMIC: A Self-Exciting Point Process Model for Predicting Tweet Popularity
Social networking websites allow users to create and share content. Big
information cascades of post resharing can form as users of these sites reshare
others' posts with their friends and followers. One of the central challenges
in understanding such cascading behaviors is in forecasting information
outbreaks, where a single post becomes widely popular by being reshared by many
users. In this paper, we focus on predicting the final number of reshares of a
given post. We build on the theory of self-exciting point processes to develop
a statistical model that allows us to make accurate predictions. Our model
requires no training or expensive feature engineering. It results in a simple
and efficiently computable formula that allows us to answer questions, in
real-time, such as: Given a post's resharing history so far, what is our
current estimate of its final number of reshares? Is the post resharing cascade
past the initial stage of explosive growth? And, which posts will be the most
reshared in the future? We validate our model using one month of complete
Twitter data and demonstrate a strong improvement in predictive accuracy over
existing approaches. Our model gives only 15% relative error in predicting
the final size of an average information cascade after observing it for just one
hour.
Comment: 10 pages, published in KDD 201
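To make the idea of a self-exciting point process concrete, the following is a minimal sketch of a final-size estimate for a generic Hawkes process with an exponential memory kernel. It is not the SEISMIC formula from the paper; the function name, the fixed `branching_ratio`, and the `decay_rate` parameter are all assumptions chosen for illustration. The reasoning: each observed reshare still has some expected number of direct children left to arrive, and each of those children starts a sub-cascade of expected size 1 / (1 - branching_ratio).

```python
import math


def expected_final_size(event_times, now, branching_ratio, decay_rate):
    """Expected final cascade size under a generic exponential-kernel Hawkes model.

    Illustrative assumptions (not taken from the paper):
      - every event spawns on average `branching_ratio` direct children (< 1),
      - the delay to a child follows an exponential density with rate `decay_rate`,
      - there is no background rate apart from the observed events.
    """
    if not 0.0 <= branching_ratio < 1.0:
        raise ValueError("branching_ratio must be in [0, 1) for a finite cascade")
    if decay_rate <= 0.0:
        raise ValueError("decay_rate must be positive")

    observed = [t for t in event_times if t <= now]

    # Expected number of direct children each observed event has yet to produce:
    # the branching ratio times the kernel mass remaining after `now`.
    remaining_children = sum(
        branching_ratio * math.exp(-decay_rate * (now - t)) for t in observed
    )

    # Each future child contributes itself plus its descendants: 1 / (1 - n) events.
    return len(observed) + remaining_children / (1.0 - branching_ratio)


# One post observed at time 0, evaluated immediately, with branching ratio 0.5:
print(expected_final_size([0.0], now=0.0, branching_ratio=0.5, decay_rate=1.0))
```

With these toy inputs the estimate is 1 + 0.5 / 0.5 = 2 events; as `now` grows with no new reshares, the estimate decays toward the observed count.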
The Bursty Dynamics of the Twitter Information Network
In online social media systems users are not only posting, consuming, and
resharing content, but also creating new and destroying existing connections in
the underlying social network. While each of these two types of dynamics has
individually been studied in the past, much less is known about the connection
between the two. How does user information posting and seeking behavior
interact with the evolution of the underlying social network structure?
Here, we study ways in which network structure reacts to users posting and
sharing content. We examine the complete dynamics of the Twitter information
network, where users post and reshare information while they also create and
destroy connections. We find that the dynamics of network structure can be
characterized by steady rates of change, interrupted by sudden bursts.
Information diffusion in the form of cascades of post re-sharing often creates
such sudden bursts of new connections, which significantly change users' local
network structure. These bursts transform users' networks of followers to
become structurally more cohesive as well as more homogeneous in terms of
follower interests. We also explore the effect of the information content on
the dynamics of the network and find evidence that the appearance of new topics
and real-world events can lead to significant changes in edge creations and
deletions. Lastly, we develop a model that quantifies the dynamics of the
network and the occurrence of these bursts as a function of the information
spreading through the network. The model can successfully predict which
information diffusion events will lead to bursts in network dynamics.
Interval-censored Transformer Hawkes: Detecting Information Operations using the Reaction of Social Systems
Social media is being increasingly weaponized by state-backed actors to
elicit reactions, push narratives and sway public opinion. These are known as
Information Operations (IO). The covert nature of IO makes their detection
difficult. The difficulty is compounded by missing data caused by user and
content removal and by privacy requirements. This work advances the hypothesis
that the very reactions that Information Operations seek to elicit within the
target social systems can be used to detect them. We propose an
Interval-censored Transformer Hawkes (IC-TH) architecture and a novel data
encoding scheme to account for both observed and missing data. We derive a
novel log-likelihood function that we deploy together with a contrastive
learning procedure. We showcase the performance of IC-TH on three real-world
Twitter datasets and two learning tasks: future popularity prediction and item
category prediction. The latter is particularly significant. Using only the
timing and patterns of retweets, we can predict the category of YouTube
videos, guess whether news publishers are reputable or controversial and, most
importantly, identify state-backed IO agent accounts. Additional qualitative
investigations uncover that the automatically discovered clusters of
Russian-backed agents appear to coordinate their behavior, activating
simultaneously to push specific narratives.
Causal Understanding of Why Users Share Hate Speech on Social Media
Hate speech on social media threatens the mental and physical well-being of
individuals and is further responsible for real-world violence. Reshares are an
important driver of the spread of hate speech, and thus of why hateful posts can
go viral, yet little is known about why users reshare hate speech. In this
paper, we present a comprehensive, causal analysis of the user attributes that
make users reshare hate speech. However, causal inference from observational
social media data is challenging, because such data likely suffer from
selection bias, and there is further confounding due to differences in the
vulnerability of users to hate speech. We develop a novel, three-step causal
framework: (1) We debias the observational social media data by applying
inverse propensity scoring. (2) We use the debiased propensity scores to model
the latent vulnerability of users to hate speech as a latent embedding. (3) We
model the causal effects of user attributes on users' probability of sharing
hate speech, while controlling for the latent vulnerability of users to hate
speech. Compared to existing baselines, a particular strength of our framework
is that it models causal effects that are non-linear, yet still explainable. We
find that users with fewer followers, fewer friends, and fewer posts share more
hate speech. Younger accounts, in contrast, share less hate speech. Overall,
understanding the factors that drive users to share hate speech is crucial for
detecting individuals at risk of engaging in harmful behavior and for designing
effective mitigation strategies.
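As a rough illustration of the inverse propensity scoring mentioned in step (1), the sketch below computes a normalized (Hajek-style) inverse-propensity-weighted difference in mean outcomes between two groups. It is a generic textbook estimator, not the authors' three-step framework: the function name is hypothetical, the propensity scores are assumed to be given rather than learned, and the latent vulnerability embedding of steps (2) and (3) is not modeled.

```python
def ipw_effect(treated, outcomes, propensities):
    """Normalized inverse-propensity-weighted difference in mean outcomes.

    treated      -- iterable of 0/1 flags (1 = unit has the attribute of interest)
    outcomes     -- iterable of numeric outcomes (e.g. 1 if the user reshared)
    propensities -- iterable of P(treated = 1 | covariates), assumed known here
    """
    num_t = den_t = num_c = den_c = 0.0
    for z, y, p in zip(treated, outcomes, propensities):
        if not 0.0 < p < 1.0:
            raise ValueError("propensities must lie strictly between 0 and 1")
        if z:
            w = 1.0 / p            # up-weight treated units that were unlikely to be treated
            num_t += w * y
            den_t += w
        else:
            w = 1.0 / (1.0 - p)    # up-weight control units that were unlikely to be controls
            num_c += w * y
            den_c += w
    if den_t == 0.0 or den_c == 0.0:
        raise ValueError("need at least one treated and one control unit")
    return num_t / den_t - num_c / den_c


# Toy data: with equal propensities the estimate reduces to a plain difference in means.
print(ipw_effect([1, 1, 0, 0], [1, 0, 0, 0], [0.5, 0.5, 0.5, 0.5]))
```

Normalizing by the sum of weights in each group keeps the estimate bounded by the observed outcomes, which tends to be more stable than the unnormalized form when some propensities are close to 0 or 1.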
Birdspotter: A Tool for Analyzing and Labeling Twitter Users
The impact of online social media on societal events and institutions is
profound, and with the rapid increase in user uptake, we are just starting to
understand its ramifications. Social scientists and practitioners who model
online discourse as a proxy for real-world behavior often curate large social
media datasets. A lack of available tooling aimed at non-data science experts
frequently leaves this data (and the insights it holds) underutilized. Here, we
propose birdspotter, a tool to analyze and label Twitter users, and
birdspotter.ml, an exploratory visualizer for the computed metrics.
birdspotter provides an end-to-end analysis pipeline, from the processing of
pre-collected Twitter data, to general-purpose labeling of users, and
estimating their social influence, within a few lines of code. The package
features tutorials and detailed documentation. We also illustrate how to train
birdspotter into a fully-fledged bot detector that achieves better than
state-of-the-art performance without making any online calls to the Twitter
API, and we showcase its usage in an exploratory analysis of a topical COVID-19
dataset.
Fighting Rumours on Social Media
With the advance of social platforms, people are sharing content at an unprecedented scale. This makes social platforms an ideal place for spreading rumors. As rumors may have negative impacts on the real world, many rumor detection techniques have been proposed. In this proposal, we summarize several works that focus on two important steps of rumor detection. The first step involves detecting controversial events in the data streams, which are candidates for rumors. The aim of the second step is to determine the truth values of these events, i.e., whether they are rumors or not. Although some techniques achieve state-of-the-art results, they do not cope well with the streaming nature of social platforms. In addition, they usually leverage only one type of information available on social platforms, such as the posts alone. To overcome these limitations, we propose two research directions that emphasize 1) detecting rumors in a progressive manner and 2) combining different types of information for better detection.