Modeling information cascades with self-exciting processes via generalized epidemic models
© 2020 Association for Computing Machinery. Epidemic models and self-exciting processes are two types of models used to describe diffusion phenomena online and offline. These models were originally developed in different scientific communities, and their commonalities are under-explored. This work establishes, for the first time, a general connection between the two model classes via three new mathematical components. The first is a generalized version of the stochastic Susceptible-Infected-Recovered (SIR) model with arbitrary recovery time distributions; the second is the relationship between the (latent and arbitrary) recovery time distribution, the recovery hazard function, and the infection kernel of self-exciting processes; the third includes methods for simulating, fitting, evaluating and predicting the generalized process. On three large Twitter diffusion datasets, we conduct goodness-of-fit tests and holdout log-likelihood evaluation of self-exciting processes with three infection kernels: exponential, power-law and Tsallis Q-exponential. We show that the modeling performance of the infection kernels varies with the temporal structure of the diffusions and with user behavior, such as the likelihood of being bots. We further improve the prediction of popularity by combining two models that are identified as complementary by the goodness-of-fit tests.
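As an illustration of what a self-exciting process with an exponential infection kernel looks like in practice, the following is a minimal sketch of simulating one cascade by thinning. It is not the authors' implementation: the kernel parameterization `kappa * theta * exp(-theta * t)`, the function names, the single seed event at time 0, and the absence of a background rate are all assumptions made for this example.

```python
import math
import random


def exp_kernel(t, kappa, theta):
    """Assumed exponential infection kernel; it integrates to kappa over t >= 0."""
    return kappa * theta * math.exp(-theta * t)


def simulate_cascade(kappa=0.8, theta=1.0, horizon=50.0, seed=0):
    """Simulate one cascade started by a single event at time 0 (thinning).

    Hypothetical helper: there is no background rate, so every later event
    is triggered by earlier events through the kernel.
    """
    rng = random.Random(seed)
    events = [0.0]
    t = 0.0
    while True:
        # The kernel is decreasing, so the intensity at the current time
        # bounds the intensity until the next accepted event.
        lam_bar = sum(exp_kernel(t - s, kappa, theta) for s in events)
        if lam_bar <= 1e-12:
            break
        t += rng.expovariate(lam_bar)  # candidate event time
        if t >= horizon:
            break
        lam_t = sum(exp_kernel(t - s, kappa, theta) for s in events)
        # Accept the candidate with probability lam_t / lam_bar.
        if rng.random() * lam_bar <= lam_t:
            events.append(t)
    return events


if __name__ == "__main__":
    cascade = simulate_cascade()
    print(len(cascade), "events; last at", round(cascade[-1], 3))
```

With `kappa` below 1 each event triggers fewer than one follow-up event on average, so the simulated cascade dies out on its own; a power-law kernel could be substituted for `exp_kernel` as long as the upper bound in the thinning step remains valid.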
Linking Epidemic Models and Self-exciting Processes for Online and Offline Event Diffusions
Temporal diffusion data, which comprises time-stamped events, is ubiquitous, ranging from information diffusing in online social media platforms to infectious diseases spreading in offline communities. Pressing problems, such as predicting the popularity of online information and containing epidemics, demand temporal diffusion models for understanding, modeling, and controlling diffusion dynamics. This thesis discusses diffusions of online information and epidemics by developing and connecting self-exciting processes and epidemic models. First, we propose a novel dual mixture self-exciting process for characterizing online information diffusions related to online items, such as videos or news articles. By observing that maximum likelihood estimates are separable in a Hawkes process, the model, consisting of a Borel mixture model and a kernel mixture model, jointly learns the unfolding of a heterogeneous set of cascades. When applied to cascades of the same online items, the model directly characterizes their spread dynamics and provides interpretable quantities, such as content virality and content influence decay, as well as methods for predicting final content popularity. On two retweet cascade datasets, we show that our models capture the differences between online items at the granularity of items, publishers, and categories. Next, we propose novel ideal strategies to explore the limits of both testing and contact tracing strategies, which have proven effective in some epidemics (e.g., SARS) but ineffective in others (e.g., COVID-19). We then develop a superspreading random contact network that accounts for the superspreading effect of infectious diseases, in which a small number of infected cases cause most secondary infections. In simulations, we observe gaps between ideal and standard strategies by examining extensive sets of epidemic parameters, highlighting the need to explore intelligent strategies.
We also present a classification of different diseases based on how containable they are under different strategies. Then, we bridge epidemic models and self-exciting processes with a novel generalized stochastic Susceptible-Infected-Recovered (SIR) model with arbitrary recovery time distributions. We articulate the relationship between recovery time distributions, recovery hazard functions, and infection kernels of self-exciting processes. We also present methods for simulating, fitting, evaluating, and predicting the generalized process. On three large Twitter diffusion datasets, we show that the modeling performance of the infection kernels varies depending on the temporal structures of diffusions and user behavior, such as the likelihood of being bots. We further improve the prediction of popularity by combining two models identified as complementary in the goodness-of-fit tests. Last, we present evently, a tool for modeling online reshare cascades, particularly retweet cascades, using self-exciting processes. This tool bridges the gap between practitioners of online social media analysis (usually social, political, and communication scientists) and tools capable of examining online discussions. It provides a comprehensive set of functionalities for processing raw data from Twitter public APIs, modeling the temporal dynamics of processed retweet cascades, and characterizing online users with a wide range of diffusion measures. Overall, this thesis studies temporal diffusions of online information and epidemics by proposing novel epidemic models and self-exciting processes. It provides tools for predicting information popularity, characterizing online items, and classifying online item categories with state-of-the-art performance. It also contributes observations on applying testing and tracing strategies to contain epidemics.
Lastly, evently facilitates temporal diffusion analysis for practitioners from various fields, such as social science and epidemiology.
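To make the link between content virality and final popularity concrete, here is a small sketch that assumes the Borel mixture component mentioned in the abstract builds on the standard Borel distribution, which is the law of the total number of events in a subcritical Poisson branching process. The function names and the parameter name `mu` (the branching factor) are chosen for illustration and are not taken from the thesis.

```python
import math


def borel_pmf(n, mu):
    """P(N = n) for a Borel distribution with branching factor 0 <= mu <= 1.

    N counts all events in a cascade, including the initial one, when each
    event independently triggers a Poisson(mu) number of direct offspring.
    """
    if n < 1 or not 0.0 <= mu <= 1.0:
        raise ValueError("need n >= 1 and 0 <= mu <= 1")
    if mu == 0.0:
        return 1.0 if n == 1 else 0.0
    # Computed in log space to avoid overflow for large n.
    log_p = -mu * n + (n - 1) * math.log(mu * n) - math.lgamma(n + 1)
    return math.exp(log_p)


def expected_cascade_size(mu):
    """Mean of the Borel distribution, finite only for mu < 1."""
    if not 0.0 <= mu < 1.0:
        raise ValueError("mean is finite only for 0 <= mu < 1")
    return 1.0 / (1.0 - mu)


if __name__ == "__main__":
    mu = 0.5
    print(borel_pmf(1, mu), expected_cascade_size(mu))
```

Under these assumptions a larger branching factor shifts probability mass toward larger cascades, and the expected final size grows as `1 / (1 - mu)`.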
Latent Self-Exciting Point Process Model for Spatial-Temporal Networks
We propose a latent self-exciting point process model that describes
geographically distributed interactions between pairs of entities. In contrast
to most existing approaches that assume fully observable interactions, here we
consider a scenario where certain interaction events lack information about
participants. Instead, this information needs to be inferred from the available
observations. We develop an efficient approximate algorithm based on
variational expectation-maximization to infer unknown participants in an event
given the location and the time of the event. We validate the model on
synthetic as well as real-world data, and obtain very promising results on the
identity-inference task. We also use our model to predict the timing and
participants of future events, and demonstrate that it compares favorably with
baseline approaches.
Comment: 20 pages, 6 figures (v3); 11 pages, 6 figures (v2); previous version
appeared in the 9th Bayesian Modeling Applications Workshop, UAI'1
Communities, Knowledge Creation, and Information Diffusion
In this paper, we examine how patterns of scientific collaboration contribute
to knowledge creation. Recent studies have shown that scientists can benefit
from their position within collaborative networks by being able to receive more
information of better quality in a timely fashion, and by presiding over
communication between collaborators. Here we focus on the tendency of
scientists to cluster into tightly-knit communities, and discuss the
implications of this tendency for scientific performance. We begin by reviewing
a new method for finding communities, and we then assess its benefits in terms
of computation time and accuracy. While communities often serve as a taxonomic
scheme to map knowledge domains, they also affect how successfully scientists
engage in the creation of new knowledge. By drawing on the longstanding debate
on the relative benefits of social cohesion and brokerage, we discuss the
conditions that facilitate collaborations among scientists within or across
communities. We show that successful scientific production occurs within
communities when scientists have cohesive collaborations with others from the
same knowledge domain, and across communities when scientists intermediate
among otherwise disconnected collaborators from different knowledge domains. We
also discuss the implications of communities for information diffusion, and
show how traditional epidemiological approaches need to be refined to take
knowledge heterogeneity into account and preserve the system's ability to
promote creative processes of novel recombinations of ideas.
Interval-censored Transformer Hawkes: Detecting Information Operations using the Reaction of Social Systems
Social media is being increasingly weaponized by state-backed actors to
elicit reactions, push narratives and sway public opinion. These are known as
Information Operations (IO). The covert nature of IO makes their detection
difficult. This difficulty is compounded by missing data due to user and
content removal and privacy requirements. This work advances the hypothesis
that the very reactions that Information Operations seek to elicit within the
target social systems can be used to detect them. We propose an
Interval-censored Transformer Hawkes (IC-TH) architecture and a novel data
encoding scheme to account for both observed and missing data. We derive a
novel log-likelihood function that we deploy together with a contrastive
learning procedure. We showcase the performance of IC-TH on three real-world
Twitter datasets and two learning tasks: future popularity prediction and item
category prediction. The latter is particularly significant. Using the
retweeting timing and patterns solely, we can predict the category of YouTube
videos, guess whether news publishers are reputable or controversial and, most
importantly, identify state-backed IO agent accounts. Additional qualitative
investigations uncover that the automatically discovered clusters of
Russian-backed agents appear to coordinate their behavior, activating
simultaneously to push specific narratives.
Epidemic processes in complex networks
In recent years the research community has accumulated overwhelming evidence
for the emergence of complex and heterogeneous connectivity patterns in a wide
range of biological and sociotechnical systems. The complex properties of
real-world networks have a profound impact on the behavior of equilibrium and
nonequilibrium phenomena occurring in various systems, and the study of
epidemic spreading is central to our understanding of the unfolding of
dynamical processes in complex networks. The theoretical analysis of epidemic
spreading in heterogeneous networks requires the development of novel
analytical frameworks, and it has produced results of conceptual and practical
relevance. A coherent and comprehensive review of the vast research activity
concerning epidemic processes is presented, detailing the successful
theoretical approaches as well as making their limits and assumptions clear.
Physicists, mathematicians, epidemiologists, computer, and social scientists
share a common interest in studying epidemic spreading and rely on similar
models for the description of the diffusion of pathogens, knowledge, and
innovation. For this reason, while focusing on the main results and the
paradigmatic models in infectious disease modeling, the major results
concerning generalized social contagion processes are also presented. Finally,
the research activity at the forefront in the study of epidemic spreading in
coevolving, coupled, and time-varying networks is reported.
Comment: 62 pages, 15 figures, final version