
    Modeling information cascades with self-exciting processes via generalized epidemic models

    © 2020 Association for Computing Machinery. Epidemic models and self-exciting processes are two types of models used to describe diffusion phenomena online and offline. These models were originally developed in different scientific communities, and their commonalities are under-explored. This work establishes, for the first time, a general connection between the two model classes via three new mathematical components. The first is a generalized version of the stochastic Susceptible-Infected-Recovered (SIR) model with arbitrary recovery time distributions; the second is the relationship between the (latent and arbitrary) recovery time distribution, the recovery hazard function, and the infection kernel of self-exciting processes; the third includes methods for simulating, fitting, evaluating and predicting the generalized process. On three large Twitter diffusion datasets, we conduct goodness-of-fit tests and holdout log-likelihood evaluation of self-exciting processes with three infection kernels: exponential, power-law and Tsallis Q-exponential. We show that the modeling performance of the infection kernels varies with respect to the temporal structures of diffusions, and also with respect to user behavior, such as the likelihood of being bots. We further improve the prediction of popularity by combining two models that are identified as complementary by the goodness-of-fit tests.
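    As a rough illustration of the infection kernels named in the abstract above, the following Python sketch evaluates the conditional intensity of a self-exciting (Hawkes-type) process under an exponential and a power-law kernel. The parameter names (`kappa`, `theta`, `c`, `mu`) and the normalization, in which each kernel integrates to `kappa`, are assumptions made for this example and may not match the parametrization used in the paper. The Tsallis Q-exponential kernel is omitted because its exact form is not given here.

```python
import math


def exponential_kernel(t, kappa=0.8, theta=1.0):
    """Assumed form: phi(t) = kappa * theta * exp(-theta * t) for t >= 0.

    With this normalization the kernel integrates to kappa over [0, inf).
    """
    if t < 0:
        return 0.0
    return kappa * theta * math.exp(-theta * t)


def power_law_kernel(t, kappa=0.8, c=1.0, theta=1.5):
    """Assumed form: phi(t) = kappa * theta * c**theta * (t + c)**-(1 + theta).

    The factor theta * c**theta makes the kernel integrate to kappa.
    """
    if t < 0:
        return 0.0
    return kappa * theta * c ** theta * (t + c) ** (-(1.0 + theta))


def intensity(t, history, kernel, mu=0.0):
    """Conditional intensity: background rate mu plus the kernel
    contribution of every past event strictly before time t."""
    return mu + sum(kernel(t - t_i) for t_i in history if t_i < t)


if __name__ == "__main__":
    events = [0.0, 1.0]  # hypothetical event times, e.g. retweet timestamps
    for name, kernel in [("exponential", exponential_kernel),
                         ("power-law", power_law_kernel)]:
        print(name, intensity(2.0, events, kernel))
```

    With both kernels scaled to the same total mass `kappa`, the only difference is how quickly the influence of a past event fades: the power-law kernel keeps a heavier tail than the exponential one.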

    Linking Epidemic Models and Self-exciting Processes for Online and Offline Event Diffusions

    Temporal diffusion data, which comprises time-stamped events, is ubiquitous, ranging from information diffusing in online social media platforms to infectious diseases spreading in offline communities. Pressing problems, such as predicting the popularity of online information and containing epidemics, demand temporal diffusion models for understanding, modeling, and controlling diffusion dynamics. This thesis discusses diffusions of online information and epidemics by developing and connecting self-exciting processes and epidemic models. First, we propose a novel dual mixture self-exciting process for characterizing online information diffusions related to online items, such as videos or news articles. By observing that maximum likelihood estimates are separable in a Hawkes process, the model, consisting of a Borel mixture model and a kernel mixture model, jointly learns the unfolding of a heterogeneous set of cascades. When applied to cascades of the same online items, the model directly characterizes their spread dynamics and provides interpretable quantities, such as content virality and content influence decay, as well as methods for predicting the final content popularities. On two retweet cascade datasets, we show that our models capture the differences between online items at the granularity of items, publishers, and categories. Next, we propose novel ideal strategies to explore the limits of both testing and contact tracing strategies, which have been shown effective in some epidemics (e.g., SARS) but ineffective in some others (e.g., COVID-19). We then develop a superspreading random contact network that accounts for the superspreading effect of infectious diseases, where several infected cases result in most secondary infections. In simulations, we observe gaps between ideal and standard strategies by examining extensive sets of epidemic parameters, highlighting the need to explore intelligent strategies. 
We also present a classification of different diseases based on how containable they are under different strategies. Then, we bridge epidemic models and self-exciting processes with a novel generalized stochastic Susceptible-Infected-Recovered (SIR) model with arbitrary recovery time distributions. We articulate the relationship between recovery time distributions, recovery hazard functions, and infection kernels of self-exciting processes. We also present methods for simulating, fitting, evaluating, and predicting the generalized process. On three large Twitter diffusion datasets, we show that the modeling performance of the infection kernels varies depending on the temporal structures of diffusions and user behavior, such as the likelihood of being bots. We further improve the prediction of popularity by combining two models identified as complementary in the goodness-of-fit tests. Last, we present evently, a tool for modeling online reshare cascades, particularly retweet cascades, using self-exciting processes. This tool fills a gap between practitioners of online social media analysis (usually social, political, and communication scientists) and accessible tools capable of examining online discussions. It provides a comprehensive set of functionalities for processing raw data from Twitter public APIs, modeling the temporal dynamics of processed retweet cascades, and characterizing online users with a wide range of diffusion measures. Overall, this thesis studies temporal diffusions of online information and epidemics by proposing novel epidemic models and self-exciting processes. It provides tools for predicting information popularities, characterizing online items, and classifying online item categories with state-of-the-art performance. It also contributes observations on applying testing and tracing strategies to contain epidemics.
Lastly, evently facilitates temporal diffusion analysis for practitioners from various fields, such as social science and epidemiology.

    Latent Self-Exciting Point Process Model for Spatial-Temporal Networks

    We propose a latent self-exciting point process model that describes geographically distributed interactions between pairs of entities. In contrast to most existing approaches that assume fully observable interactions, here we consider a scenario where certain interaction events lack information about participants. Instead, this information needs to be inferred from the available observations. We develop an efficient approximate algorithm based on variational expectation-maximization to infer unknown participants in an event given the location and the time of the event. We validate the model on synthetic as well as real-world data, and obtain very promising results on the identity-inference task. We also use our model to predict the timing and participants of future events, and demonstrate that it compares favorably with baseline approaches. Comment: 20 pages, 6 figures (v3); 11 pages, 6 figures (v2); previous version appeared in the 9th Bayesian Modeling Applications Workshop, UAI'1

    Communities, Knowledge Creation, and Information Diffusion

    In this paper, we examine how patterns of scientific collaboration contribute to knowledge creation. Recent studies have shown that scientists can benefit from their position within collaborative networks by being able to receive more information of better quality in a timely fashion, and by presiding over communication between collaborators. Here we focus on the tendency of scientists to cluster into tightly-knit communities, and discuss the implications of this tendency for scientific performance. We begin by reviewing a new method for finding communities, and we then assess its benefits in terms of computation time and accuracy. While communities often serve as a taxonomic scheme to map knowledge domains, they also affect how successfully scientists engage in the creation of new knowledge. By drawing on the longstanding debate on the relative benefits of social cohesion and brokerage, we discuss the conditions that facilitate collaborations among scientists within or across communities. We show that successful scientific production occurs within communities when scientists have cohesive collaborations with others from the same knowledge domain, and across communities when scientists intermediate among otherwise disconnected collaborators from different knowledge domains. We also discuss the implications of communities for information diffusion, and show how traditional epidemiological approaches need to be refined to take knowledge heterogeneity into account and preserve the system's ability to promote creative processes of novel recombinations of idea

    Interval-censored Transformer Hawkes: Detecting Information Operations using the Reaction of Social Systems

    Social media is being increasingly weaponized by state-backed actors to elicit reactions, push narratives and sway public opinion. These are known as Information Operations (IO). The covert nature of IO makes their detection difficult. This is further amplified by missing data due to user and content removal and privacy requirements. This work advances the hypothesis that the very reactions that Information Operations seek to elicit within the target social systems can be used to detect them. We propose an Interval-censored Transformer Hawkes (IC-TH) architecture and a novel data encoding scheme to account for both observed and missing data. We derive a novel log-likelihood function that we deploy together with a contrastive learning procedure. We showcase the performance of IC-TH on three real-world Twitter datasets and two learning tasks: future popularity prediction and item category prediction. The latter is particularly significant. Using retweet timing and patterns alone, we can predict the category of YouTube videos, guess whether news publishers are reputable or controversial and, most importantly, identify state-backed IO agent accounts. Additional qualitative investigations uncover that the automatically discovered clusters of Russian-backed agents appear to coordinate their behavior, activating simultaneously to push specific narratives.

    Epidemic processes in complex networks

    In recent years the research community has accumulated overwhelming evidence for the emergence of complex and heterogeneous connectivity patterns in a wide range of biological and sociotechnical systems. The complex properties of real-world networks have a profound impact on the behavior of equilibrium and nonequilibrium phenomena occurring in various systems, and the study of epidemic spreading is central to our understanding of the unfolding of dynamical processes in complex networks. The theoretical analysis of epidemic spreading in heterogeneous networks requires the development of novel analytical frameworks, and it has produced results of conceptual and practical relevance. A coherent and comprehensive review of the vast research activity concerning epidemic processes is presented, detailing the successful theoretical approaches as well as making their limits and assumptions clear. Physicists, mathematicians, epidemiologists, computer, and social scientists share a common interest in studying epidemic spreading and rely on similar models for the description of the diffusion of pathogens, knowledge, and innovation. For this reason, while focusing on the main results and the paradigmatic models in infectious disease modeling, the major results concerning generalized social contagion processes are also presented. Finally, the research activity at the forefront in the study of epidemic spreading in coevolving, coupled, and time-varying networks is reported. Comment: 62 pages, 15 figures, final versio