6 research outputs found

    Will This Video Go Viral? Explaining and Predicting the Popularity of Youtube Videos

    What makes content go viral? Why do some videos become popular while others do not? Such questions have attracted significant attention from both researchers and industry, particularly in the context of online media. A range of models has recently been proposed to explain and predict popularity; however, practical tools that are accessible to regular users and leverage these theoretical results remain in short supply. HIPie, an interactive visualization system, was created to fill this gap by enabling users to reason about the virality and popularity of online videos. It retrieves the metadata and past popularity series of YouTube videos, employs the Hawkes Intensity Process (HIP), a state-of-the-art online popularity model, to explain and predict video popularity, and presents videos comparatively in a series of interactive plots. This system will help both content consumers and content producers in a range of data-driven inquiries, such as comparatively analyzing videos and channels, explaining and predicting future popularity, identifying viral videos, and estimating the response to online promotion. Comment: 4 pages
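    To make the HIP idea concrete, here is a minimal Python sketch of a discretized, HIP-style recurrence in which daily views respond to exogenous promotion (e.g. shares) plus a power-law memory of past views. The recurrence views[t] = mu·shares[t] + C·Σ_{τ=1..t} views[t−τ]·(τ+c)^(−(1+θ)), the parameter names `mu`, `C`, `c`, `theta`, and the helper `promotion_response` are assumptions for illustration; this is not HIPie's code or its fitting procedure.

```python
from typing import List, Sequence


def hip_series(shares: Sequence[float], mu: float, C: float, c: float, theta: float) -> List[float]:
    """Daily views under an assumed discrete HIP-style recurrence:

    views[t] = mu * shares[t] + C * sum_{tau=1..t} views[t - tau] * (tau + c) ** -(1 + theta)
    """
    views: List[float] = []
    for t, s_t in enumerate(shares):
        # Endogenous part: past views re-excite attention through a power-law decaying memory.
        endogenous = sum(
            views[t - tau] * (tau + c) ** (-(1.0 + theta)) for tau in range(1, t + 1)
        )
        # Exogenous part: promotion on day t, scaled by the sensitivity mu.
        views.append(mu * s_t + C * endogenous)
    return views


def promotion_response(mu: float, C: float, c: float, theta: float, horizon: int = 365) -> float:
    """Hypothetical helper: total views generated by one unit of promotion on day 0."""
    impulse = [1.0] + [0.0] * (horizon - 1)
    return sum(hip_series(impulse, mu, C, c, theta))
```

    Comparing `promotion_response` across videos gives a rough, model-based sense of how strongly each one amplifies promotion. The quadratic loop is simple rather than fast, which is acceptable for series of a few hundred days.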

    Linking Epidemic Models and Self-exciting Processes for Online and Offline Event Diffusions

    Temporal diffusion data, which comprises time-stamped events, is ubiquitous, ranging from information diffusing on online social media platforms to infectious diseases spreading in offline communities. Pressing problems, such as predicting the popularity of online information and containing epidemics, demand temporal diffusion models for understanding, modeling, and controlling diffusion dynamics. This thesis studies the diffusion of online information and of epidemics by developing and connecting self-exciting processes and epidemic models. First, we propose a novel dual mixture self-exciting process for characterizing the diffusion of information related to online items, such as videos or news articles. Building on the observation that maximum likelihood estimates are separable in a Hawkes process, the model, which consists of a Borel mixture model and a kernel mixture model, jointly learns the unfolding of a heterogeneous set of cascades. When applied to cascades of the same online items, the model directly characterizes their spread dynamics and provides interpretable quantities, such as content virality and content influence decay, as well as methods for predicting final content popularity. On two retweet cascade datasets, we show that our models capture the differences between online items at the granularity of items, publishers, and categories. Next, we propose novel ideal strategies to explore the limits of testing and contact tracing, which have been shown to be effective in some epidemics (e.g., SARS) but ineffective in others (e.g., COVID-19). We then develop a superspreading random contact network that accounts for the superspreading effect of infectious diseases, in which a small number of infected cases cause most secondary infections. In simulations over extensive sets of epidemic parameters, we observe gaps between ideal and standard strategies, highlighting the need to explore intelligent strategies.
We also present a classification of diseases based on how containable they are under different strategies. Then, we bridge epidemic models and self-exciting processes with a novel generalized stochastic Susceptible-Infected-Recovered (SIR) model with arbitrary recovery time distributions. We articulate the relationship between recovery time distributions, recovery hazard functions, and infection kernels of self-exciting processes. We also present methods for simulating, fitting, evaluating, and predicting the generalized process. On three large Twitter diffusion datasets, we show that the modeling performance of the infection kernels varies with the temporal structure of diffusions and with user behavior, such as the likelihood of being bots. We further improve popularity prediction by combining two models identified as complementary in goodness-of-fit tests. Last, we present evently, a tool for modeling online reshare cascades, particularly retweet cascades, using self-exciting processes. This tool fills a gap for practitioners of online social media analysis, usually social, political, and communication scientists, who lack accessible tools capable of examining online discussions. It provides a comprehensive set of functionalities for processing raw data from the Twitter public APIs, modeling the temporal dynamics of processed retweet cascades, and characterizing online users with a wide range of diffusion measures. Overall, this thesis studies the temporal diffusion of online information and epidemics by proposing novel epidemic models and self-exciting processes. It provides tools for predicting information popularity, characterizing online items, and classifying online item categories with state-of-the-art performance. It also contributes observations on applying testing and tracing strategies to contain epidemics.
Finally, evently facilitates temporal diffusion analysis for practitioners from various fields, such as social science and epidemiology.
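    The Borel component of the dual mixture model rests on a standard branching-process fact: if every event in an unmarked Hawkes process spawns a Poisson-distributed number of direct offspring with mean n* (the branching factor, n* < 1), then the total size of a cascade started by one event follows a Borel distribution, P(N = k) = e^(−n*·k)·(n*·k)^(k−1)/k!, with mean 1/(1 − n*). The Python sketch below checks that fact by simulating the branching structure directly. The function names are hypothetical, and the sketch ignores the thesis's mixture weights and kernel mixture entirely.

```python
import math
import random


def borel_pmf(k: int, n_star: float) -> float:
    """P(cascade size == k) for a Borel distribution with branching factor n_star in (0, 1)."""
    # Work in log space so large k does not overflow the factorial.
    log_p = -n_star * k + (k - 1) * math.log(n_star * k) - math.lgamma(k + 1)
    return math.exp(log_p)


def poisson_sample(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the small means used here."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_cascade_size(n_star: float, rng: random.Random, max_size: int = 1_000_000) -> int:
    """Total events in one cascade: the seed event plus every later generation of offspring."""
    size, current_generation = 1, 1
    while current_generation > 0 and size < max_size:
        # Each event in the current generation independently spawns Poisson(n_star) children.
        next_generation = sum(poisson_sample(n_star, rng) for _ in range(current_generation))
        size += next_generation
        current_generation = next_generation
    return size
```

    For example, averaging `simulate_cascade_size(0.5, random.Random(7))` over many runs should land near 1/(1 − 0.5) = 2, the mean of `borel_pmf(k, 0.5)`. Only the branching factor matters for the size distribution; the kernel shape affects when events happen, not how many.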

    Interval-censored Transformer Hawkes: Detecting Information Operations using the Reaction of Social Systems

    Social media is increasingly being weaponized by state-backed actors to elicit reactions, push narratives, and sway public opinion. These campaigns are known as Information Operations (IO). The covert nature of IO makes their detection difficult, and the difficulty is compounded by missing data arising from user and content removal and from privacy requirements. This work advances the hypothesis that the very reactions that Information Operations seek to elicit within the target social systems can be used to detect them. We propose an Interval-censored Transformer Hawkes (IC-TH) architecture and a novel data encoding scheme to account for both observed and missing data. We derive a novel log-likelihood function that we deploy together with a contrastive learning procedure. We showcase the performance of IC-TH on three real-world Twitter datasets and two learning tasks: future popularity prediction and item category prediction. The latter is particularly significant. Using only retweet timing and patterns, we can predict the category of YouTube videos, guess whether news publishers are reputable or controversial, and, most importantly, identify state-backed IO agent accounts. Additional qualitative investigations reveal that the automatically discovered clusters of Russian-backed agents appear to coordinate their behavior, activating simultaneously to push specific narratives.
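    Interval censoring means that only event counts per time window are observed, not exact timestamps. A common way to score such data, shown here as a Python sketch, treats each window's count as Poisson with mean equal to the intensity integrated over that window (the compensator). This is an assumed Poisson approximation with a hypothetical decaying intensity λ(t) = a·e^(−b·t); it is not the IC-TH log-likelihood, whose exact form the abstract does not give, and it involves no Transformer.

```python
import math
from typing import Sequence, Tuple


def compensator_exp_decay(lower: float, upper: float, a: float, b: float) -> float:
    """Integral of the assumed intensity a * exp(-b * t) over [lower, upper]."""
    return (a / b) * (math.exp(-b * lower) - math.exp(-b * upper))


def interval_censored_loglik(
    intervals: Sequence[Tuple[float, float]], counts: Sequence[int], a: float, b: float
) -> float:
    """Poisson log-likelihood of per-window event counts given each window's compensator."""
    loglik = 0.0
    for (lower, upper), count in zip(intervals, counts):
        expected = compensator_exp_decay(lower, upper, a, b)
        # log P(count | Poisson(expected)) = count * log(expected) - expected - log(count!)
        loglik += count * math.log(expected) - expected - math.lgamma(count + 1)
    return loglik
```

    Under this simplified model, windows whose counts are missing can be left out of the sum; for a genuinely self-exciting process that shortcut is only an approximation, which is part of what a dedicated interval-censored likelihood has to address.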

    Slipping to the Extreme: A Mixed Method to Explain How Extreme Opinions Infiltrate Online Discussions

    Qualitative research provides methodological guidelines for observing and studying communities and cultures on online social media platforms. However, such methods demand considerable manual effort from researchers and can be narrowly focused on particular online groups. This work proposes a complete solution that accelerates the qualitative analysis of problematic online speech, focusing on opinions emerging from online communities, by leveraging machine learning algorithms. First, we employ qualitative methods of deep observation to understand problematic online speech. This initial qualitative study constructs an ontology of problematic speech, which contains social media postings annotated with their underlying opinions; the set of opinions is constructed dynamically, at the same time as the postings are labeled. Next, we use keywords to collect a large dataset from three online social media platforms (Facebook, Twitter, and YouTube). Finally, we introduce an iterative data exploration procedure to augment the dataset. It alternates among four steps: a data sampler, which balances exploration and exploitation of unlabeled data; automatic labeling of the sampled data; manual inspection by the qualitative mapping team; and retraining of the automatic opinion classifiers. We present both qualitative and quantitative results. First, we show that our human-in-the-loop method successfully augments the initial, qualitatively labeled, and narrowly focused dataset into a more encompassing one. Next, we present detailed case studies of the dynamics of problematic speech in a far-right Facebook group, exemplifying its mutation from conservative to extreme. Finally, we examine the dynamics of opinion emergence and co-occurrence, and we point to some pathways through which extreme opinions creep into mainstream online discourse. Comment: ICWSM 202
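    The abstract does not specify how the data sampler trades off exploration and exploitation. One simple scheme, sketched here in Python purely as an assumption, is ε-greedy batch selection: most of each batch goes to the unlabeled postings that the current classifier scores highest for an opinion (exploitation), and a fraction ε is drawn uniformly from the rest (exploration) so that the dataset does not stay narrowly focused. The function `epsilon_greedy_batch` and its parameters are hypothetical.

```python
import random
from typing import Dict, List


def epsilon_greedy_batch(
    scores: Dict[str, float], batch_size: int, epsilon: float, rng: random.Random
) -> List[str]:
    """Pick posting IDs for the next annotation round (hypothetical helper).

    scores: classifier confidence that each unlabeled posting expresses the target opinion.
    epsilon: fraction of the batch reserved for uniform random exploration, in [0, 1].
    """
    batch_size = min(batch_size, len(scores))
    n_explore = round(epsilon * batch_size)
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Exploitation: the highest-scoring postings.
    exploit = ranked[: batch_size - n_explore]
    # Exploration: a uniform random sample from everything not already chosen.
    remaining = ranked[batch_size - n_explore :]
    explore = rng.sample(remaining, n_explore)
    return exploit + explore
```

    In the loop described above, the returned postings would then be labeled automatically, checked by the qualitative mapping team, and fed back to retrain the classifiers before the next call. Raising ε widens coverage at the cost of spending more annotation effort on postings that are less likely to carry the opinion.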