268 research outputs found

    Scalable Privacy-Compliant Virality Prediction on Twitter

    The digital town hall of Twitter has become a preferred medium of communication for individuals and organizations across the globe. Some of them reach audiences of millions, while others struggle to get noticed. Given the impact of social media, one question remains more relevant than ever: how to model the dynamics of attention on Twitter. Researchers around the world turn to machine learning to predict the most influential tweets and authors, navigating the volume, velocity, and variety of social big data, with many compromises. In this paper, we revisit content popularity prediction on Twitter. We argue that strict alignment of data acquisition, storage and analysis algorithms is necessary to avoid the common trade-offs between scalability, accuracy and privacy compliance. We propose a new framework for the rapid acquisition of large-scale datasets, a high-accuracy supervisory signal and multilanguage sentiment prediction, while respecting every applicable privacy request. We then apply a novel gradient boosting framework to achieve state-of-the-art results in virality ranking, even before including a tweet's visual or propagation features. Our Gradient Boosted Regression Tree is the first to offer explainable, strong ranking performance on benchmark datasets. Since the analysis focused on features available early, the model is immediately applicable to incoming tweets in 18 languages.
    Comment: AffCon@AAAI-19 Best Paper Award; Presented at AAAI-19 W1: Affective Content Analysi

    Interactions in information spread: quantification and interpretation using stochastic block models

    In most real-world applications, it is seldom the case that a given observable evolves independently of its environment. In social networks, users' behavior results from the people they interact with, news in their feed, or trending topics. In natural language, the meaning of phrases emerges from the combination of words. In general medicine, a diagnosis is established on the basis of the interaction of symptoms. Here, we propose a new model, the Interactive Mixed Membership Stochastic Block Model (IMMSBM), which investigates the role of interactions between entities (hashtags, words, memes, etc.) and quantifies their importance within the aforementioned corpora. We find that interactions play an important role in those corpora. In inference tasks, taking them into account leads to average relative changes with respect to non-interactive models of up to 150% in the probability of an outcome. Furthermore, their role greatly improves the predictive power of the model. Our findings suggest that neglecting interactions when modeling real-world phenomena might lead to incorrect conclusions being drawn.
    Comment: 17 pages, 3 figures, submitted to ECML-PKDD 202

    Predicting Virality on Networks Using Local Graphlet Frequency Distribution

    The task of predicting virality has far-reaching consequences, from the world of advertising to more recent attempts to reduce the spread of fake news. Previous work has shown that graphlet distribution is an effective feature for predicting virality. Here, we investigate the use of aggregated edge-centric local graphlets around source nodes as features for virality prediction. These features are used to predict expected virality for both a time-independent Hawkes model and an independent cascade model of virality. In the Hawkes model, we use linear regression to predict the number of Hawkes events and node ranking, while in the independent cascade model we use logistic regression to predict whether a k-size cascade will multiply by a factor X in size. Our study indicates that local graphlet frequency distribution can effectively capture the variances of the viral processes simulated by the Hawkes process and the independent cascade process. Furthermore, we identify a group of local graphlets which might be significant in the viral processes. We compare the effectiveness of our methods with eigenvector centrality-based node choice
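
For readers unfamiliar with the independent cascade model named in the abstract above, the following is a minimal sketch of one simulation run. It is not the authors' code: the adjacency-dict graph representation, the single shared activation probability `p`, and the function name `independent_cascade` are all assumptions made for illustration.

```python
import random
from collections import deque


def independent_cascade(graph, seeds, p, rng):
    """Run one independent cascade and return the set of activated nodes.

    graph: dict mapping a node to an iterable of its out-neighbors (assumed format).
    seeds: initially active nodes.
    p:     activation probability, assumed identical for every edge.
    rng:   a random.Random instance, passed in so runs are reproducible.
    """
    active = set(seeds)
    frontier = deque(seeds)
    while frontier:
        node = frontier.popleft()
        for neighbor in graph.get(node, ()):
            # Each newly activated node gets exactly one attempt
            # to activate each of its still-inactive neighbors.
            if neighbor not in active and rng.random() < p:
                active.add(neighbor)
                frontier.append(neighbor)
    return active


if __name__ == "__main__":
    # Hypothetical toy graph, used only to exercise the function.
    toy_graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
    reached = independent_cascade(toy_graph, ["a"], 0.5, random.Random(0))
    print(len(reached))
```

Repeating such runs from a given source node yields a distribution of cascade sizes. A size-growth label of the kind described in the abstract could be derived from that distribution, though the abstract does not specify how the labels were built.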