Scalable Privacy-Compliant Virality Prediction on Twitter
The digital town hall of Twitter has become a preferred medium of communication
for individuals and organizations across the globe. Some of them reach
audiences of millions, while others struggle to get noticed. Given the impact
of social media, the question remains more relevant than ever: how to model the
dynamics of attention on Twitter. Researchers around the world turn to machine
learning to predict the most influential tweets and authors, navigating the
volume, velocity, and variety of social big data, with many compromises. In
this paper, we revisit content popularity prediction on Twitter. We argue that
strict alignment of data acquisition, storage and analysis algorithms is
necessary to avoid the common trade-offs between scalability, accuracy and
privacy compliance. We propose a new framework for the rapid acquisition of
large-scale datasets, a high-accuracy supervisory signal and multilingual
sentiment prediction while respecting every applicable privacy request. We then
apply a novel gradient boosting framework to achieve state-of-the-art results
in virality ranking, even before including a tweet's visual or propagation
features. Our Gradient Boosted Regression Tree is the first to offer
explainable, strong ranking performance on benchmark datasets. Since the
analysis focused on features available early, the model is immediately
applicable to incoming tweets in 18 languages.
Comment: AffCon@AAAI-19 Best Paper Award; Presented at AAAI-19 W1: Affective
Content Analysi
Interactions in information spread: quantification and interpretation using stochastic block models
In most real-world applications, it is seldom the case that a given
observable evolves independently of its environment. In social networks, users'
behavior results from the people they interact with, news in their feed, or
trending topics. In natural language, the meaning of phrases emerges from the
combination of words. In general medicine, a diagnosis is established on the
basis of the interaction of symptoms. Here, we propose a new model, the
Interactive Mixed Membership Stochastic Block Model (IMMSBM), which
investigates the role of interactions between entities (hashtags, words, memes,
etc.) and quantifies their importance within the aforementioned corpora. We
find that interactions play an important role in those corpora. In inference
tasks, taking them into account leads to average relative changes with respect
to non-interactive models of up to 150% in the probability of an outcome.
Furthermore, their role greatly improves the predictive power of the model. Our
findings suggest that neglecting interactions when modeling real-world
phenomena might lead to incorrect conclusions being drawn.
Comment: 17 pages, 3 figures, submitted to ECML-PKDD 202
Predicting Virality on Networks Using Local Graphlet Frequency Distribution
The task of predicting virality has far-reaching consequences, from the world of advertising to more recent attempts to reduce the spread of fake news. Previous work has shown that graphlet distribution is an effective feature for predicting virality. Here, we investigate the use of aggregated edge-centric local graphlets around source nodes as features for virality prediction. These features are used to predict expected virality for both a time-independent Hawkes model and an independent cascade model of virality. In the Hawkes model, we use linear regression to predict the number of Hawkes events and node ranking, while in the independent cascade model we use logistic regression to predict whether a k-size cascade will grow in size by a factor X. Our study indicates that local graphlet frequency distribution can effectively capture the variances of the viral processes simulated by the Hawkes process and the independent cascade process. Furthermore, we identify a group of local graphlets which might be significant in the viral processes. We compare the effectiveness of our methods with eigenvector centrality-based node choice
- …