The Pulse of News in Social Media: Forecasting Popularity
News articles are extremely time sensitive by nature. There is also intense
competition among news items to propagate as widely as possible. Hence, the
task of predicting the popularity of news items on the social web is both
interesting and challenging. Prior research has dealt with predicting eventual
online popularity based on early popularity. It is most desirable, however, to
predict the popularity of items prior to their release, fostering the
possibility of appropriate decision making to modify an article and the manner
of its publication. In this paper, we construct a multi-dimensional feature
space derived from properties of an article and evaluate the efficacy of these
features to serve as predictors of online popularity. We examine both
regression and classification algorithms and demonstrate that despite
randomness in human behavior, it is possible to predict ranges of popularity on
Twitter with an overall accuracy of 84%. Our study also illustrates the
differences between traditionally prominent sources and those immensely popular
on the social web.
SEISMIC: A Self-Exciting Point Process Model for Predicting Tweet Popularity
Social networking websites allow users to create and share content. Big
information cascades of post resharing can form as users of these sites reshare
others' posts with their friends and followers. One of the central challenges
in understanding such cascading behaviors is in forecasting information
outbreaks, where a single post becomes widely popular by being reshared by many
users. In this paper, we focus on predicting the final number of reshares of a
given post. We build on the theory of self-exciting point processes to develop
a statistical model that allows us to make accurate predictions. Our model
requires no training or expensive feature engineering. It results in a simple
and efficiently computable formula that allows us to answer questions, in
real-time, such as: Given a post's resharing history so far, what is our
current estimate of its final number of reshares? Is the post resharing cascade
past the initial stage of explosive growth? And, which posts will be the most
reshared in the future? We validate our model using one month of complete
Twitter data and demonstrate a strong improvement in predictive accuracy over
existing approaches. Our model gives only 15% relative error in predicting
final size of an average information cascade after observing it for just one
hour.
Comment: 10 pages, published in KDD 201
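To make the idea of forecasting a cascade's final size from its resharing history concrete, here is a minimal sketch. It is not the SEISMIC formula from the paper; it assumes a generic self-exciting (Hawkes) process with an exponential memory kernel and no background rate. The function name and the parameters `branching_ratio` and `decay_rate` are hypothetical and would have to be estimated from data in practice.

```python
import math


def expected_final_size(event_times, now, branching_ratio, decay_rate):
    """Expected final number of events in a cascade, given events seen up to `now`.

    Assumed model (illustrative only): each event triggers further events at
    rate branching_ratio * decay_rate * exp(-decay_rate * age), so it has
    `branching_ratio` direct children on average over its whole lifetime.
    """
    if not 0 <= branching_ratio < 1:
        raise ValueError("branching_ratio must be in [0, 1) for a finite cascade")
    if decay_rate <= 0:
        raise ValueError("decay_rate must be positive")

    observed = [t for t in event_times if t <= now]

    # Direct children that each observed event is still expected to
    # trigger after `now` (the tail integral of the exponential kernel).
    pending_children = sum(
        branching_ratio * math.exp(-decay_rate * (now - t)) for t in observed
    )

    # Every future event starts a sub-cascade whose expected total size,
    # including itself, is 1 / (1 - branching_ratio).
    return len(observed) + pending_children / (1.0 - branching_ratio)


if __name__ == "__main__":
    # Reshare times in hours since the original post (made-up values).
    times = [0.0, 0.1, 0.3, 0.35, 0.8]
    print(expected_final_size(times, now=1.0, branching_ratio=0.6, decay_rate=2.0))
```

The estimate grows sharply as the assumed branching ratio approaches 1, which is one way to read the abstract's question about whether a cascade is still in its explosive-growth phase.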
Infectivity Enhances Prediction of Viral Cascades in Twitter
Models of contagion dynamics, originally developed for infectious diseases,
have proven relevant to the study of information, news, and political opinions
in online social systems. Modelling diffusion processes and predicting viral
information cascades are important problems in network science. Yet, many
studies of information cascades neglect the variation in infectivity across
different pieces of information. Here, we employ early-time observations of
online cascades to estimate the infectivity of distinct pieces of information.
Using simulations and data from real-world Twitter retweets, we demonstrate
that these estimated infectivities can be used to improve predictions about the
virality of an information cascade. Developing our simulations to mimic the
real-world data, we consider the effect of the limited effective time for
transmission of a cascade and demonstrate that a simple model for slow but
non-negligible decay of the infectivity captures the essential properties of
retweet distributions. These results demonstrate the interplay between the
intrinsic infectivity of a tweet and the complex network environment within
which it diffuses, strongly influencing the likelihood of becoming a viral
cascade.
Comment: 16 pages, 10 figures
The Bursty Dynamics of the Twitter Information Network
In online social media systems, users are not only posting, consuming, and
resharing content, but also creating new and destroying existing connections in
the underlying social network. While each of these two types of dynamics has
individually been studied in the past, much less is known about the connection
between the two. How does user information posting and seeking behavior
interact with the evolution of the underlying social network structure?
Here, we study ways in which network structure reacts to users posting and
sharing content. We examine the complete dynamics of the Twitter information
network, where users post and reshare information while they also create and
destroy connections. We find that the dynamics of network structure can be
characterized by steady rates of change, interrupted by sudden bursts.
Information diffusion in the form of cascades of post re-sharing often creates
such sudden bursts of new connections, which significantly change users' local
network structure. These bursts transform users' networks of followers to
become structurally more cohesive as well as more homogeneous in terms of
follower interests. We also explore the effect of the information content on
the dynamics of the network and find evidence that the appearance of new topics
and real-world events can lead to significant changes in edge creations and
deletions. Lastly, we develop a model that quantifies the dynamics of the
network and the occurrence of these bursts as a function of the information
spreading through the network. The model can successfully predict which
information diffusion events will lead to bursts in network dynamics.
Scalable Privacy-Compliant Virality Prediction on Twitter
The digital town hall of Twitter has become a preferred medium of communication
for individuals and organizations across the globe. Some of them reach
audiences of millions, while others struggle to get noticed. Given the impact
of social media, the question remains more relevant than ever: how to model the
dynamics of attention in Twitter. Researchers around the world turn to machine
learning to predict the most influential tweets and authors, navigating the
volume, velocity, and variety of social big data, with many compromises. In
this paper, we revisit content popularity prediction on Twitter. We argue that
strict alignment of data acquisition, storage and analysis algorithms is
necessary to avoid the common trade-offs between scalability, accuracy and
privacy compliance. We propose a new framework for the rapid acquisition of
large-scale datasets, a high-accuracy supervisory signal, and multilanguage
sentiment prediction while respecting every applicable privacy request. We then
apply a novel gradient boosting framework to achieve state-of-the-art results
in virality ranking, even before including a tweet's visual or propagation
features. Our Gradient Boosted Regression Tree is the first to offer
explainable, strong ranking performance on benchmark datasets. Since the
analysis focused on features available early, the model is immediately
applicable to incoming tweets in 18 languages.
Comment: AffCon@AAAI-19 Best Paper Award; presented at AAAI-19 W1: Affective
Content Analysis