The Bursty Dynamics of the Twitter Information Network
In online social media systems users are not only posting, consuming, and
resharing content, but also creating new and destroying existing connections in
the underlying social network. While each of these two types of dynamics has
individually been studied in the past, much less is known about the connection
between the two. How does user information posting and seeking behavior
interact with the evolution of the underlying social network structure?
Here, we study ways in which network structure reacts to users posting and
sharing content. We examine the complete dynamics of the Twitter information
network, where users post and reshare information while they also create and
destroy connections. We find that the dynamics of network structure can be
characterized by steady rates of change, interrupted by sudden bursts.
Information diffusion in the form of cascades of post re-sharing often creates
such sudden bursts of new connections, which significantly change users' local
network structure. These bursts transform users' networks of followers to
become structurally more cohesive as well as more homogeneous in terms of
follower interests. We also explore the effect of the information content on
the dynamics of the network and find evidence that the appearance of new topics
and real-world events can lead to significant changes in edge creations and
deletions. Lastly, we develop a model that quantifies the dynamics of the
network and the occurrence of these bursts as a function of the information
spreading through the network. The model can successfully predict which
information diffusion events will lead to bursts in network dynamics.
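The idea of "steady rates of change, interrupted by sudden bursts" can be illustrated with a minimal sketch. This is not the model from the paper; it is a simple threshold heuristic, and the function name `find_bursts`, the `factor` parameter, and the sample counts are all assumptions made for illustration.

```python
from statistics import median


def find_bursts(daily_counts, factor=3.0):
    """Return indices of days whose count exceeds `factor` times the median.

    Hypothetical heuristic: the median stands in for the "steady rate",
    and any day far above it is flagged as a burst.
    """
    if not daily_counts:
        return []
    # Floor the baseline at 1 so that a median of 0 does not flag every day.
    baseline = max(median(daily_counts), 1.0)
    return [i for i, count in enumerate(daily_counts) if count > factor * baseline]


# Invented example: new followers per day for one user.
new_followers = [5, 6, 5, 40, 6, 5, 4]
print(find_bursts(new_followers))  # prints [3]
```

A median baseline is used here because, unlike a mean, it is not pulled upward by the bursts themselves.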
SEISMIC: A Self-Exciting Point Process Model for Predicting Tweet Popularity
Social networking websites allow users to create and share content. Big
information cascades of post resharing can form as users of these sites reshare
others' posts with their friends and followers. One of the central challenges
in understanding such cascading behaviors is in forecasting information
outbreaks, where a single post becomes widely popular by being reshared by many
users. In this paper, we focus on predicting the final number of reshares of a
given post. We build on the theory of self-exciting point processes to develop
a statistical model that allows us to make accurate predictions. Our model
requires no training or expensive feature engineering. It results in a simple
and efficiently computable formula that allows us to answer questions, in
real-time, such as: Given a post's resharing history so far, what is our
current estimate of its final number of reshares? Is the post resharing cascade
past the initial stage of explosive growth? And, which posts will be the most
reshared in the future? We validate our model using one month of complete
Twitter data and demonstrate a strong improvement in predictive accuracy over
existing approaches. Our model gives only 15% relative error in predicting
final size of an average information cascade after observing it for just one
hour.
Comment: 10 pages, published in KDD 201
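To make the question "given a post's resharing history so far, what is our current estimate of its final number of reshares?" concrete, here is a minimal sketch under strong simplifying assumptions. It is not the SEISMIC formula: it assumes a constant branching ratio (average number of direct reshares triggered by each reshare, below 1) and an exponential memory kernel. The function name and all parameters are hypothetical.

```python
import math


def expected_final_size(event_times, now, branching_ratio, decay_rate):
    """Expected final cascade size for a simplified self-exciting process.

    Assumptions (not from the paper): each event triggers on average
    `branching_ratio` direct children, with delays that are exponentially
    distributed with rate `decay_rate`.
    """
    if not 0.0 <= branching_ratio < 1.0:
        raise ValueError("branching_ratio must be in [0, 1) for a finite cascade")
    observed = [t for t in event_times if t <= now]
    # Expected number of direct children that observed events have yet to trigger:
    # the kernel mass remaining after time `now`.
    pending_children = sum(
        branching_ratio * math.exp(-decay_rate * (now - t)) for t in observed
    )
    # Each future child starts its own sub-cascade of expected total size
    # 1 / (1 - branching_ratio), counting the child itself.
    return len(observed) + pending_children / (1.0 - branching_ratio)


# A single post observed at the moment it is published:
print(expected_final_size([0.0], now=0.0, branching_ratio=0.5, decay_rate=1.0))  # prints 2.0
```

As `now` moves further past the last observed event, the pending term shrinks toward zero and the estimate approaches the observed count.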
A Bayesian approach for predicting the popularity of tweets
We predict the popularity of short messages called tweets created in the
micro-blogging site known as Twitter. We measure the popularity of a tweet by
the time-series path of its retweets, which is when people forward the tweet to
others. We develop a probabilistic model for the evolution of the retweets
using a Bayesian approach, and form predictions using only observations on the
retweet times and the local network or "graph" structure of the retweeters. We
obtain good step-ahead forecasts and predictions of the final total number of
retweets even when only a small fraction (i.e., less than one tenth) of the
retweet path is observed. This translates to good predictions within a few
minutes of a tweet being posted, and has potential implications for
understanding the spread of broader ideas, memes, or trends in social networks.
Comment: Published at http://dx.doi.org/10.1214/14-AOAS741 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Trends in Social Media: Persistence and Decay
Social media generates a prodigious wealth of real-time content at an
incessant rate. From all the content that people create and share, only a few
topics manage to attract enough attention to rise to the top and become
temporal trends which are displayed to users. The question of what factors
cause the formation and persistence of trends is an important one that has not
been answered yet. In this paper, we conduct an intensive study of trending
topics on Twitter and provide a theoretical basis for the formation,
persistence and decay of trends. We also demonstrate empirically how factors
such as user activity and number of followers do not contribute strongly to
trend creation and its propagation. In fact, we find that the resonance of the
content with the users of the social network plays a major role in causing
trends.
Efficient Non-parametric Bayesian Hawkes Processes
In this paper, we develop an efficient non-parametric Bayesian estimation of
the kernel function of Hawkes processes. The non-parametric Bayesian approach
is important because it provides flexible Hawkes kernels and quantifies their
uncertainty. Our method is based on the cluster representation of Hawkes
processes. Utilizing the stationarity of the Hawkes process, we efficiently
sample random branching structures and thus, we split the Hawkes process into
clusters of Poisson processes. We derive two algorithms -- a block Gibbs
sampler and a maximum a posteriori estimator based on expectation maximization
-- and we show that our methods have a linear time complexity, both
theoretically and empirically. On synthetic data, we show our methods to be
able to infer flexible Hawkes triggering kernels. On two large-scale Twitter
diffusion datasets, we show that our methods outperform the current
state-of-the-art in goodness-of-fit and that the time complexity is linear in
the size of the dataset. We also observe that on diffusions related to online
videos, the learned kernels reflect the perceived longevity for different
content types such as music or pet videos.
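The cluster representation mentioned above can be sketched as a simulation: background "immigrant" events arrive as a Poisson process, and every event independently spawns a Poisson number of children. The sketch below is not the paper's estimator; it assumes a parametric exponential kernel for simplicity, and the function names and parameters (`mu`, `branching_ratio`, `decay_rate`, `horizon`) are hypothetical.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw a Poisson(lam) variate with Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_hawkes_clusters(mu, branching_ratio, decay_rate, horizon, seed=0):
    """Simulate a Hawkes process on [0, horizon) through its branching structure.

    Returns a list of (time, parent_index) pairs; parent_index is None for
    immigrants. Assumed kernel: branching_ratio * decay_rate * exp(-decay_rate * t).
    """
    if mu <= 0 or decay_rate <= 0 or not 0.0 <= branching_ratio < 1.0:
        raise ValueError("need mu > 0, decay_rate > 0, 0 <= branching_ratio < 1")
    rng = random.Random(seed)
    events = []
    # Immigrants: homogeneous Poisson process with rate mu.
    t = rng.expovariate(mu)
    while t < horizon:
        events.append((t, None))
        t += rng.expovariate(mu)
    # Offspring: process events in order of creation; each one spawns a
    # Poisson(branching_ratio) number of children with exponential delays.
    index = 0
    while index < len(events):
        parent_time = events[index][0]
        for _ in range(sample_poisson(rng, branching_ratio)):
            child_time = parent_time + rng.expovariate(decay_rate)
            if child_time < horizon:
                events.append((child_time, index))
        index += 1
    return events


events = simulate_hawkes_clusters(mu=1.0, branching_ratio=0.5, decay_rate=2.0, horizon=50.0)
immigrants = sum(1 for _, parent in events if parent is None)
print(len(events), "events,", immigrants, "immigrants")
```

Conditioned on the parent links, each cluster is a plain Poisson process, which is the property that sampling the branching structure exploits.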
Modeling and predicting the popularity of online news based on temporal and content-related features
As the market for globally available online news is large and still growing, online publishers compete strongly to reach the largest possible audience. An intelligent online publishing strategy is therefore of the highest importance to publishers. A prerequisite for optimizing any online strategy is to have trustworthy predictions of how popular new online content may become. This paper presents a novel methodology to model and predict the popularity of online news. We first introduce a new strategy and mathematical model to capture view patterns of online news. After a thorough analysis of such view patterns, we show that well-chosen base functions lead to suitable models, and show how the influence of day versus night on the total view patterns can be taken into account to further increase the accuracy without leading to more complex models. Second, we turn to the prediction of future popularity, given recently published content. Using a new real-world dataset, we show that combining features related to content, metadata, and temporal behavior leads to significantly improved predictions compared to existing approaches, which only consider features based on the historical popularity of the considered articles. Whereas linear regression is traditionally used for the application under study, we show that the more expressive gradient tree boosting method proves beneficial for predicting news popularity.