SEISMIC: A Self-Exciting Point Process Model for Predicting Tweet Popularity
Social networking websites allow users to create and share content. Big
information cascades of post resharing can form as users of these sites reshare
others' posts with their friends and followers. One of the central challenges
in understanding such cascading behaviors is in forecasting information
outbreaks, where a single post becomes widely popular by being reshared by many
users. In this paper, we focus on predicting the final number of reshares of a
given post. We build on the theory of self-exciting point processes to develop
a statistical model that allows us to make accurate predictions. Our model
requires no training or expensive feature engineering. It results in a simple
and efficiently computable formula that allows us to answer questions, in
real-time, such as: Given a post's resharing history so far, what is our
current estimate of its final number of reshares? Is the post resharing cascade
past the initial stage of explosive growth? And, which posts will be the most
reshared in the future? We validate our model using one month of complete
Twitter data and demonstrate a strong improvement in predictive accuracy over
existing approaches. Our model gives only 15% relative error in predicting
the final size of an average information cascade after observing it for just one
hour. Comment: 10 pages, published in KDD 201
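To make the idea of a self-exciting point process concrete, here is a minimal sketch of a final-size estimate. It is not the SEISMIC formula: it assumes a plain Hawkes process with an exponential memory kernel and no background rate, and the names `intensity`, `expected_final_count`, `alpha`, and `beta` are hypothetical choices for this illustration.

```python
import math

def intensity(t, event_times, alpha=0.5, beta=1.0):
    """Excitation at time t: each earlier event adds alpha * exp(-beta * age)."""
    return sum(alpha * math.exp(-beta * (t - ti)) for ti in event_times if ti <= t)

def expected_final_count(t, event_times, alpha=0.5, beta=1.0):
    """Expected total number of events, given the history observed up to time t."""
    observed = sum(1 for ti in event_times if ti <= t)
    branching_ratio = alpha / beta  # mean number of direct offspring per event
    if branching_ratio >= 1.0:
        return math.inf  # supercritical: the cascade is not expected to die out
    # Events already seen are still expected to trigger intensity(t) / beta
    # direct offspring, and each of those starts a sub-cascade whose mean
    # size (itself included) is 1 / (1 - branching_ratio).
    pending_offspring = intensity(t, event_times, alpha, beta) / beta
    return observed + pending_offspring / (1.0 - branching_ratio)

if __name__ == "__main__":
    reshare_times = [0.0, 1.0]  # made-up reshare times, in hours
    print(expected_final_count(2.0, reshare_times))
```

Under these assumptions the estimate needs only the observed count and the current intensity, which is why it can be updated cheaply as new reshares arrive.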
A Bayesian Point Process Model for User Return Time Prediction in Recommendation Systems
In order to sustain the user-base for a web service, it is important
to know the return time of a user to the service. We propose a
Bayesian point process, the log Gaussian Cox process (LGCP), to model
and predict the return time of users. It allows encoding prior domain
knowledge and non-parametric estimation of latent intensity
functions capturing user behaviour. We capture the similarities
among the users in their return time by using a multi-task learning
approach. We show the effectiveness of the proposed approaches
on predicting the return time of users to the last.fm music service.
Probabilistic Modeling of Rumour Stance and Popularity in Social Media
Social media tends to be rife with rumours when new reports are released piecemeal during breaking news events. One can mine multiple reactions expressed by social media users in those situations, exploring users’ stance towards rumours, ultimately enabling the flagging of highly disputed rumours as being potentially false. Moreover, rumours in social media exhibit complex temporal patterns. Some rumours are discussed with an increasing number of tweets per unit of time whereas other rumours fail to gain ground. This thesis develops probabilistic models of rumours in social media driven by two applications: rumour stance classification and modeling temporal dynamics of rumours. Rumour stance classification is the task of classifying the stance expressed in an individual tweet towards a rumour. Modeling temporal dynamics of rumours is an application where rumour prevalence is modeled over time. Both applications provide insights into how a rumour attracts attention from the social media community. These can assist journalists with their work on rumour tracking and debunking, and can be used in downstream applications such as systems for rumour veracity classification. In this thesis, we develop models based on probabilistic approaches. We motivate Gaussian processes and point processes as appropriate tools and show how features not considered in previous work can be included. We show that for both applications, transfer learning approaches are successful, supporting the hypothesis that there is a common underlying signal across different rumours. We furthermore introduce novel machine learning techniques which have the potential to be used in other applications: convolution kernels for streams of text over continuous time and a sequence classification algorithm based on point processes
Fast and scalable non-parametric Bayesian inference for Poisson point processes
We study the problem of non-parametric Bayesian estimation of the intensity
function of a Poisson point process. The observations are $n$ independent
realisations of a Poisson point process on the interval $[0,T]$. We propose two
related approaches. In both approaches we model the intensity function as
piecewise constant on $N$ bins forming a partition of the interval $[0,T]$. In
the first approach the coefficients of the intensity function are assigned
independent gamma priors, leading to a closed form posterior distribution. On
the theoretical side, we prove that as $n \to \infty$ the posterior
asymptotically concentrates around the "true", data-generating intensity
function at an optimal rate for $h$-Hölder regular intensity functions
($0 < h \leq 1$). In the second approach we employ a gamma Markov chain prior on the
coefficients of the intensity function. The posterior distribution is no longer
available in closed form, but inference can be performed using a
straightforward version of the Gibbs sampler. Both approaches scale well with
sample size, but the second is much less sensitive to the choice of $N$.
Practical performance of our methods is first demonstrated via synthetic data
examples. We compare our second method with other existing approaches on the UK
coal mining disasters data. Furthermore, we apply it to the US mass shootings
data and Donald Trump's Twitter data. Comment: 45 pages, 22 figures
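The closed-form posterior mentioned for the first approach follows from gamma-Poisson conjugacy, and a minimal sketch may help. It assumes equal-width bins and a Gamma(shape, rate) prior on each bin's intensity; the function names, the parameter names `horizon` and `num_bins`, and the prior values are hypothetical and not taken from the paper.

```python
def gamma_posterior_per_bin(realisations, horizon, num_bins, shape=1.0, rate=1.0):
    """Return (posterior shape, posterior rate) for the intensity in each bin.

    realisations: list of event-time lists, each observed on [0, horizon].
    """
    width = horizon / num_bins
    counts = [0] * num_bins
    for events in realisations:
        for t in events:
            # An event exactly at `horizon` is assigned to the last bin.
            k = min(int(t / width), num_bins - 1)
            counts[k] += 1
    # Each bin is observed for `width` time units in every realisation.
    exposure = len(realisations) * width
    # Conjugate update: Gamma(shape, rate) -> Gamma(shape + count, rate + exposure).
    return [(shape + c, rate + exposure) for c in counts]

def posterior_means(posteriors):
    """Posterior mean intensity per bin (shape divided by rate)."""
    return [a / b for a, b in posteriors]

if __name__ == "__main__":
    data = [[0.1, 0.2, 0.7], [0.3, 0.9]]  # two made-up realisations on [0, 1]
    post = gamma_posterior_per_bin(data, horizon=1.0, num_bins=2)
    print(post, posterior_means(post))
```

Because the update only needs per-bin counts, the cost grows linearly with the number of events, which is consistent with the abstract's claim that the approach scales well with sample size.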
Longitudinal Modeling of Social Media with Hawkes Process based on Users and Networks
Online social networks provide a platform for
sharing information at an unprecedented scale. Users generate
information which propagates across the network resulting in
information cascades. In this paper, we study the evolution of
information cascades in Twitter using a point process model
of user activity. We develop several Hawkes process models
considering various properties including conversational structure,
users’ connections and general features of users including the
textual information, and show how they are helpful in modeling
the social network activity. We consider low-rank embeddings
of users and user features, and learn the features helpful in
identifying the influence and susceptibility of users. Evaluation
on Twitter data sets associated with civil unrest shows that
incorporating richer properties improves the performance in
predicting the future activity of users and memes.
Modeling user return time using inhomogeneous Poisson process
For Intelligent Assistants (IA), user activity is often used as
a lag metric for user satisfaction or engagement. Conversely, predictive
leading metrics for engagement can be helpful with decision making and
evaluating changes in satisfaction caused by new features. In this paper,
we propose User Return Time (URT), a fine-grained metric for gauging user
engagement. To compute URT, we model continuous inter-arrival times
between users’ use of service via a log Gaussian Cox process (LGCP),
a form of inhomogeneous Poisson process which captures the irregular
variations in user usage rate and personal preferences typical of an IA.
We show the effectiveness of the proposed approaches on predicting the
return time of users on real-world data collected from an IA. Experimental
results demonstrate that our model is able to predict user return
times reasonably well and considerably better than strong baselines that
make the prediction based on past utterance frequency.
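As a rough illustration of how an inhomogeneous Poisson process yields a return-time prediction, the sketch below computes the probability that a user returns within a given horizon from a known rate function. It does not implement the LGCP itself (there is no Gaussian process prior or inference here); `return_probability`, `rate_fn`, and the example rate are hypothetical.

```python
import math

def return_probability(rate_fn, horizon, steps=1000):
    """P(at least one return in [0, horizon]) = 1 - exp(-integral of the rate).

    rate_fn: non-negative usage rate as a function of time since the last visit.
    The integral is approximated with the trapezoidal rule on `steps` intervals.
    """
    dt = horizon / steps
    total = 0.5 * (rate_fn(0.0) + rate_fn(horizon))
    total += sum(rate_fn(i * dt) for i in range(1, steps))
    return 1.0 - math.exp(-total * dt)

if __name__ == "__main__":
    # Made-up rate with a 24-hour cycle: usage peaks once per day.
    def daily_rate(t_hours):
        return 0.05 * (1.0 + math.sin(2.0 * math.pi * t_hours / 24.0))

    print(return_probability(daily_rate, horizon=24.0))
```

In an LGCP the rate function would be inferred from data as the exponential of a Gaussian process draw; here it is simply fixed to keep the example self-contained.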