
    SEISMIC: A Self-Exciting Point Process Model for Predicting Tweet Popularity

    Social networking websites allow users to create and share content. Big information cascades of post resharing can form as users of these sites reshare others' posts with their friends and followers. One of the central challenges in understanding such cascading behaviors is forecasting information outbreaks, where a single post becomes widely popular by being reshared by many users. In this paper, we focus on predicting the final number of reshares of a given post. We build on the theory of self-exciting point processes to develop a statistical model that allows us to make accurate predictions. Our model requires no training or expensive feature engineering. It results in a simple and efficiently computable formula that allows us to answer questions, in real time, such as: Given a post's resharing history so far, what is our current estimate of its final number of reshares? Is the post's resharing cascade past the initial stage of explosive growth? And which posts will be the most reshared in the future? We validate our model using one month of complete Twitter data and demonstrate a strong improvement in predictive accuracy over existing approaches. Our model gives only 15% relative error in predicting the final size of an average information cascade after observing it for just one hour. Comment: 10 pages, published in KDD 201
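    SEISMIC's own estimator is not reproduced here. As a minimal sketch of the underlying self-exciting idea, assume a Hawkes-style branching process in which every reshare triggers on average `branching_ratio` further reshares, released over time through an exponential kernel with rate `decay` (both hypothetical parameters; SEISMIC itself uses a different memory kernel and estimates a post's infectiousness from its history). Under these assumptions the expected final size, given the history observed up to `t_obs`, is the observed count plus the still-pending direct reshares, each inflated by a factor of 1 / (1 - branching_ratio) to account for its own descendants.

```python
import math
from typing import Sequence


def expected_final_size(event_times: Sequence[float], t_obs: float,
                        branching_ratio: float, decay: float) -> float:
    """Expected final cascade size under an exponential-kernel Hawkes model.

    Hypothetical model, not SEISMIC's estimator: each reshare spawns on average
    `branching_ratio` direct reshares, spread over time by the normalised kernel
    decay * exp(-decay * s); no background events arrive after the original post.
    """
    if not 0.0 <= branching_ratio < 1.0:
        raise ValueError("branching_ratio must lie in [0, 1) for a finite cascade")
    observed = [t for t in event_times if t <= t_obs]
    # Direct children each observed event has yet to produce: the branching
    # ratio times the kernel mass remaining after t_obs, exp(-decay * age).
    pending_children = sum(branching_ratio * math.exp(-decay * (t_obs - t))
                           for t in observed)
    # Each future child roots a sub-cascade of expected size 1 / (1 - branching_ratio).
    return len(observed) + pending_children / (1.0 - branching_ratio)
```

For example, a single post observed at the moment it appears, with `branching_ratio=0.5`, is expected to reach two reshares in total; as `t_obs` grows with no new activity, the estimate falls back to the observed count.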

    A Bayesian Point Process Model for User Return Time Prediction in Recommendation Systems

    In order to sustain the user base of a web service, it is important to know when a user will return to the service. We propose a Bayesian point process, the log Gaussian Cox process (LGCP), to model and predict users' return times. It allows encoding of prior domain knowledge and non-parametric estimation of the latent intensity functions that capture user behaviour. We capture the similarities among users in their return times by using a multi-task learning approach. We show the effectiveness of the proposed approaches on predicting the return time of users to the Last.fm music service.
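    To make the LGCP concrete, the sketch below draws one realisation of the generative model on a regular time grid: a Gaussian process is sampled for the log-intensity, exponentiated, and then used as the rate of independent Poisson counts per bin. The constant mean, the squared-exponential covariance, and the parameter names are illustrative assumptions; the paper's priors, multi-task coupling across users, and posterior inference (which needs MCMC or variational methods) are not shown.

```python
import numpy as np


def sample_lgcp_counts(grid, mean=0.0, variance=1.0, lengthscale=1.0, seed=0):
    """Draw one realisation of a log Gaussian Cox process on a regular 1-D grid.

    Hypothetical parameters: a constant GP mean and a squared-exponential
    covariance. The intensity is treated as constant within each grid cell.
    """
    rng = np.random.default_rng(seed)
    grid = np.asarray(grid, dtype=float)
    dx = grid[1] - grid[0]  # bin width (a regular grid is assumed)
    diff = grid[:, None] - grid[None, :]
    cov = variance * np.exp(-0.5 * (diff / lengthscale) ** 2)
    # A small diagonal jitter keeps the Cholesky factorisation numerically stable.
    chol = np.linalg.cholesky(cov + 1e-6 * np.eye(len(grid)))
    log_intensity = mean + chol @ rng.standard_normal(len(grid))
    intensity = np.exp(log_intensity)  # exponentiation guarantees a positive rate
    # Conditional on the latent intensity, bin counts are independent Poisson.
    counts = rng.poisson(intensity * dx)
    return intensity, counts
```

The log link is what makes this a Cox process rather than plain GP regression: the Gaussian prior acts on log λ(t), so the sampled intensity is always positive, and the observed events are Poisson given that intensity.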

    Probabilistic Modeling of Rumour Stance and Popularity in Social Media

    Social media tends to be rife with rumours when new reports are released piecemeal during breaking news events. One can mine multiple reactions expressed by social media users in those situations, exploring users' stance towards rumours, ultimately enabling the flagging of highly disputed rumours as potentially false. Moreover, rumours in social media exhibit complex temporal patterns. Some rumours are discussed with an increasing number of tweets per unit of time, whereas other rumours fail to gain ground. This thesis develops probabilistic models of rumours in social media driven by two applications: rumour stance classification and modeling the temporal dynamics of rumours. Rumour stance classification is the task of classifying the stance expressed in an individual tweet towards a rumour. Modeling the temporal dynamics of rumours is an application where rumour prevalence is modeled over time. Both applications provide insights into how a rumour attracts attention from the social media community. These can assist journalists with their work on rumour tracking and debunking, and can be used in downstream applications such as systems for rumour veracity classification. In this thesis, we develop models based on probabilistic approaches. We motivate Gaussian processes and point processes as appropriate tools and show how features not considered in previous work can be included. We show that for both applications, transfer learning approaches are successful, supporting the hypothesis that there is a common underlying signal across different rumours. We furthermore introduce novel machine learning techniques which have the potential to be used in other applications: convolution kernels for streams of text over continuous time and a sequence classification algorithm based on point processes.
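    As a generic illustration of the Gaussian-process side of this work, the sketch below computes the posterior mean of zero-mean GP regression with a squared-exponential kernel, which could smooth binned tweet counts for a rumour over time. The kernel, the Gaussian noise model (a simplification, since counts are discrete), and the hyperparameter values are assumptions for illustration; the thesis' own kernels, features, and transfer-learning setup are not reproduced.

```python
import numpy as np


def gp_posterior_mean(x_train, y_train, x_test, variance=1.0, lengthscale=1.0, noise=0.1):
    """Posterior mean of zero-mean GP regression with a squared-exponential kernel.

    Hypothetical hyperparameters; observations are assumed to carry Gaussian noise.
    """
    def kernel(a, b):
        d = np.asarray(a, dtype=float)[:, None] - np.asarray(b, dtype=float)[None, :]
        return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

    # K(X, X) + sigma^2 I: prior covariance of the noisy training targets.
    k_train = kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_cross = kernel(x_test, x_train)  # covariance between test and training inputs
    # Solve the linear system instead of forming an explicit inverse.
    alpha = np.linalg.solve(k_train, np.asarray(y_train, dtype=float))
    return k_cross @ alpha
```

Far from any observation the posterior mean reverts to the zero prior mean, which is one reason a sensible prior mean or transfer from related rumours matters when extrapolating popularity.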

    Fast and scalable non-parametric Bayesian inference for Poisson point processes

    We study the problem of non-parametric Bayesian estimation of the intensity function of a Poisson point process. The observations are $n$ independent realisations of a Poisson point process on the interval $[0,T]$. We propose two related approaches. In both approaches we model the intensity function as piecewise constant on $N$ bins forming a partition of the interval $[0,T]$. In the first approach the coefficients of the intensity function are assigned independent gamma priors, leading to a closed-form posterior distribution. On the theoretical side, we prove that as $n\rightarrow\infty$, the posterior asymptotically concentrates around the "true", data-generating intensity function at an optimal rate for $h$-Hölder regular intensity functions ($0 < h \leq 1$). In the second approach we employ a gamma Markov chain prior on the coefficients of the intensity function. The posterior distribution is no longer available in closed form, but inference can be performed using a straightforward version of the Gibbs sampler. Both approaches scale well with sample size, but the second is much less sensitive to the choice of $N$. Practical performance of our methods is first demonstrated via synthetic data examples. We compare our second method with other existing approaches on the UK coal mining disasters data. Furthermore, we apply it to the US mass shootings data and Donald Trump's Twitter data. Comment: 45 pages, 22 figures
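    The closed form in the first approach follows from standard gamma-Poisson conjugacy. Pooling the $n$ realisations, the count $H_k$ in bin $k$ of width $\Delta = T/N$ is Poisson with mean $n \Delta \psi_k$, so a $\mathrm{Gamma}(\alpha, \beta)$ prior (shape, rate) on the bin coefficient $\psi_k$ yields a $\mathrm{Gamma}(\alpha + H_k, \beta + n\Delta)$ posterior. The sketch below computes these posterior parameters; the default hyperparameters and function name are illustrative assumptions, not the paper's choices.

```python
import numpy as np


def piecewise_gamma_posterior(event_times, n_realisations, T, n_bins, shape=1.0, rate=1.0):
    """Conjugate posterior for a piecewise-constant Poisson intensity on [0, T].

    `event_times` pools the events of all realisations. Each bin coefficient gets
    an independent Gamma(shape, rate) prior (hypothetical defaults). Returns the
    posterior shape and rate for every bin.
    """
    edges = np.linspace(0.0, T, n_bins + 1)
    # np.histogram includes the right edge T in the last bin.
    counts, _ = np.histogram(event_times, bins=edges)
    width = T / n_bins
    # Pooled counts in bin k ~ Poisson(n_realisations * width * psi_k).
    post_shape = shape + counts
    post_rate = rate + n_realisations * width
    return post_shape, np.full(n_bins, post_rate)
```

The posterior mean of each coefficient is then `post_shape / post_rate`; for example, events at 0.1, 0.2 and 0.7 from one realisation on $[0,1]$ with two bins give posterior means of 2 and 4/3 under a $\mathrm{Gamma}(1, 1)$ prior.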

    Longitudinal Modeling of Social Media with Hawkes Process based on Users and Networks

    Online social networks provide a platform for sharing information at an unprecedented scale. Users generate information which propagates across the network, resulting in information cascades. In this paper, we study the evolution of information cascades in Twitter using a point process model of user activity. We develop several Hawkes process models considering various properties, including conversational structure, users' connections and general features of users such as textual information, and show how they help in modeling social network activity. We consider low-rank embeddings of users and user features, and learn the features that help identify the influence and susceptibility of users. Evaluation on Twitter data sets associated with civil unrest shows that incorporating richer properties improves performance in predicting the future activity of users and memes.
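    The models in this paper are multivariate Hawkes processes with user-specific parameters; as a minimal univariate sketch of the shared building block, the code below evaluates the log-likelihood of a Hawkes process with intensity $\lambda(t) = \mu + \alpha \sum_{t_i < t} e^{-\beta (t - t_i)}$ observed on $[0, T]$. The exponential kernel and the parameter names `mu`, `alpha`, `beta` are assumptions for illustration. A recursive update keeps the cost linear in the number of events.

```python
import math
from typing import Sequence


def hawkes_log_likelihood(times: Sequence[float], T: float,
                          mu: float, alpha: float, beta: float) -> float:
    """Log-likelihood of a univariate exponential-kernel Hawkes process on [0, T].

    Intensity: lambda(t) = mu + alpha * sum_{t_i < t} exp(-beta * (t - t_i)).
    `times` must be sorted in ascending order.
    """
    log_lik = 0.0
    excitation = 0.0  # sum over earlier events j of exp(-beta * (t_i - t_j))
    prev = None
    for t in times:
        if prev is not None:
            # Decay the previous excitation and add the event at `prev`.
            excitation = math.exp(-beta * (t - prev)) * (excitation + 1.0)
        log_lik += math.log(mu + alpha * excitation)
        prev = t
    # Compensator: integral of the intensity over [0, T].
    compensator = mu * T + (alpha / beta) * sum(1.0 - math.exp(-beta * (T - t)) for t in times)
    return log_lik - compensator
```

Setting `alpha=0` recovers a homogeneous Poisson process, whose log-likelihood is $n \log \mu - \mu T$; maximising this function over `mu`, `alpha`, `beta` (for example with a generic optimiser) is one common way to fit such a model.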

    Modeling user return time using inhomogeneous Poisson process

    For Intelligent Assistants (IA), user activity is often used as a lagging metric for user satisfaction or engagement. Conversely, predictive leading metrics for engagement can be helpful with decision making and with evaluating changes in satisfaction caused by new features. In this paper, we propose User Return Time (URT), a fine-grained metric for gauging user engagement. To compute URT, we model the continuous inter-arrival times between a user's uses of the service via a log Gaussian Cox process (LGCP), a form of inhomogeneous Poisson process that captures the irregular variations in usage rate and the personal preferences typical of an IA. We show the effectiveness of the proposed approaches on predicting the return time of users on real-world data collected from an IA. Experimental results demonstrate that our model predicts user return times reasonably well and considerably better than strong baselines that base the prediction on past utterance frequency.
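    Once an intensity $\lambda(t)$ for a user is available (for instance a posterior mean from an LGCP fit), the return-time distribution of an inhomogeneous Poisson process follows from its survival function: the probability of a return within $s$ is $1 - \exp(-\Lambda(s))$, where $\Lambda(s) = \int_0^s \lambda(u)\,du$. The sketch below assumes a piecewise-constant intensity on equal-width bins and zero intensity beyond the last bin; the function names are hypothetical, and a full LGCP treatment would average these quantities over posterior samples of the intensity rather than plugging in one estimate.

```python
import math
from typing import Optional, Sequence


def cumulative_intensity(rates: Sequence[float], bin_width: float, s: float) -> float:
    """Lambda(s): integral from 0 to s of a piecewise-constant intensity."""
    total = 0.0
    for k, rate in enumerate(rates):
        start = k * bin_width
        if s <= start:
            break
        total += rate * (min(s, start + bin_width) - start)
    return total


def return_probability(rates: Sequence[float], bin_width: float, s: float) -> float:
    """P(user returns within s) = 1 - exp(-Lambda(s))."""
    return 1.0 - math.exp(-cumulative_intensity(rates, bin_width, s))


def median_return_time(rates: Sequence[float], bin_width: float) -> Optional[float]:
    """Smallest s with Lambda(s) = ln 2, or None if not reached within the modelled bins.

    The intensity is assumed to be zero after the last bin (a hypothetical horizon).
    """
    target = math.log(2.0)
    acc = 0.0
    for k, rate in enumerate(rates):
        mass = rate * bin_width
        if rate > 0 and acc + mass >= target:
            # Linear interpolation inside the bin where Lambda crosses ln 2.
            return k * bin_width + (target - acc) / rate
        acc += mass
    return None
```

With a constant rate of one return per unit time, for example, the median return time is $\ln 2 \approx 0.69$ time units, matching the exponential inter-arrival distribution of a homogeneous Poisson process.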