Predicting Information Diffusion on Twitter - Analysis of predictive features
Information propagation on online social networks attracts much attention in domains as varied as politics, fact checking, and marketing. Modeling information diffusion in such growing communication media is crucial both to understand information propagation and to better control it. Our research aims at predicting whether a post is going to be forwarded or not. Moreover, we aim at predicting how much it is going to be diffused. Our model is based on three types of features: user-based, time-based and content-based. Using three collections corresponding to a total of about 16 million tweets, we show that our model improves F-measure by about 5% compared to the state of the art, both when predicting whether a tweet is going to be retweeted and when predicting how popular it will be. F-measure in our model is between 70% and 82%, depending on the collection. We also show that some of the features we introduced, such as the number of followers and the number of communities that a user belongs to, are very important for predicting retweetability. Our contribution in this paper is twofold: first, we define new features to represent tweets in order to predict their possible propagation; second, we evaluate the model we built on top of both features from the literature and the features we defined on three collections, and show the usefulness of our features in the prediction.
Scalable Privacy-Compliant Virality Prediction on Twitter
The digital town hall of Twitter has become a preferred medium of communication
for individuals and organizations across the globe. Some of them reach
audiences of millions, while others struggle to get noticed. Given the impact
of social media, the question remains more relevant than ever: how to model the
dynamics of attention in Twitter. Researchers around the world turn to machine
learning to predict the most influential tweets and authors, navigating the
volume, velocity, and variety of social big data, with many compromises. In
this paper, we revisit content popularity prediction on Twitter. We argue that
strict alignment of data acquisition, storage and analysis algorithms is
necessary to avoid the common trade-offs between scalability, accuracy and
privacy compliance. We propose a new framework for the rapid acquisition of
large-scale datasets, a high-accuracy supervisory signal, and multilanguage
sentiment prediction while respecting every applicable privacy request. We then
apply a novel gradient boosting framework to achieve state-of-the-art results
in virality ranking, even before including a tweet's visual or propagation
features. Our Gradient Boosted Regression Tree is the first to offer
explainable, strong ranking performance on benchmark datasets. Since the
analysis focused on features available early, the model is immediately
applicable to incoming tweets in 18 languages.
Comment: AffCon@AAAI-19 Best Paper Award; Presented at AAAI-19 W1: Affective
Content Analysi
Predicting Successful Memes using Network and Community Structure
We investigate the predictability of successful memes using their early
spreading patterns in the underlying social networks. We propose and analyze a
comprehensive set of features and develop an accurate model to predict future
popularity of a meme given its early spreading patterns. Our paper provides the
first comprehensive comparison of existing predictive frameworks. We categorize
our features into three groups: influence of early adopters, community
concentration, and characteristics of adoption time series. We find that
features based on community structure are the most powerful predictors of
future success. We also find that early popularity of a meme is not a good
predictor of its future popularity, contrary to common belief. Our methods
outperform other approaches, particularly in the task of detecting very popular
or unpopular memes.
Comment: 10 pages, 6 figures, 2 tables. Proceedings of 8th AAAI Intl. Conf. on
Weblogs and social media (ICWSM 2014
Can Cascades be Predicted?
On many social networking web sites such as Facebook and Twitter, resharing
or reposting functionality allows users to share others' content with their own
friends or followers. As content is reshared from user to user, large cascades
of reshares can form. While a growing body of research has focused on analyzing
and characterizing such cascades, a recent, parallel line of work has argued
that the future trajectory of a cascade may be inherently unpredictable. In
this work, we develop a framework for addressing cascade prediction problems.
On a large sample of photo reshare cascades on Facebook, we find strong
performance in predicting whether a cascade will continue to grow in the
future. We find that the relative growth of a cascade becomes more predictable
as we observe more of its reshares, that temporal and structural features are
key predictors of cascade size, and that initially, breadth rather than depth
in a cascade is a better indicator of larger cascades. This prediction
performance is robust in the sense that multiple distinct classes of features
all achieve similar performance. We also discover that temporal features are
predictive of a cascade's eventual shape. Observing independent cascades of the
same content, we find that while these cascades differ greatly in size, we are
still able to predict which ends up the largest.
Analyzing the Language of Food on Social Media
We investigate the predictive power behind the language of food on social
media. We collect a corpus of over three million food-related posts from
Twitter and demonstrate that many latent population characteristics can be
directly predicted from this data: overweight rate, diabetes rate, political
leaning, and home geographical location of authors. For all tasks, our
language-based models significantly outperform the majority-class baselines.
Performance is further improved with more complex natural language processing,
such as topic modeling. We analyze which textual features have most predictive
power for these datasets, providing insight into the connections between the
language of food, geographic locale, and community characteristics. Lastly, we
design and implement an online system for real-time query and visualization of
the dataset. Visualization tools, such as geo-referenced heatmaps,
semantics-preserving wordclouds and temporal histograms, allow us to discover
more complex, global patterns mirrored in the language of food.
Comment: An extended abstract of this paper will appear in IEEE Big Data 201