Scalable Privacy-Compliant Virality Prediction on Twitter
The digital town hall of Twitter has become a preferred medium of communication
for individuals and organizations across the globe. Some of them reach
audiences of millions, while others struggle to get noticed. Given the impact
of social media, the question remains more relevant than ever: how to model the
dynamics of attention on Twitter. Researchers around the world turn to machine
learning to predict the most influential tweets and authors, navigating the
volume, velocity, and variety of social big data, with many compromises. In
this paper, we revisit content popularity prediction on Twitter. We argue that
strict alignment of data acquisition, storage and analysis algorithms is
necessary to avoid the common trade-offs between scalability, accuracy and
privacy compliance. We propose a new framework for the rapid acquisition of
large-scale datasets, a high-accuracy supervisory signal and multilingual
sentiment prediction while respecting every applicable privacy request. We then
apply a novel gradient boosting framework to achieve state-of-the-art results
in virality ranking, even before including a tweet's visual or propagation
features. Our Gradient Boosted Regression Tree is the first to offer
explainable, strong ranking performance on benchmark datasets. Since the
analysis focused on features available early, the model is immediately
applicable to incoming tweets in 18 languages.
Comment: AffCon@AAAI-19 Best Paper Award; Presented at AAAI-19 W1: Affective
Content Analysi
Understanding Image Virality
Virality of online content on social networking websites is an important but
esoteric phenomenon often studied in fields like marketing, psychology and data
mining. In this paper we study viral images from a computer vision perspective.
We introduce three new image datasets from Reddit, and define a virality score
using Reddit metadata. We train classifiers with state-of-the-art image
features to predict virality of individual images, relative virality in pairs
of images, and the dominant topic of a viral image. We also compare machine
performance to human performance on these tasks. We find that computers perform
poorly with low-level features, and high-level information is critical for
predicting virality. We encode semantic information through relative
attributes. We identify the 5 key visual attributes that correlate with
virality. We create an attribute-based characterization of images that can
predict relative virality with 68.10% accuracy (SVM+Deep Relative Attributes)
-- better than humans at 60.12%. Finally, we study how human prediction of
image virality varies with different `contexts' in which the images are viewed,
such as the influence of neighbouring images, images recently viewed, as well
as the image title or caption. This work is a first step in understanding the
complex but important phenomenon of image virality. Our datasets and
annotations will be made publicly available.
Comment: Pre-print, IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), 201
Shallow reading with Deep Learning: Predicting popularity of online content using only its title
With the ever-decreasing attention span of contemporary Internet users, the
title of online content (such as a news article or video) can be a major factor
in determining its popularity. To take advantage of this phenomenon, we propose
a new method based on a bidirectional Long Short-Term Memory (LSTM) neural
network designed to predict the popularity of online content using only its
title. We evaluate the proposed architecture on two distinct datasets of news
articles and news videos distributed in social media that contain over 40,000
samples in total. On those datasets, our approach improves the performance over
traditional shallow approaches by a margin of 15%. Additionally, we show that
using pre-trained word vectors in the embedding layer improves the results of
LSTM models, especially when the training set is small. To our knowledge, this
is the first attempt at predicting popularity using only textual
information from the title
- …