Scalable Privacy-Compliant Virality Prediction on Twitter
The digital town hall of Twitter becomes a preferred medium of communication
for individuals and organizations across the globe. Some of them reach
audiences of millions, while others struggle to get noticed. Given the impact
of social media, the question remains more relevant than ever: how to model the
dynamics of attention in Twitter. Researchers around the world turn to machine
learning to predict the most influential tweets and authors, navigating the
volume, velocity, and variety of social big data, with many compromises. In
this paper, we revisit content popularity prediction on Twitter. We argue that
strict alignment of data acquisition, storage and analysis algorithms is
necessary to avoid the common trade-offs between scalability, accuracy and
privacy compliance. We propose a new framework for the rapid acquisition of
large-scale datasets, high accuracy supervisory signal and multilanguage
sentiment prediction while respecting every privacy request applicable. We then
apply a novel gradient boosting framework to achieve state-of-the-art results
in virality ranking, even before including a tweet's visual or propagation
features. Our Gradient Boosted Regression Tree is the first to offer
explainable, strong ranking performance on benchmark datasets. Since the
analysis focused on features available early, the model is immediately
applicable to incoming tweets in 18 languages.
Comment: AffCon@AAAI-19 Best Paper Award; Presented at AAAI-19 W1: Affective
Content Analysi
Semantic Sentiment Analysis of Twitter Data
The Internet and the proliferation of smart mobile devices have changed the way
information is created, shared, and spread: microblogs such as Twitter,
weblogs such as LiveJournal, social networks such as Facebook, and instant
messengers such as Skype and WhatsApp are now commonly used to share thoughts
and opinions about anything in the surrounding world. This has resulted in the
proliferation of social media content, thus creating new opportunities to study
public opinion at a scale that was never possible before. Naturally, this
abundance of data has quickly attracted business and research interest from
various fields including marketing, political science, and social studies,
among many others, which are interested in questions like these: Do people like
the new Apple Watch? Do Americans support ObamaCare? How do the Scottish feel
about Brexit? Answering these questions requires studying the sentiment of
opinions people express in social media, which has given rise to the fast
growth of the field of sentiment analysis in social media, with Twitter being
especially popular for research due to its scale, representativeness, variety
of topics discussed, as well as ease of public access to its messages. Here we
present an overview of work on sentiment analysis on Twitter.
Comment: Microblog sentiment analysis; Twitter opinion mining; In the
Encyclopedia on Social Network Analysis and Mining (ESNAM), Second edition.
201
Sparse Allreduce: Efficient Scalable Communication for Power-Law Data
Many large datasets exhibit power-law statistics: the web graph, social
networks, text data, click-through data, etc. Their adjacency graphs are termed
natural graphs, and are known to be difficult to partition. As a consequence,
most distributed algorithms on these graphs are communication intensive. Many
algorithms on natural graphs involve an Allreduce: a sum or average of
partitioned data which is then shared back to the cluster nodes. Examples
include PageRank, spectral partitioning, and many machine learning algorithms
including regression, factor (topic) models, and clustering. In this paper we
describe an efficient and scalable Allreduce primitive for power-law data. We
point out scaling problems with existing butterfly and round-robin networks for
Sparse Allreduce, and show that a hybrid approach improves on both.
Furthermore, we show that Sparse Allreduce stages should be nested instead of
cascaded (as in the dense case), and that the optimum-throughput Allreduce
network should be a butterfly of heterogeneous degree, where degree decreases
with depth into the network. Finally, a simple replication scheme is introduced
to deal with node failures. We present experiments showing significant
improvements over existing systems such as PowerGraph and Hadoop
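
The Allreduce described in the abstract above (a sum of partitioned data that is shared back to every node) can be illustrated with a small single-process simulation. This is a minimal sketch, not the paper's hybrid or nested design: it assumes a plain degree-2 butterfly, a power-of-two node count, and sparse vectors stored as Python dicts. The function name `sparse_allreduce_sum` is hypothetical.

```python
def sparse_allreduce_sum(node_vectors):
    """Simulate a degree-2 butterfly Allreduce (sum) over sparse vectors.

    node_vectors: list of dicts {index: value}, one per simulated node.
    The node count must be a power of two (an assumption of this sketch).
    Returns a list with the reduced vector as held by each node.
    """
    n = len(node_vectors)
    if n == 0 or n & (n - 1):
        raise ValueError("number of nodes must be a power of two")

    # Copy the inputs so the caller's data is not modified.
    state = [dict(v) for v in node_vectors]

    step = 1
    while step < n:
        next_state = []
        for i in range(n):
            partner = i ^ step  # butterfly partner in this round
            merged = dict(state[i])
            # Only indices that are actually present are exchanged and summed.
            for idx, val in state[partner].items():
                merged[idx] = merged.get(idx, 0) + val
            next_state.append(merged)
        state = next_state
        step <<= 1
    return state


if __name__ == "__main__":
    nodes = [{0: 1, 5: 2}, {5: 3}, {2: 4}, {0: 1, 2: 1}]
    for rank, vec in enumerate(sparse_allreduce_sum(nodes)):
        print(rank, sorted(vec.items()))
```

After log2(n) rounds every simulated node holds the same summed vector. In this sketch each node's message grows from round to round as the nonzero index sets are merged, so communication cost depends on how many distinct indices the nodes hold rather than on the full vector dimension.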