Scalable Privacy-Compliant Virality Prediction on Twitter
The digital town hall of Twitter has become a preferred medium of communication
for individuals and organizations across the globe. Some of them reach
audiences of millions, while others struggle to get noticed. Given the impact
of social media, the question remains more relevant than ever: how to model the
dynamics of attention on Twitter. Researchers around the world turn to machine
learning to predict the most influential tweets and authors, navigating the
volume, velocity, and variety of social big data, with many compromises. In
this paper, we revisit content popularity prediction on Twitter. We argue that
strict alignment of data acquisition, storage and analysis algorithms is
necessary to avoid the common trade-offs between scalability, accuracy and
privacy compliance. We propose a new framework for the rapid acquisition of
large-scale datasets, a high-accuracy supervisory signal, and multilanguage
sentiment prediction while respecting every applicable privacy request. We then
apply a novel gradient boosting framework to achieve state-of-the-art results
in virality ranking, even before including a tweet's visual or propagation
features. Our Gradient Boosted Regression Tree is the first to offer
explainable, strong ranking performance on benchmark datasets. Since the
analysis focused on features available early, the model is immediately
applicable to incoming tweets in 18 languages.
Comment: AffCon@AAAI-19 Best Paper Award; Presented at AAAI-19 W1: Affective
Content Analysi
CASPR: Customer Activity Sequence-based Prediction and Representation
Tasks critical to enterprise profitability, such as customer churn
prediction, fraudulent account detection or customer lifetime value estimation,
are often tackled by models trained on features engineered from customer data
in tabular format. Application-specific feature engineering adds development,
operationalization and maintenance costs over time. Recent advances in
representation learning present an opportunity to simplify and generalize
feature engineering across applications. When applying these advancements to
tabular data, researchers deal with data heterogeneity, variations in customer
engagement history or the sheer volume of enterprise datasets. In this paper,
we propose a novel approach to encode tabular data containing customer
transactions, purchase history and other interactions into a generic
representation of a customer's association with the business. We then evaluate
these embeddings as features to train multiple models spanning a variety of
applications. CASPR, Customer Activity Sequence-based Prediction and
Representation, applies a Transformer architecture to encode activity sequences
to improve model performance and avoid bespoke feature engineering across
applications. Our experiments at scale validate CASPR for both small and large
enterprise applications.
Comment: Presented at the Table Representation Learning Workshop, NeurIPS
2022, New Orleans. Authors listed in random orde
On the Limits to Multi-Modal Popularity Prediction on Instagram -- A New Robust, Efficient and Explainable Baseline
Our global population contributes visual content on platforms like Instagram,
attempting to express themselves and engage their audiences, at an
unprecedented and increasing rate. In this paper, we revisit popularity
prediction on Instagram. We present a robust, efficient, and explainable
baseline for population-based popularity prediction, achieving strong ranking
performance. We employ the latest methods in computer vision to maximize the
information extracted from the visual modality. We use transfer learning to
extract visual semantics such as concepts, scenes, and objects, allowing a new
level of scrutiny in an extensive, explainable ablation study. We inform
feature selection towards a robust and scalable model, but also illustrate
feature interactions, offering new directions for further inquiry in
computational social science. Our strongest models inform a lower limit to
population-based predictability of popularity on Instagram. The models are
immediately applicable to social media monitoring and influencer
identification.
Comment: Presented at ICAART 202