Understanding Chat Messages for Sticker Recommendation in Messaging Apps
Stickers are popularly used in messaging apps such as Hike to visually
express a nuanced range of thoughts and utterances to convey exaggerated
emotions. However, discovering the right sticker from a large and
ever-expanding pool of stickers while chatting can be cumbersome. In this paper, we
describe a system for recommending stickers in real time as the user is typing
based on the context of the conversation. We decompose the sticker
recommendation (SR) problem into two steps. First, we predict the message that
the user is likely to send in the chat. Second, we substitute the predicted
message with an appropriate sticker. The majority of Hike's messages are in the
form of text which is transliterated from users' native language to the Roman
script. This leads to numerous orthographic variations of the same message and
makes accurate message prediction challenging. To address this issue, we learn
dense representations of chat messages using a character-level convolutional
network in an unsupervised manner. We use them to cluster the messages that
have the same meaning. In the subsequent steps, we predict the message cluster
instead of the message. Our approach does not depend on human labelled data
(except for validation), leading to a fully automatic updating and tuning
pipeline for the underlying models. We also propose a novel hybrid message
prediction model, which can run with low latency on low-end phones that have
severe computational limitations. Our described system has been deployed for
more than months and is being used by millions of users along with hundreds
of thousands of expressive stickers.
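The two-step idea in the abstract above (group spelling variants of the same message, then map the group to a sticker) can be illustrated with a minimal sketch. This is not the paper's method: the character-level convolutional embeddings are replaced here by plain character-trigram Jaccard similarity with greedy clustering, purely to keep the example self-contained. The function names, the threshold value, and the sticker identifiers are all hypothetical.

```python
from typing import Dict, List, Optional, Set


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Return the set of character n-grams of a space-padded, lowercased message."""
    padded = f" {text.lower().strip()} "
    return {padded[i:i + n] for i in range(max(len(padded) - n + 1, 1))}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two n-gram sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_messages(messages: List[str], threshold: float = 0.3) -> List[List[str]]:
    """Greedy clustering: a message joins the first cluster whose first
    member (used as the representative) is similar enough, else starts a new one."""
    clusters: List[List[str]] = []
    for msg in messages:
        grams = char_ngrams(msg)
        for cluster in clusters:
            if jaccard(grams, char_ngrams(cluster[0])) >= threshold:
                cluster.append(msg)
                break
        else:
            clusters.append([msg])
    return clusters


def recommend_sticker(message: str,
                      clusters: List[List[str]],
                      stickers: Dict[int, str],
                      threshold: float = 0.3) -> Optional[str]:
    """Map a message to the sticker of its most similar cluster, if any."""
    grams = char_ngrams(message)
    scores = [jaccard(grams, char_ngrams(c[0])) for c in clusters]
    if not scores or max(scores) < threshold:
        return None
    return stickers.get(scores.index(max(scores)))


# Hypothetical transliterated variants and sticker ids, for illustration only.
history = ["kya kar rahe ho", "kya kr rhe ho", "good night", "gud night"]
clusters = cluster_messages(history)
stickers = {0: "sticker_what_are_you_doing", 1: "sticker_good_night"}
suggestion = recommend_sticker("kya kr rahe ho", clusters, stickers)
```

Predicting a cluster rather than an exact string means a single sticker mapping covers every orthographic variant in that cluster, which is the property the abstract relies on.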
SuperTweetEval: A Challenging, Unified and Heterogeneous Benchmark for Social Media NLP Research
Despite its relevance, the maturity of NLP for social media pales in
comparison with general-purpose models, metrics and benchmarks. This fragmented
landscape makes it hard for the community to know, for instance, given a task,
which is the best performing model and how it compares with others. To
alleviate this issue, we introduce a unified benchmark for NLP evaluation in
social media, SuperTweetEval, which includes a heterogeneous set of tasks and
datasets combined, adapted and constructed from scratch. We benchmarked the
performance of a wide range of models on SuperTweetEval and our results suggest
that, despite the recent advances in language modelling, social media remains
challenging.
Comment: EMNLP 2023 Findings