Understanding Chat Messages for Sticker Recommendation in Messaging Apps
Stickers are popularly used in messaging apps such as Hike to visually
express a nuanced range of thoughts and utterances to convey exaggerated
emotions. However, discovering the right sticker from a large and
ever-expanding pool of stickers while chatting can be cumbersome. In this paper, we
describe a system for recommending stickers in real time as the user is typing,
based on the context of the conversation. We decompose the sticker
recommendation (SR) problem into two steps. First, we predict the message that
the user is likely to send in the chat. Second, we substitute the predicted
message with an appropriate sticker. The majority of Hike's messages are in the
form of text which is transliterated from users' native language to the Roman
script. This leads to numerous orthographic variations of the same message and
makes accurate message prediction challenging. To address this issue, we learn
dense representations of chat messages using a character-level convolutional
network in an unsupervised manner. We use them to cluster the messages that
have the same meaning. In the subsequent steps, we predict the message cluster
instead of the message. Our approach does not depend on human labelled data
(except for validation), leading to a fully automatic update and tuning
pipeline for the underlying models. We also propose a novel hybrid message
prediction model, which can run with low latency on low-end phones that have
severe computational limitations. The system has been deployed for
more than months and is used by millions of users, with hundreds
of thousands of expressive stickers.
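The two-step decomposition above (group orthographic variants into clusters, predict a cluster, then substitute a sticker for it) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the deployed system: it swaps the unsupervised character-level CNN embeddings for plain character-trigram count vectors, clusters greedily with a similarity threshold, and replaces context-based message prediction with nearest-cluster matching on the typed text. The function names, the 0.5 threshold, and the sticker IDs are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Character n-gram counts; a stand-in (assumption) for learned char-CNN embeddings."""
    padded = f"#{text.lower()}#"  # boundary markers so prefixes/suffixes get their own n-grams
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster_messages(messages, threshold=0.5):
    """Greedy single-pass clustering: each cluster is represented by its FIRST
    member's vector; a message joins the first cluster it is similar enough to."""
    clusters = []  # list of (representative_vector, member_messages)
    for msg in messages:
        vec = char_ngrams(msg)
        for rep, members in clusters:
            if cosine(vec, rep) >= threshold:
                members.append(msg)
                break
        else:
            clusters.append((vec, [msg]))
    return [members for _, members in clusters]


def recommend_sticker(typed, clusters, sticker_for_cluster, threshold=0.5):
    """Hypothetical stand-in for message prediction: pick the cluster whose best
    member is most similar to the typed text, then look up its sticker."""
    if not clusters:
        return None
    vec = char_ngrams(typed)
    scores = [max(cosine(vec, char_ngrams(m)) for m in members) for members in clusters]
    best = max(range(len(scores)), key=scores.__getitem__)
    return sticker_for_cluster.get(best) if scores[best] >= threshold else None


if __name__ == "__main__":
    msgs = ["kya kar rahe ho", "good night", "kya kr rhe ho", "good nite"]
    groups = cluster_messages(msgs)
    print(groups)  # [['kya kar rahe ho', 'kya kr rhe ho'], ['good night', 'good nite']]
    print(recommend_sticker("good nite", groups, {0: "sticker_wyd", 1: "sticker_gn"}))
```

Trigram overlap already catches spelling variants such as "kya kar rahe ho" and "kya kr rhe ho", but it misses looser variants like "good night" and "gud nite" (cosine about 0.22), which is one reason to learn dense representations instead.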
Improved Contextual Recognition In Automatic Speech Recognition Systems By Semantic Lattice Rescoring
Automatic Speech Recognition (ASR) has attracted profound research
interest. Recent breakthroughs have opened new prospects for ASR systems, such
as faithfully transcribing spoken language, which is a pivotal advancement in
building conversational agents. However, accurately discerning context-dependent
words and phrases remains a pressing challenge. In this work, we
propose a novel approach for enhancing contextual recognition within ASR
systems via semantic lattice processing, leveraging deep learning models to
deliver accurate transcriptions across a wide variety of
vocabularies and speaking styles. Our solution uses Hidden Markov
Models and Gaussian Mixture Models (HMM-GMM) along with Deep Neural Network
(DNN) models, integrating both language and acoustic modeling for better
accuracy. We incorporate a transformer-based model to
rescore the word lattice, achieving a notable reduction in Word Error Rate
(WER). We demonstrate the effectiveness of our proposed framework on the
LibriSpeech dataset with empirical analyses.
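The core idea of lattice rescoring is that each path through the recognizer's word lattice carries an acoustic log-probability, and a second-pass language model re-ranks those paths by a weighted sum of acoustic and LM scores. The sketch below illustrates only that combination step. As an assumption for the sake of a self-contained example, a toy bigram table stands in for the transformer-based model, paths are enumerated exhaustively (practical systems rescore lattices or N-best lists without doing this), and the lattice, scores, and `lm_weight` values are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

# Lattice: node -> list of (next_node, word, acoustic_logprob)
Lattice = Dict[int, List[Tuple[int, str, float]]]


def paths(lattice: Lattice, node: int, final: int) -> Iterator[Tuple[List[str], float]]:
    """Yield every (word sequence, summed acoustic log-prob) from node to final."""
    if node == final:
        yield [], 0.0
        return
    for nxt, word, am in lattice.get(node, []):
        for words, score in paths(lattice, nxt, final):
            yield [word] + words, am + score


def bigram_lm_score(words: List[str], bigrams: Dict[Tuple[str, str], float],
                    unk: float = -10.0) -> float:
    """Toy bigram log-prob; a stand-in (assumption) for a transformer LM."""
    score, prev = 0.0, "<s>"
    for w in words:
        score += bigrams.get((prev, w), unk)  # unseen bigrams get a fixed penalty
        prev = w
    return score


def rescore(lattice: Lattice, start: int, final: int,
            bigrams: Dict[Tuple[str, str], float], lm_weight: float) -> List[str]:
    """Return the path maximizing acoustic score + lm_weight * LM score."""
    best_words, _ = max(
        paths(lattice, start, final),
        key=lambda p: p[1] + lm_weight * bigram_lm_score(p[0], bigrams),
    )
    return best_words


if __name__ == "__main__":
    lat = {0: [(1, "ice", -1.0), (2, "I", -1.2)],
           1: [(3, "cream", -1.1)],
           2: [(3, "scream", -0.8)]}
    lm = {("<s>", "ice"): -2.0, ("ice", "cream"): -0.5,
          ("<s>", "I"): -1.0, ("I", "scream"): -4.0}
    print(rescore(lat, 0, 3, lm, lm_weight=0.0))  # ['I', 'scream'] (acoustics only)
    print(rescore(lat, 0, 3, lm, lm_weight=0.5))  # ['ice', 'cream'] (LM flips the choice)
```

Here "I scream" wins on acoustics alone (-2.0 vs -2.1), but with the LM term weighted at 0.5 "ice cream" scores -3.35 against -4.5, which is the kind of context-driven correction rescoring is meant to provide.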
CONFLATOR: Incorporating Switching Point based Rotatory Positional Encodings for Code-Mixed Language Modeling
The mixing of two or more languages is called Code-Mixing (CM). CM is a
social norm in multilingual societies. Neural Language Models (NLMs) like
transformers have been very effective on many NLP tasks. However, NLMs for CM are
an under-explored area. Though transformers are capable and powerful, they
cannot inherently encode positional/sequential information because they are
non-recurrent. Therefore, positional encodings are used to enrich word
representations with positional information. We hypothesize that Switching
Points (SPs), i.e., junctions in the text where the language switches (L1 -> L2
or L2 -> L1), pose a challenge for CM Language Models (LMs), and hence give
special emphasis to switching points in the modeling process. We experiment
with several positional encoding mechanisms and show that rotatory positional
encodings along with switching point information yield the best results.
We introduce CONFLATOR: a neural language modeling approach for code-mixed
languages. CONFLATOR is designed to learn to emphasize switching points through its
positional encoding, at both the unigram and bigram levels. CONFLATOR outperforms
the state of the art on two tasks based on code-mixed Hindi and English
(Hinglish): (i) sentiment analysis and (ii) machine translation.
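The two ingredients named above can be illustrated separately: locating switching points from per-token language tags, and applying standard rotary positional encoding (RoPE), which rotates each consecutive pair of query/key dimensions by an angle proportional to the token position so that attention dot products depend only on relative offsets. The sketch below shows both under the assumption that per-token language IDs are already available; how CONFLATOR actually fuses switching-point information into the rotary encoding is not reproduced here, and the function names and tag strings are hypothetical.

```python
import math
from typing import List, Sequence


def switching_points(lang_tags: Sequence[str]) -> List[int]:
    """Indices i where token i's language differs from token i-1's (L1 -> L2 or L2 -> L1)."""
    return [i for i in range(1, len(lang_tags)) if lang_tags[i] != lang_tags[i - 1]]


def rope(vec: Sequence[float], pos: int, base: float = 10000.0) -> List[float]:
    """Standard rotary encoding: rotate dims (2k, 2k+1) by pos * base**(-2k/d)."""
    d = len(vec)
    if d % 2:
        raise ValueError("RoPE needs an even dimension")
    out = list(vec)
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # i == 2k, so the exponent is -2k/d
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out[i] = x * c - y * s
        out[i + 1] = x * s + y * c
    return out


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    # Hypothetical tags for a Hinglish sentence such as "mujhe ye movie bahut pasand aayi".
    tags = ["hi", "hi", "en", "hi", "hi", "hi"]
    print(switching_points(tags))  # [2, 3]
    q, k = [0.3, -1.2, 0.5, 0.8], [1.0, 0.4, -0.7, 0.2]
    # Same relative offset (2) gives the same attention score at different absolute positions.
    print(dot(rope(q, 5), rope(k, 3)), dot(rope(q, 2), rope(k, 0)))
```

The relative-offset property is what makes rotary encodings a natural place to inject extra positional signals such as switching points.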
Findings of Factify 2: Multimodal Fake News Detection
With social media usage growing exponentially in the past few years, fake
news has also become extremely prevalent. The detrimental impact of fake news
emphasizes the need for research focused on automating the detection of false
information and verifying its accuracy. In this work, we present the outcome of
the Factify 2 shared task, which provides a multi-modal fact verification and
satire news dataset, as part of the DeFactify 2 workshop at AAAI'23. The data
calls for a comparison-based approach: social media claims are paired with
supporting documents, both containing text and images, and each pair is assigned
to one of 5 classes based on multi-modal relations. In the second iteration of
this task, we had over 60 participants and 9 final test-set submissions. The best
performances came from the use of DeBERTa for text and SwinV2 and CLIP for
images. The highest F1 score averaged over all five classes was 81.82%.
Comment: Defactify2 @AAAI 202
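The headline result above is an F1 score averaged over the five classes. Reading that as an unweighted (macro) average, which is our assumption rather than something the abstract states explicitly, the metric can be computed as below: per-class F1 from true positives, false positives, and false negatives, then a plain mean across classes. The label strings in the usage example are hypothetical placeholders, not the dataset's official labels.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str], classes: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 scores (classes with no support still count)."""
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(f1s) / len(classes)


if __name__ == "__main__":
    # Hypothetical five-way labels in the spirit of support / no-evidence / refute sub-categories.
    labels = ["support_text", "support_mm", "noevid_text", "noevid_mm", "refute"]
    gold = ["support_text", "support_mm", "refute", "noevid_mm", "refute"]
    pred = ["support_text", "support_text", "refute", "noevid_mm", "noevid_text"]
    print(round(macro_f1(gold, pred, labels), 4))
```

Macro averaging weights every class equally, so a model cannot score well by doing well only on the most frequent classes.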
Overview of Memotion 3: Sentiment and Emotion Analysis of Codemixed Hinglish Memes
Analyzing memes on the internet has emerged as a crucial endeavor due to the
impact this multi-modal form of content wields in shaping online discourse.
Memes have become a powerful tool for expressing emotions and sentiments,
possibly even spreading hate and misinformation, through humor and sarcasm. In
this paper, we present the overview of the Memotion 3 shared task, as part of
the DeFactify 2 workshop at AAAI-23. The task released a dataset of
Hindi-English code-mixed memes annotated for Sentiment (Task A), Emotion (Task
B), and Emotion intensity (Task C). Each of these is defined as an individual
task, and participants are ranked separately for each task. Over 50 teams
registered for the shared task, and 5 made final submissions to the test set of
the Memotion 3 dataset. CLIP, modified BERT models, and ViT were the most
popular models among the participants, along with approaches such as
student-teacher models, fusion, and ensembling. The best final F1 scores were
34.41 for Task A, 79.77 for Task B, and 59.82 for Task C.
Comment: Defactify2 @AAAI 202
Factify 2: A Multimodal Fake News and Satire News Dataset
The internet gives people an open platform to express their views and
share their stories. While this is very valuable, it also makes fake news one of our
society's most pressing problems. The manual fact-checking process is
time-consuming, which makes it challenging to disprove misleading assertions before
they cause significant harm. This is driving interest in automatic fact or
claim verification. Some existing datasets aim to support the development of
automated fact-checking techniques; however, most of them are text-based.
Multi-modal fact verification has received relatively scant attention. In this
paper, we provide a multi-modal fact-checking dataset called FACTIFY 2,
improving Factify 1 by using new data sources and adding satire articles.
Factify 2 has 50,000 new data instances. Similar to FACTIFY 1.0, we have three
broad categories (support, no-evidence, and refute), with sub-categories based
on the entailment of visual and textual data. We also provide a BERT and Vision
Transformer based baseline, which achieves a 65% F1 score on the test set. The
baseline code and the dataset will be made available at
https://github.com/surya1701/Factify-2.0.
Comment: Defactify@AAAI202