Memeify: A Large-Scale Meme Generation System
Interest in the research areas related to meme propagation and generation has
been increasing rapidly in the last couple of years. Meme datasets available
online are either specific to a context or contain no class information. Here,
we prepare a large-scale dataset of memes with captions and class labels. The
dataset consists of 1.1 million meme captions from 128 classes. We also provide
reasoning for the existence of broad categories, called "themes," across the
meme dataset; each theme consists of multiple meme classes. Our generation
system uses a trained state-of-the-art transformer-based model for caption
generation by employing an encoder-decoder architecture. We develop a web
interface, called Memeify, for users to generate memes of their choice, and
explain in detail the working of the individual components of the system. We also
perform a qualitative evaluation of the generated memes by conducting a user
study. A link to the demonstration of the Memeify system is
https://youtu.be/P_Tfs0X-czs.
Comment: Accepted at ACM India Joint International Conference on Data Science
& Management of Data (CoDS-CoMAD) 202
COBRA: Contrastive Bi-Modal Representation Algorithm
There is a wide range of applications that involve multi-modal data, such as
cross-modal retrieval, visual question-answering, and image captioning. Such
applications are primarily dependent on aligned distributions of the different
constituent modalities. Existing approaches generate latent embeddings for each
modality in a joint fashion by representing them in a common manifold. However,
these joint embedding spaces fail to sufficiently reduce the modality gap,
which affects the performance in downstream tasks. We hypothesize that these
embeddings retain the intra-class relationships but are unable to preserve the
inter-class dynamics. In this paper, we present a novel framework, COBRA, that
aims to train two modalities (image and text) in a joint fashion, inspired by
the Contrastive Predictive Coding (CPC) and Noise Contrastive Estimation (NCE)
paradigms, which preserve both inter- and intra-class relationships. We
empirically show that this framework reduces the modality gap significantly and
generates a robust and task-agnostic joint-embedding space. We outperform
existing work on four diverse downstream tasks spanning seven benchmark
cross-modal datasets.
Comment: 13 Pages, 6 Figures and 10 Tables
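For readers unfamiliar with the contrastive paradigms named in the COBRA abstract, the following is a minimal sketch of a generic InfoNCE-style loss over paired image and text embeddings. It is not the COBRA objective, which the abstract does not specify; the function names `cosine` and `info_nce`, the temperature value, and the toy two-dimensional embeddings are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(image_embs, text_embs, temperature=0.1):
    """Mean InfoNCE loss (hypothetical helper, not from the paper).

    Row i of image_embs is assumed to be paired with row i of text_embs;
    every other text in the batch acts as a negative for image i.
    """
    losses = []
    for i, img in enumerate(image_embs):
        logits = [cosine(img, txt) / temperature for txt in text_embs]
        # Log-sum-exp with the max subtracted for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of the matching text under a softmax.
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings (assumed): matched pairs point in the same direction.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
# Same embeddings with the pairing swapped, so each positive is orthogonal.
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy case the loss is close to zero when each image embedding matches its paired text and much larger when the pairing is swapped, which is the behavior a contrastive objective relies on to pull paired modalities together.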