617 research outputs found
ADVISE: Symbolism and External Knowledge for Decoding Advertisements
In order to convey the most content in their limited space, advertisements
embed references to outside knowledge via symbolism. For example, a motorcycle
stands for adventure (a positive property the ad wants associated with the
product being sold), and a gun stands for danger (a negative property to
dissuade viewers from undesirable behaviors). We show how to use symbolic
references to better understand the meaning of an ad. We further show how
anchoring ad understanding in general-purpose object recognition and image
captioning improves results. We formulate the ad understanding task as matching
the ad image to human-generated statements that describe the action that the ad
prompts, and the rationale it provides for taking this action. Our proposed
method outperforms the state of the art on this task, and on an alternative
formulation of question-answering on ads. We show additional applications of
our learned representations for matching ads to slogans, and clustering ads
according to their topic, without extra training.Comment: To appear, Proceedings of the European Conference on Computer Vision
(ECCV
Unbiased Heterogeneous Scene Graph Generation with Relation-aware Message Passing Neural Network
Recent scene graph generation (SGG) frameworks have focused on learning
complex relationships among multiple objects in an image. Thanks to the nature
of the message passing neural network (MPNN) that models high-order
interactions between objects and their neighboring objects, they are dominant
representation learning modules for SGG. However, existing MPNN-based
frameworks assume the scene graph as a homogeneous graph, which restricts the
context-awareness of visual relations between objects. That is, they overlook
the fact that the relations tend to be highly dependent on the objects with
which the relations are associated. In this paper, we propose an unbiased
heterogeneous scene graph generation (HetSGG) framework that captures
relation-aware context using message passing neural networks. We devise a novel
message passing layer, called relation-aware message passing neural network
(RMP), that aggregates the contextual information of an image considering the
predicate type between objects. Our extensive evaluations demonstrate that
HetSGG outperforms state-of-the-art methods, especially outperforming on tail
predicate classes.Comment: 9 pages; AAAI 202
Visual Semantic Relatedness Dataset for Image Captioning
Modern image captioning system relies heavily on extracting knowledge from
images to capture the concept of a static story. In this paper, we propose a
textual visual context dataset for captioning, in which the publicly available
dataset COCO Captions (Lin et al., 2014) has been extended with information
about the scene (such as objects in the image). Since this information has a
textual form, it can be used to leverage any NLP task, such as text similarity
or semantic relation methods, into captioning systems, either as an end-to-end
training strategy or a post-processing based approach.Comment: Project Page: bit.ly/3Zq6AT
Towards Adaptable and Interactive Image Captioning with Data Augmentation and Episodic Memory
Interactive machine learning (IML) is a beneficial learning paradigm in cases
of limited data availability, as human feedback is incrementally integrated
into the training process. In this paper, we present an IML pipeline for image
captioning which allows us to incrementally adapt a pre-trained image
captioning model to a new data distribution based on user input. In order to
incorporate user input into the model, we explore the use of a combination of
simple data augmentation methods to obtain larger data batches for each newly
annotated data instance and implement continual learning methods to prevent
catastrophic forgetting from repeated updates. For our experiments, we split a
domain-specific image captioning dataset, namely VizWiz, into non-overlapping
parts to simulate an incremental input flow for continually adapting the model
to new data. We find that, while data augmentation worsens results, even when
relatively small amounts of data are available, episodic memory is an effective
strategy to retain knowledge from previously seen clusters
- …