
617 research outputs found

ADVISE: Symbolism and External Knowledge for Decoding Advertisements

Author: A Kembhavi
H Hotelling
H Xu
JH Leigh
LM Scott
NE Spears
R Krishna
SJ Levy
TY Lin
W Goo
W Liu
Publication date: 29/07/2018

In order to convey the most content in their limited space, advertisements embed references to outside knowledge via symbolism. For example, a motorcycle stands for adventure (a positive property the ad wants associated with the product being sold), and a gun stands for danger (a negative property to dissuade viewers from undesirable behaviors). We show how to use symbolic references to better understand the meaning of an ad. We further show how anchoring ad understanding in general-purpose object recognition and image captioning improves results. We formulate the ad understanding task as matching the ad image to human-generated statements that describe the action that the ad prompts, and the rationale it provides for taking this action. Our proposed method outperforms the state of the art on this task, and on an alternative formulation of question-answering on ads. We show additional applications of our learned representations for matching ads to slogans, and clustering ads according to their topic, without extra training.

Comment: To appear, Proceedings of the European Conference on Computer Vision (ECCV

arXiv.org e-Print Archive

Crossref

Unbiased Heterogeneous Scene Graph Generation with Relation-aware Message Passing Neural Network

Author: Kim Kibum
Moon Jinyoung
Park Chanyoung
Yoon Kanghoon
Publication date: 13/06/2023

Recent scene graph generation (SGG) frameworks have focused on learning complex relationships among multiple objects in an image. Because message passing neural networks (MPNNs) model high-order interactions between objects and their neighboring objects, they are the dominant representation learning modules for SGG. However, existing MPNN-based frameworks treat the scene graph as a homogeneous graph, which restricts the context-awareness of visual relations between objects. That is, they overlook the fact that relations tend to be highly dependent on the objects with which they are associated. In this paper, we propose an unbiased heterogeneous scene graph generation (HetSGG) framework that captures relation-aware context using message passing neural networks. We devise a novel message passing layer, called relation-aware message passing neural network (RMP), that aggregates the contextual information of an image considering the predicate type between objects. Our extensive evaluations demonstrate that HetSGG outperforms state-of-the-art methods, especially on tail predicate classes.

Comment: 9 pages; AAAI 202

arXiv.org e-Print Archive

Visual Semantic Relatedness Dataset for Image Captioning

Author: Moreno-Noguer Francesc
Padró Lluís
Sabir Ahmed
Publication date: 20/01/2023

Modern image captioning systems rely heavily on extracting knowledge from images to capture the concept of a static story. In this paper, we propose a textual visual context dataset for captioning, in which the publicly available dataset COCO Captions (Lin et al., 2014) has been extended with information about the scene (such as objects in the image). Since this information has a textual form, it can be used to bring any NLP task, such as text similarity or semantic relation methods, into captioning systems, either as an end-to-end training strategy or as a post-processing approach.

Comment: Project Page: bit.ly/3Zq6AT

arXiv.org e-Print Archive

Improving Image Captioning through Segmentation

Author: Pedro Daniel Fernandes Ferreira
Publication date: 25/07/2023

Repositório Aberto da Universidade do Porto

Towards Adaptable and Interactive Image Captioning with Data Augmentation and Episodic Memory

Author: Anagnostopoulou Aliki
Hartmann Mareike
Sonntag Daniel
Publication date: 06/06/2023

Interactive machine learning (IML) is a beneficial learning paradigm in cases of limited data availability, as human feedback is incrementally integrated into the training process. In this paper, we present an IML pipeline for image captioning that allows us to incrementally adapt a pre-trained image captioning model to a new data distribution based on user input. To incorporate user input into the model, we explore a combination of simple data augmentation methods to obtain larger data batches for each newly annotated data instance, and we implement continual learning methods to prevent catastrophic forgetting from repeated updates. For our experiments, we split a domain-specific image captioning dataset, namely VizWiz, into non-overlapping parts to simulate an incremental input flow for continually adapting the model to new data. We find that, while data augmentation worsens results, even when relatively small amounts of data are available, episodic memory is an effective strategy to retain knowledge from previously seen clusters.

arXiv.org e-Print Archive

A Study of Model of Kawaii Feelings for Evaluation of Products

Author: Tipporn Laohakangvalvit
Publication date: 23/07/2019

Shibaura Institute of Technology, 2018

Shibaura Institute of Technology Repository / 芝浦工業大学学術リポジトリ