12,470 research outputs found
Context-aware Captions from Context-agnostic Supervision
We introduce an inference technique to produce discriminative context-aware
image captions (captions that describe differences between images or visual
concepts) using only generic context-agnostic training data (captions that
describe a concept or an image in isolation). For example, given images and
captions of "siamese cat" and "tiger cat", we generate language that describes
the "siamese cat" in a way that distinguishes it from "tiger cat". Our key
novelty is that we show how to do joint inference over a language model that is
context-agnostic and a listener which distinguishes closely-related concepts.
We first apply our technique to a justification task, namely to describe why an
image contains a particular fine-grained category as opposed to another
closely-related category of the CUB-200-2011 dataset. We then study
discriminative image captioning to generate language that uniquely refers to
one of two semantically-similar images in the COCO dataset. Evaluations with
discriminative ground truth for justification and human studies for
discriminative image captioning reveal that our approach outperforms baseline
generative and speaker-listener approaches for discrimination.Comment: Accepted to CVPR 2017 (Spotlight
- …