Compositional Generalization in Image Captioning

Abdou, Mostafa; Aralikatte, Rahul; Elliott, Desmond; Lamm, Matthew; Nikolaus, Mitja

Authors: Mostafa Abdou, Rahul Aralikatte, Desmond Elliott, Matthew Lamm, Mitja Nikolaus
Publication date: 1 January 2019
Publisher: Association for Computational Linguistics (ACL)

Abstract

Image captioning models are usually evaluated on their ability to describe a held-out set of images, not on their ability to generalize to unseen concepts. We study the problem of compositional generalization, which measures how well a model composes unseen combinations of concepts when describing images. State-of-the-art image captioning models show poor generalization performance on this task. To address this, we propose a multi-task model that combines caption generation and image-sentence ranking, and uses a decoding mechanism that re-ranks the captions according to their similarity to the image. This model is substantially better at generalizing to unseen combinations of concepts than state-of-the-art captioning models.

Comment: To appear at CoNLL 2019, EMNL
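The re-ranking step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that candidate captions and the image have already been embedded into a shared vector space (for example, by an image-sentence ranking model), and it uses cosine similarity as the scoring function. The names `cosine_similarity` and `rerank_captions`, and the toy embeddings, are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rerank_captions(image_embedding, candidates):
    """Order candidate captions by similarity to the image, most similar first.

    `candidates` is a list of (caption, caption_embedding) pairs. The embeddings
    are assumed to come from some joint image-sentence space; how they are
    produced is outside the scope of this sketch.
    """
    scored = [
        (cosine_similarity(image_embedding, embedding), caption)
        for caption, embedding in candidates
    ]
    # Sort by score only, highest first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [caption for _, caption in scored]


# Toy example with hand-made 2-d embeddings (purely illustrative values).
image_embedding = [1.0, 0.0]
candidates = [
    ("a black dog", [0.0, 1.0]),
    ("a white dog", [1.0, 0.1]),
    ("a dog", [1.0, 1.0]),
]
ranked = rerank_captions(image_embedding, candidates)
```

In a setup like this, the candidates would typically be the hypotheses produced by beam search, and the top-ranked caption after re-ranking would be returned in place of the most probable one under the generator alone.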
