Doubly-Attentive Decoder for Multi-modal Neural Machine Translation
We introduce a Multi-modal Neural Machine Translation model in which a
doubly-attentive decoder naturally incorporates spatial visual features
obtained using pre-trained convolutional neural networks, bridging the gap
between image description and translation. Our decoder learns to attend to
source-language words and parts of an image independently by means of two
separate attention mechanisms as it generates words in the target language. We
find that our model can efficiently exploit not just back-translated in-domain
multi-modal data but also large general-domain text-only MT corpora. We also
report state-of-the-art results on the Multi30k data set. Comment: 8 pages (11 including references), 2 figures
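The two independent attention mechanisms described above can be sketched minimally: at each decoding step the decoder state queries the source-word annotations and the spatial image features separately, and the two context vectors are then combined. This is an illustrative sketch only; the function names, dot-product scoring, and the combination layer are assumptions, not the paper's exact (MLP-based) attention formulation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, keys):
    """Dot-product attention (simplified stand-in for the paper's scoring):
    returns a context vector as a weighted sum of the keys."""
    weights = softmax(keys @ query)   # (n,) weights over source words / regions
    return weights @ keys             # (d,) context vector

def doubly_attentive_step(hidden, src_annotations, img_features, W_out):
    """One hypothetical decoder step: attend to source-language words and to
    spatial image regions independently, then combine both contexts with the
    decoder state before predicting the next target word."""
    text_ctx = attend(hidden, src_annotations)  # attention over source words
    img_ctx = attend(hidden, img_features)      # attention over image regions
    combined = np.concatenate([hidden, text_ctx, img_ctx])
    return np.tanh(W_out @ combined)            # next decoder state (sketch)
```

In practice the image features would come from a pre-trained CNN's spatial feature map (one vector per region), and the source annotations from a bidirectional encoder.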
Image Pivoting for Learning Multilingual Multimodal Representations
In this paper we propose a model to learn multimodal multilingual
representations for matching images and sentences in different languages, with
the aim of advancing multilingual versions of image search and image
understanding. Our model learns a common representation for images and their
descriptions in two different languages (which need not be parallel) by
considering the image as a pivot between two languages. We introduce a new
pairwise ranking loss function which can handle both symmetric and asymmetric
similarity between the two modalities. We evaluate our models on
image-description ranking for German and English, and on semantic textual
similarity of image descriptions in English. In both cases we achieve
state-of-the-art performance. Comment: 7 pages, EMNLP 201
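The pairwise ranking objective described above can be sketched as a standard max-margin loss: with the image as pivot, the matching description (in either language) should score higher than a contrastive one by some margin. The function names, the cosine score, and the order-embedding-style asymmetric score below are illustrative assumptions, not the paper's exact loss.

```python
import numpy as np

def cosine(a, b):
    """Symmetric similarity between two embeddings."""
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def asym_sim(img, cap):
    """One hypothetical asymmetric option (order-embedding style):
    penalize caption dimensions that exceed the image embedding."""
    return -np.linalg.norm(np.maximum(0.0, cap - img)) ** 2

def pairwise_ranking_loss(img, cap_pos, cap_neg, margin=0.2, sim=cosine):
    """Max-margin pairwise ranking with the image as pivot: the matching
    caption must outscore a contrastive caption by at least `margin`."""
    return max(0.0, margin - sim(img, cap_pos) + sim(img, cap_neg))
```

With `sim=cosine` the loss is symmetric in the two modalities; passing an asymmetric score such as `asym_sim` handles the asymmetric case the abstract mentions. The same loss can be summed over contrastive images for a fixed caption.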