Doubly-Attentive Decoder for Multi-modal Neural Machine Translation
We introduce a Multi-modal Neural Machine Translation model in which a
doubly-attentive decoder naturally incorporates spatial visual features
obtained using pre-trained convolutional neural networks, bridging the gap
between image description and translation. Our decoder learns to attend to
source-language words and parts of an image independently by means of two
separate attention mechanisms as it generates words in the target language. We
find that our model can efficiently exploit not just back-translated in-domain
multi-modal data but also large general-domain text-only MT corpora. We also
report state-of-the-art results on the Multi30k data set. Comment: 8 pages (11 including references), 2 figures
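The two independent attention mechanisms described above can be sketched minimally: at each decoding step the decoder state queries the source-word annotations and the spatial image features separately, and the two context vectors are then combined. This is an illustrative sketch only; the function names, dot-product scoring, and the combination layer are assumptions, not the paper's exact (MLP-based) attention formulation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, keys):
    """Dot-product attention (simplified stand-in for the paper's scoring):
    returns a context vector as a weighted sum of the keys."""
    weights = softmax(keys @ query)   # (n,) weights over source words / regions
    return weights @ keys             # (d,) context vector

def doubly_attentive_step(hidden, src_annotations, img_features, W_out):
    """One hypothetical decoder step: attend to source-language words and to
    spatial image regions independently, then combine both contexts with the
    decoder state before predicting the next target word."""
    text_ctx = attend(hidden, src_annotations)  # attention over source words
    img_ctx = attend(hidden, img_features)      # attention over image regions
    combined = np.concatenate([hidden, text_ctx, img_ctx])
    return np.tanh(W_out @ combined)            # next decoder state (sketch)
```

In practice the image features would come from a pre-trained CNN's spatial feature map (one vector per region), and the source annotations from a bidirectional encoder.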
Image Pivoting for Learning Multilingual Multimodal Representations
In this paper we propose a model to learn multimodal multilingual
representations for matching images and sentences in different languages, with
the aim of advancing multilingual versions of image search and image
understanding. Our model learns a common representation for images and their
descriptions in two different languages (which need not be parallel) by
considering the image as a pivot between two languages. We introduce a new
pairwise ranking loss function which can handle both symmetric and asymmetric
similarity between the two modalities. We evaluate our models on
image-description ranking for German and English, and on semantic textual
similarity of image descriptions in English. In both cases we achieve
state-of-the-art performance. Comment: 7 pages, EMNLP 201
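The pairwise ranking objective described above can be sketched as a standard max-margin loss: with the image as pivot, the matching description (in either language) should score higher than a contrastive one by some margin. The function names, the cosine score, and the order-embedding-style asymmetric score below are illustrative assumptions, not the paper's exact loss.

```python
import numpy as np

def cosine(a, b):
    """Symmetric similarity between two embeddings."""
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def asym_sim(img, cap):
    """One hypothetical asymmetric option (order-embedding style):
    penalize caption dimensions that exceed the image embedding."""
    return -np.linalg.norm(np.maximum(0.0, cap - img)) ** 2

def pairwise_ranking_loss(img, cap_pos, cap_neg, margin=0.2, sim=cosine):
    """Max-margin pairwise ranking with the image as pivot: the matching
    caption must outscore a contrastive caption by at least `margin`."""
    return max(0.0, margin - sim(img, cap_pos) + sim(img, cap_neg))
```

With `sim=cosine` the loss is symmetric in the two modalities; passing an asymmetric score such as `asym_sim` handles the asymmetric case the abstract mentions. The same loss can be summed over contrastive images for a fixed caption.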