Latent Variable Model for Multi-modal Translation
In this work, we propose to model the interaction between visual and textual
features for multi-modal neural machine translation (MMT) through a latent
variable model. This latent variable can be seen as a multi-modal stochastic
embedding of an image and its description in a foreign language. It is used in
a target-language decoder and also to predict image features. Importantly, our
model formulation utilises visual and textual inputs during training but does
not require that images be available at test time. We show that our latent
variable MMT formulation improves considerably over strong baselines, including
a multi-task learning approach (Elliott and Kádár, 2017) and a conditional
variational auto-encoder approach (Toyama et al., 2016). Finally, we show
improvements due to (i) predicting image features in addition to conditioning on them, (ii) imposing a constraint on the minimum amount of information encoded in the latent variable, and (iii) training on additional target-language image descriptions (i.e. synthetic data).

Comment: Paper accepted at ACL 2019. Contains 8 pages (11 including references, 13 including appendix), 6 figures.
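As a rough illustration of the formulation described above (a sketch, not the authors' released code), the snippet below combines the ingredients the abstract lists: a latent variable inferred from source-text and image features at training time, a text-only prior so no image is required at test time, an auxiliary head that predicts image features from the latent, and a free-bits floor on the KL term enforcing a minimum amount of encoded information. All module names, dimensions, and the GRU decoder are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentMMT(nn.Module):
    def __init__(self, txt_dim=512, img_dim=2048, z_dim=64, vocab=10000):
        super().__init__()
        self.post = nn.Linear(txt_dim + img_dim, 2 * z_dim)  # q(z|x,v): sees the image
        self.prior = nn.Linear(txt_dim, 2 * z_dim)           # p(z|x): text only
        self.img_head = nn.Linear(z_dim, img_dim)            # predict image features from z
        self.z2h = nn.Linear(z_dim, txt_dim)
        self.dec = nn.GRU(txt_dim, txt_dim, batch_first=True)
        self.out = nn.Linear(txt_dim, vocab)

    @staticmethod
    def gaussian(stats):
        mu, logvar = stats.chunk(2, dim=-1)
        # reparameterised sample
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp(), mu, logvar

    def forward(self, src_feat, img_feat, tgt_emb, free_bits=2.0):
        # src_feat: (B, txt_dim) source sentence vector; img_feat: (B, img_dim)
        # pooled CNN features; tgt_emb: (B, T, txt_dim) target token embeddings.
        z, mu_q, lv_q = self.gaussian(self.post(torch.cat([src_feat, img_feat], -1)))
        _, mu_p, lv_p = self.gaussian(self.prior(src_feat))
        # KL(q||p) between diagonal Gaussians, floored ("free bits") so the
        # latent must keep a minimum amount of information.
        kl = 0.5 * (lv_p - lv_q + (lv_q.exp() + (mu_q - mu_p) ** 2) / lv_p.exp() - 1)
        kl = kl.sum(-1).clamp(min=free_bits).mean()
        img_loss = F.mse_loss(self.img_head(z), img_feat)    # image-prediction term
        h, _ = self.dec(tgt_emb, torch.tanh(self.z2h(z)).unsqueeze(0))  # z conditions the decoder
        return self.out(h), kl, img_loss                     # add cross-entropy on the logits
```

At test time one would sample z from the text-only prior, which is what lets the model translate without an image.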
Lessons learned in multilingual grounded language learning
Recent work has shown how to learn better visual-semantic embeddings by
leveraging image descriptions in more than one language. Here, we investigate
in detail which conditions affect the performance of this type of grounded
language learning model. We show that multilingual training improves over
bilingual training, and that low-resource languages benefit from training with
higher-resource languages. We demonstrate that a multilingual model can be
trained equally well on either translations or comparable sentence pairs, and
that annotating the same set of images in multiple languages enables further improvements via an additional caption-caption ranking objective.

Comment: CoNLL 2018
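A minimal sketch of the training objectives the abstract describes, assuming PyTorch, L2-normalised embeddings, and in-batch negatives (the two-language setup and all function names are illustrative assumptions):

```python
import torch

def contrastive(a, b, margin=0.2):
    """Symmetric max-margin ranking loss over in-batch negatives;
    rows of a and b are L2-normalised embeddings and a[i] matches b[i]."""
    scores = a @ b.t()                                          # cosine similarity matrix
    pos = scores.diag()
    cost_a = (margin + scores - pos.unsqueeze(1)).clamp(min=0)  # negatives per row
    cost_b = (margin + scores - pos.unsqueeze(0)).clamp(min=0)  # negatives per column
    cost_a.fill_diagonal_(0)
    cost_b.fill_diagonal_(0)
    return cost_a.mean() + cost_b.mean()

def multilingual_loss(img_emb, cap_en, cap_de):
    """Image-caption ranking per language, plus the extra caption-caption
    term that ties captions of the same image across languages."""
    return (contrastive(img_emb, cap_en)
            + contrastive(img_emb, cap_de)
            + contrastive(cap_en, cap_de))
```

The final caption-caption term is the additional ranking objective that becomes available when the same images are annotated in more than one language.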
LIUM-CVC Submissions for WMT17 Multimodal Translation Task
This paper describes the monomodal and multimodal Neural Machine Translation
systems developed by LIUM and CVC for WMT17 Shared Task on Multimodal
Translation. We mainly explored two multimodal architectures where either
global visual features or convolutional feature maps are integrated in order to
benefit from visual context. Our final systems ranked first for both En-De and
En-Fr language pairs according to the automatic evaluation metrics METEOR and
BLEU.

Comment: MMT System Description Paper for WMT17
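The two integration strategies the abstract mentions can be sketched as follows; this is an illustrative reconstruction, not the LIUM-CVC system itself, and all layer names and sizes are assumptions:

```python
import torch
import torch.nn as nn

class VisualInit(nn.Module):
    """(a) Global visual features: project a pooled CNN vector into the
    decoder's initial hidden state."""
    def __init__(self, img_dim=2048, hid=512):
        super().__init__()
        self.proj = nn.Linear(img_dim, hid)

    def forward(self, img_vec):                        # img_vec: (B, img_dim)
        return torch.tanh(self.proj(img_vec)).unsqueeze(0)  # (1, B, hid)

class SpatialAttention(nn.Module):
    """(b) Convolutional feature maps: attend over spatial regions at each
    decoding step, conditioned on the decoder state."""
    def __init__(self, img_dim=2048, hid=512):
        super().__init__()
        self.k = nn.Linear(img_dim, hid)
        self.q = nn.Linear(hid, hid)
        self.e = nn.Linear(hid, 1)

    def forward(self, feat_maps, dec_state):
        # feat_maps: (B, R, img_dim) spatial regions; dec_state: (B, hid)
        scores = self.e(torch.tanh(self.k(feat_maps) + self.q(dec_state).unsqueeze(1)))
        alpha = scores.softmax(dim=1)                  # attention over regions, (B, R, 1)
        return (alpha * feat_maps).sum(dim=1)          # visual context vector, (B, img_dim)
```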