A Stochastic Decoder for Neural Machine Translation
The process of translation is ambiguous, in that there are typically many
valid translations for a given sentence. This gives rise to significant
variation in parallel corpora; however, most current models of machine
translation do not account for this variation, instead treating the problem
as a deterministic process. To address this, we present a deep generative model of
machine translation which incorporates a chain of latent variables in order to
account for local lexical and syntactic variation in parallel corpora. We
provide an in-depth analysis of the pitfalls encountered in variational
inference for training deep generative models. Experiments on several
different language pairs demonstrate that the model consistently improves over
strong baselines.
Comment: Accepted at ACL 201
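
The abstract above describes a decoder driven by a chain of latent variables and trained with variational inference. As a rough illustrative sketch only (PyTorch, with made-up names such as LatentCell and enc_ctx; this is not the paper's actual architecture), one decoder step with a per-timestep Gaussian latent and its ELBO KL term might look like:

```python
import torch
import torch.nn as nn
from torch.distributions import Normal, kl_divergence

class LatentCell(nn.Module):
    """One decoder step whose recurrent state is driven by a latent z_t (illustrative)."""
    def __init__(self, hid_dim, z_dim):
        super().__init__()
        self.rnn = nn.GRUCell(z_dim, hid_dim)
        self.prior_net = nn.Linear(hid_dim, 2 * z_dim)       # parameters of p(z_t | h_{t-1})
        self.post_net = nn.Linear(2 * hid_dim, 2 * z_dim)    # parameters of q(z_t | h_{t-1}, x)

    def forward(self, h, enc_ctx):
        # Prior conditions only on the previous state; the approximate
        # posterior additionally sees an encoder context vector.
        p_mu, p_logvar = self.prior_net(h).chunk(2, dim=-1)
        q_mu, q_logvar = self.post_net(torch.cat([h, enc_ctx], dim=-1)).chunk(2, dim=-1)
        p = Normal(p_mu, (0.5 * p_logvar).exp())
        q = Normal(q_mu, (0.5 * q_logvar).exp())
        z = q.rsample()                        # reparameterized sample keeps gradients flowing
        h_next = self.rnn(z, h)                # the latent feeds the next recurrent state
        kl = kl_divergence(q, p).sum(-1)       # per-step KL term of the ELBO
        return h_next, z, kl
```

Chaining such steps across the target sentence gives one KL term per position; summing them with the reconstruction log-likelihood yields the variational objective the abstract refers to.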
Towards Neural Machine Translation with Latent Tree Attention
Building models that take advantage of the hierarchical structure of language
without a priori annotation is a longstanding goal in natural language
processing. We introduce such a model for the task of machine translation,
pairing a recurrent neural network grammar encoder with a novel attentional
RNNG decoder and applying policy gradient reinforcement learning to induce
unsupervised tree structures on both the source and target. When trained on
character-level datasets with no explicit segmentation or parse annotation, the
model learns a plausible segmentation and shallow parse, obtaining performance
close to an attentional baseline.
Comment: Presented at SPNLP 201
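
The tree induction described above relies on policy gradients over discrete parser actions. As a minimal sketch of that idea, assuming a hypothetical parser network that emits a score vector per SHIFT/REDUCE decision and using the translation log-likelihood as reward (all names are illustrative, not the paper's code):

```python
import torch

def reinforce_loss(action_logits, reward, baseline):
    """REINFORCE over a sampled sequence of unsupervised parsing actions."""
    log_probs = []
    for logits in action_logits:               # one score vector per SHIFT/REDUCE decision
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()                 # sample a discrete parser action
        log_probs.append(dist.log_prob(action))
    advantage = reward - baseline              # e.g. translation log-likelihood minus a running mean
    return -advantage * torch.stack(log_probs).sum()
```

Minimizing this loss increases the probability of action sequences (tree structures) that led to better-than-baseline translation reward, which is how structure can be induced without parse annotation.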
Latent Variable Model for Multi-modal Translation
In this work, we propose to model the interaction between visual and textual
features for multi-modal neural machine translation (MMT) through a latent
variable model. This latent variable can be seen as a multi-modal stochastic
embedding of an image and its description in a foreign language. It is used in
a target-language decoder and also to predict image features. Importantly, our
model formulation utilises visual and textual inputs during training but does
not require that images be available at test time. We show that our latent
variable MMT formulation improves considerably over strong baselines, including
a multi-task learning approach (Elliott and Kádár, 2017) and a conditional
variational auto-encoder approach (Toyama et al., 2016). Finally, we show
improvements due to (i) predicting image features in addition to only
conditioning on them, (ii) imposing a constraint on the minimum amount of
information encoded in the latent variable, and (iii) training on additional
target-language image descriptions (i.e. synthetic data).
Comment: Paper accepted at ACL 2019. Contains 8 pages (11 including
references, 13 including appendix), 6 figures
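
The training objective sketched in this abstract combines a translation term, an image-feature prediction term, and a minimum-information constraint on the latent; at test time the prior replaces the posterior, so no image is needed. A hedged sketch of such an objective, assuming a "free bits" style KL floor and mean-squared-error feature prediction (function and argument names are hypothetical):

```python
import torch
import torch.nn.functional as F

def mmt_objective(log_p_y, img_pred, img_feats, kl, free_bits=2.0):
    """Training objective to maximize: translation term, image prediction, floored KL."""
    img_loss = F.mse_loss(img_pred, img_feats)   # predict visual features (training only)
    kl_floored = torch.clamp(kl, min=free_bits)  # minimum-information ("free bits") constraint
    return log_p_y - img_loss - kl_floored       # ELBO-style bound with an auxiliary image term
```

The clamp implements the constraint mentioned in point (ii): the KL is never optimized below the floor, so the latent is forced to encode at least that much information.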