Learning to Parse and Translate Improves Neural Machine Translation
There has been relatively little attention to incorporating linguistic prior
to neural machine translation. Much of the previous work was further
constrained to considering linguistic prior on the source side. In this paper,
we propose a hybrid model, called NMT+RNNG, that learns to parse and translate
by combining the recurrent neural network grammar into the attention-based
neural machine translation. Our approach encourages the neural machine
translation model to incorporate linguistic prior during training, and lets it
translate on its own afterward. Extensive experiments with four language pairs
show the effectiveness of the proposed NMT+RNNG.
Comment: Accepted as a short paper at the 55th Annual Meeting of the
Association for Computational Linguistics (ACL 2017)
Towards Neural Machine Translation with Latent Tree Attention
Building models that take advantage of the hierarchical structure of language
without a priori annotation is a longstanding goal in natural language
processing. We introduce such a model for the task of machine translation,
pairing a recurrent neural network grammar encoder with a novel attentional
RNNG decoder and applying policy gradient reinforcement learning to induce
unsupervised tree structures on both the source and target. When trained on
character-level datasets with no explicit segmentation or parse annotation, the
model learns a plausible segmentation and shallow parse, obtaining performance
close to an attentional baseline.
Comment: Presented at SPNLP 201
Multitask Learning with Low-Level Auxiliary Tasks for Encoder-Decoder Based Speech Recognition
End-to-end training of deep learning-based models allows for implicit
learning of intermediate representations based on the final task loss. However,
the end-to-end approach ignores the useful domain knowledge encoded in explicit
intermediate-level supervision. We hypothesize that using intermediate
representations as auxiliary supervision at lower levels of deep networks may
be a good way of combining the advantages of end-to-end training and more
traditional pipeline approaches. We present experiments on conversational
speech recognition where we use lower-level tasks, such as phoneme recognition,
in a multitask training approach with an encoder-decoder model for direct
character transcription. We compare multiple types of lower-level tasks and
analyze the effects of the auxiliary tasks. Our results on the Switchboard
corpus show that this approach improves recognition accuracy over a standard
encoder-decoder model on the Eval2000 test set.
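The multitask idea in this abstract can be illustrated with a minimal sketch. The abstract does not say how the main and auxiliary objectives are combined, so the weighted sum, the use of cross-entropy for both tasks, and the names `multitask_loss` and `aux_weight` are assumptions made for illustration only.

```python
import math

def cross_entropy(probs, targets):
    """Mean negative log-likelihood of the target indices.

    probs[i] is a probability distribution (list of floats) for step i,
    targets[i] is the index of the correct class at step i.
    """
    return -sum(math.log(p[t]) for p, t in zip(probs, targets)) / len(targets)

def multitask_loss(char_probs, char_targets,
                   phone_probs, phone_targets, aux_weight=0.5):
    """Main character-level loss plus a weighted auxiliary phoneme loss.

    Assumption: a simple weighted sum; aux_weight is a hypothetical
    hyperparameter, not a value taken from the paper.
    """
    main = cross_entropy(char_probs, char_targets)   # final task (decoder output)
    aux = cross_entropy(phone_probs, phone_targets)  # lower-level auxiliary task
    return main + aux_weight * aux
```

Setting `aux_weight` to 0 recovers plain end-to-end training on the character task alone, which makes the auxiliary supervision easy to ablate.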
Syntax-Directed Attention for Neural Machine Translation
The attention mechanism, including global attention and local attention, plays a
key role in neural machine translation (NMT). Global attention attends to all
source words for word prediction. In comparison, local attention selectively
looks at source words within a fixed window. However, alignment weights for the current
target word often decrease to the left and right by linear distance, centered
on the aligned source position, and neglect syntax-directed distance
constraints. In this paper, we extend local attention with a syntax-distance
constraint so that it focuses on source words syntactically related to the predicted
target word, thus learning a more effective context vector for word prediction.
Moreover, we propose a double-context NMT architecture, which consists
of a global context vector and a syntax-directed context vector over the global
attention, to improve translation performance by drawing more information from the source
representation. Experiments on large-scale Chinese-to-English and
English-to-German translation tasks show that the proposed approach achieves a
substantial and significant improvement over the baseline system.
Comment: AAAI2018, revised version
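As a rough illustration of the syntax-distance idea in this abstract, the sketch below reweights attention scores by distance in a dependency tree instead of linear distance. The Gaussian penalty, the use of tree path length as the syntax distance, the renormalization step, and the names `tree_distances`, `syntax_directed_weights`, and `sigma` are all assumptions for illustration; the abstract does not specify the exact formulation.

```python
import math
from collections import deque

def tree_distances(heads, start):
    """Path length from `start` to every word in a dependency tree.

    heads[i] is the index of word i's head, or -1 for the root.
    Assumes a single connected tree.
    """
    neighbors = [[] for _ in heads]
    for child, head in enumerate(heads):
        if head >= 0:
            neighbors[child].append(head)
            neighbors[head].append(child)
    dist = [None] * len(heads)
    dist[start] = 0
    queue = deque([start])
    while queue:  # breadth-first search over the undirected tree
        node = queue.popleft()
        for nxt in neighbors[node]:
            if dist[nxt] is None:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def syntax_directed_weights(scores, heads, aligned_pos, sigma=1.0):
    """Softmax over scores, damped by a Gaussian in syntax distance.

    Assumption: the penalty exp(-d^2 / (2 * sigma^2)) mirrors the
    Gaussian used in linear-distance local attention.
    """
    dists = tree_distances(heads, aligned_pos)
    top = max(scores)  # subtract the max for numerical stability
    damped = [math.exp(s - top) * math.exp(-(d * d) / (2 * sigma * sigma))
              for s, d in zip(scores, dists)]
    total = sum(damped)
    return [w / total for w in damped]
```

With equal raw scores, words closer to the aligned position in the tree receive more weight even when they are far away in the linear word order.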