34,035 research outputs found
Syntax-Directed Attention for Neural Machine Translation
Attention mechanism, including global attention and local attention, plays a
key role in neural machine translation (NMT). Global attention attends to all
source words for word prediction. In comparison, local attention selectively
looks at fixed-window source words. However, alignment weights for the current
target word often decrease to the left and right by linear distance centering
on the aligned source position and neglect syntax-directed distance
constraints. In this paper, we extend local attention with syntax-distance
constraint, to focus on syntactically related source words with the predicted
target word, thus learning a more effective context vector for word prediction.
Moreover, we further propose a double context NMT architecture, which consists
of a global context vector and a syntax-directed context vector over the global
attention, to provide more translation performance for NMT from source
representation. The experiments on the large-scale Chinese-to-English and
English-to-Germen translation tasks show that the proposed approach achieves a
substantial and significant improvement over the baseline system.Comment: AAAI2018, revised versio
Towards Bidirectional Hierarchical Representations for Attention-Based Neural Machine Translation
This paper proposes a hierarchical attentional neural translation model which
focuses on enhancing source-side hierarchical representations by covering both
local and global semantic information using a bidirectional tree-based encoder.
To maximize the predictive likelihood of target words, a weighted variant of an
attention mechanism is used to balance the attentive information between
lexical and phrase vectors. Using a tree-based rare word encoding, the proposed
model is extended to sub-word level to alleviate the out-of-vocabulary (OOV)
problem. Empirical results reveal that the proposed model significantly
outperforms sequence-to-sequence attention-based and tree-based neural
translation models in English-Chinese translation tasks.Comment: Accepted for publication at EMNLP 201
Neural Machine Translation with Word Predictions
In the encoder-decoder architecture for neural machine translation (NMT), the
hidden states of the recurrent structures in the encoder and decoder carry the
crucial information about the sentence.These vectors are generated by
parameters which are updated by back-propagation of translation errors through
time. We argue that propagating errors through the end-to-end recurrent
structures are not a direct way of control the hidden vectors. In this paper,
we propose to use word predictions as a mechanism for direct supervision. More
specifically, we require these vectors to be able to predict the vocabulary in
target sentence. Our simple mechanism ensures better representations in the
encoder and decoder without using any extra data or annotation. It is also
helpful in reducing the target side vocabulary and improving the decoding
efficiency. Experiments on Chinese-English and German-English machine
translation tasks show BLEU improvements by 4.53 and 1.3, respectivelyComment: Accepted at EMNLP201
- …