Search CORE

34,035 research outputs found

Syntax-Directed Attention for Neural Machine Translation

Author: Chen Kehai
Sumita Eiichiro
Utiyama Masao
Wang Rui
Zhao Tiejun
Publication venue
Publication date: 26/04/2018
Field of study

Attention mechanism, including global attention and local attention, plays a key role in neural machine translation (NMT). Global attention attends to all source words for word prediction. In comparison, local attention selectively looks at fixed-window source words. However, alignment weights for the current target word often decrease to the left and right by linear distance centering on the aligned source position and neglect syntax-directed distance constraints. In this paper, we extend local attention with syntax-distance constraint, to focus on syntactically related source words with the predicted target word, thus learning a more effective context vector for word prediction. Moreover, we further propose a double context NMT architecture, which consists of a global context vector and a syntax-directed context vector over the global attention, to provide more translation performance for NMT from source representation. The experiments on the large-scale Chinese-to-English and English-to-Germen translation tasks show that the proposed approach achieves a substantial and significant improvement over the baseline system.Comment: AAAI2018, revised versio

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Towards Bidirectional Hierarchical Representations for Attention-Based Neural Machine Translation

Author: Chao Lidia S.
Wong Derek F.
Xiao Tong
Yang Baosong
Zhu Jingbo
Publication venue
Publication date: 01/01/2017
Field of study

This paper proposes a hierarchical attentional neural translation model which focuses on enhancing source-side hierarchical representations by covering both local and global semantic information using a bidirectional tree-based encoder. To maximize the predictive likelihood of target words, a weighted variant of an attention mechanism is used to balance the attentive information between lexical and phrase vectors. Using a tree-based rare word encoding, the proposed model is extended to sub-word level to alleviate the out-of-vocabulary (OOV) problem. Empirical results reveal that the proposed model significantly outperforms sequence-to-sequence attention-based and tree-based neural translation models in English-Chinese translation tasks.Comment: Accepted for publication at EMNLP 201

arXiv.org e-Print Archive

Crossref

Neural Machine Translation with Word Predictions

Author: Chen Jiajun
Dai Xinyu
Huang Shujian
Weng Rongxiang
Zheng Zaixiang
Publication venue
Publication date: 01/01/2017
Field of study

In the encoder-decoder architecture for neural machine translation (NMT), the hidden states of the recurrent structures in the encoder and decoder carry the crucial information about the sentence.These vectors are generated by parameters which are updated by back-propagation of translation errors through time. We argue that propagating errors through the end-to-end recurrent structures are not a direct way of control the hidden vectors. In this paper, we propose to use word predictions as a mechanism for direct supervision. More specifically, we require these vectors to be able to predict the vocabulary in target sentence. Our simple mechanism ensures better representations in the encoder and decoder without using any extra data or annotation. It is also helpful in reducing the target side vocabulary and improving the decoding efficiency. Experiments on Chinese-English and German-English machine translation tasks show BLEU improvements by 4.53 and 1.3, respectivelyComment: Accepted at EMNLP201

arXiv.org e-Print Archive

Crossref