
    Neural Machine Translation with Deep Attention

    This paper proposes a deep attention mechanism that fuses semantic information between a deep encoder and a deep decoder, further strengthening a translation system's ability to model the translation correspondence between the source and target languages. Using the context information learned by the lower attention layers, the proposed model automatically determines how to extract and filter source-side semantic information from the corresponding encoder layer and integrate it into the corresponding decoder layer, so that the higher attention layers have richer information for modeling deep translation correspondences and the model's hidden representations become better suited to predicting target words. On Chinese-English, English-German and English-French translation tasks, the new model achieves near state-of-the-art results. The work was carried out jointly by the team of Su Jinsong at our School of Software and the team of Xiong Deyi at Tianjin University; the corresponding author is Associate Professor Su Jinsong of the School of Software, and the first author is Zhang Biao, a master's student in the School of Software.

    【Abstract】Deepening neural models has proven very successful in improving model capacity for complex learning tasks such as machine translation. Previous efforts on deep neural machine translation have mainly focused on the encoder and the decoder, with little work on the attention mechanism. However, the attention mechanism is of vital importance for inducing the translation correspondence between different languages, for which shallow neural networks are relatively insufficient, especially when the encoder and decoder are deep. In this paper, we propose a deep attention model (DeepAtt). Based on the low-level attention information, DeepAtt is capable of automatically determining what should be passed or suppressed from the corresponding encoder layer so as to make the distributed representation appropriate for high-level attention and translation. We conduct experiments on NIST Chinese-English, WMT English-German and WMT English-French translation tasks, where, with 5 attention layers, DeepAtt yields very competitive performance against the state-of-the-art results. We empirically find that with an adequate increase of attention layers, DeepAtt tends to produce more accurate attention weights. An in-depth analysis of the translation of important context words further reveals that DeepAtt significantly improves the faithfulness of system translations.

    The authors were supported by the National Natural Science Foundation of China (Nos. 61672440 and 61622209), the Fundamental Research Funds for the Central Universities (Grant No. ZK1024), and the Scientific Research Project of the National Language Committee of China (Grant No. YB135-49). Biao Zhang gratefully acknowledges the support of the Baidu Scholarship.
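
    As a minimal sketch of the idea described in the abstract: at decoder layer l, an attention module attends over the outputs of encoder layer l, and a sigmoid gate conditioned on the context produced by the attention layer below decides what is passed on or suppressed. The additive-attention form, the gating function, and all names below are illustrative assumptions, not the paper's exact equations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepAttentionLayer(nn.Module):
    """One attention layer that gates source information from its encoder layer
    using the context produced by the attention layer below (illustrative only)."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.query_proj = nn.Linear(hidden_size, hidden_size, bias=False)
        self.key_proj = nn.Linear(hidden_size, hidden_size, bias=False)
        self.energy = nn.Linear(hidden_size, 1, bias=False)
        # Gate deciding what to pass or suppress from this encoder layer.
        self.gate = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, decoder_state, encoder_layer_outputs, lower_context):
        # decoder_state:         (batch, hidden)
        # encoder_layer_outputs: (batch, src_len, hidden), outputs of encoder layer l
        # lower_context:         (batch, hidden), context from attention layer l-1
        scores = self.energy(torch.tanh(
            self.query_proj(decoder_state).unsqueeze(1)
            + self.key_proj(encoder_layer_outputs)
        )).squeeze(-1)                                         # (batch, src_len)
        weights = F.softmax(scores, dim=-1)
        context = torch.bmm(weights.unsqueeze(1),
                            encoder_layer_outputs).squeeze(1)  # (batch, hidden)
        # Mix the new source context with what the lower layer already extracted.
        g = torch.sigmoid(self.gate(torch.cat([context, lower_context], dim=-1)))
        return g * context + (1.0 - g) * lower_context, weights
```

    Stacking several such layers (the paper reports results with 5) lets the higher attention layers refine the source information selected by the lower ones.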

    Deep Architectures for Neural Machine Translation

    It has been shown that increasing model depth improves the quality of neural machine translation. However, different architectural variants for increasing model depth have been proposed, and so far there has been no thorough comparative study. In this work, we describe and evaluate several existing approaches to introducing depth in neural machine translation. Additionally, we explore novel architectural variants, including deep transition RNNs, and we vary how attention is used in the deep decoder. We introduce a novel "BiDeep" RNN architecture that combines deep transition RNNs and stacked RNNs. Our evaluation is carried out on the English-to-German WMT news translation dataset, using a single-GPU machine for both training and inference. We find that several of our proposed architectures improve upon existing approaches in terms of speed and translation quality. We obtain the best improvements with a BiDeep RNN of combined depth 8, yielding an average improvement of 1.5 BLEU over a strong shallow baseline. We release our code for ease of adoption. Comment: WMT 2017 research track.
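
    As a rough illustration of how deep transition and stacked recurrence can be combined, the sketch below stacks GRU layers whose hidden state passes through several GRU transitions at every time step (e.g. stack depth 4 with transition depth 2 gives a combined depth of 8). The cell choice, the depths, and the omission of residual connections are simplifying assumptions; the authors' released code is the authoritative reference.

```python
import torch
import torch.nn as nn

class DeepTransitionGRU(nn.Module):
    """One layer whose hidden state goes through several GRU transitions per time step."""

    def __init__(self, input_size: int, hidden_size: int, transition_depth: int = 2):
        super().__init__()
        cells = [nn.GRUCell(input_size, hidden_size)]
        cells += [nn.GRUCell(hidden_size, hidden_size) for _ in range(transition_depth - 1)]
        self.cells = nn.ModuleList(cells)
        self.hidden_size = hidden_size

    def forward(self, inputs):                       # inputs: (seq_len, batch, input_size)
        h = inputs.new_zeros(inputs.size(1), self.hidden_size)
        outputs = []
        for x_t in inputs:
            h = self.cells[0](x_t, h)
            for cell in self.cells[1:]:              # extra transitions receive no new input
                h = cell(torch.zeros_like(h), h)
            outputs.append(h)
        return torch.stack(outputs)                  # (seq_len, batch, hidden_size)

class BiDeepRNN(nn.Module):
    """Stack of deep-transition layers; residual connections omitted for brevity."""

    def __init__(self, input_size, hidden_size, stack_depth=4, transition_depth=2):
        super().__init__()
        sizes = [input_size] + [hidden_size] * (stack_depth - 1)
        self.layers = nn.ModuleList(
            [DeepTransitionGRU(s, hidden_size, transition_depth) for s in sizes])

    def forward(self, inputs):
        out = inputs
        for layer in self.layers:
            out = layer(out)
        return out
```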

    Neural Machine Translation

    Neural Machine Translation is the primary algorithm used in industry to perform machine translation. This state-of-the-art algorithm is an application of deep learning in which massive datasets of translated sentences are used to train a model capable of translating between any two languages. The architecture behind neural machine translation is composed of two recurrent neural networks used in tandem to form an encoder-decoder structure. Attention mechanisms have recently been developed to further increase the accuracy of these models. In this senior thesis, the various parts of Neural Machine Translation are explored towards the eventual creation of a tutorial on the topic. In the first half of this paper, each of the aspects that go into creating an NMT model is explained in depth. Building on that understanding of the mechanics of NMT, the second portion of this paper briefly outlines enhancements that were made to the PyTorch tutorial on NMT to create an updated and more effective tutorial on the topic.
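
    A compact sketch of that encoder-decoder structure in PyTorch (the framework the thesis tutorial targets): one GRU encodes the source sentence, a second GRU decodes the target sentence, and a dot-product attention module lets each decoding step look back over the encoder outputs. The vocabulary sizes, dimensions, and the specific attention form are placeholder assumptions, not the tutorial's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self, src_vocab: int, hidden: int):
        super().__init__()
        self.embed = nn.Embedding(src_vocab, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)

    def forward(self, src):                        # src: (batch, src_len) token ids
        outputs, state = self.rnn(self.embed(src))
        return outputs, state                      # (batch, src_len, hidden), (1, batch, hidden)

class AttnDecoder(nn.Module):
    def __init__(self, tgt_vocab: int, hidden: int):
        super().__init__()
        self.embed = nn.Embedding(tgt_vocab, hidden)
        self.rnn = nn.GRU(2 * hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def step(self, y_prev, state, enc_outputs):
        # Dot-product attention between the current decoder state and all encoder outputs.
        query = state[-1].unsqueeze(1)                                    # (batch, 1, hidden)
        weights = F.softmax(torch.bmm(query, enc_outputs.transpose(1, 2)), dim=-1)
        context = torch.bmm(weights, enc_outputs)                         # (batch, 1, hidden)
        rnn_in = torch.cat([self.embed(y_prev), context], dim=-1)
        output, state = self.rnn(rnn_in, state)
        return self.out(output.squeeze(1)), state                         # logits, new state

# Toy usage: encode a random source batch and take one greedy decoding step.
enc = Encoder(src_vocab=1000, hidden=256)
dec = AttnDecoder(tgt_vocab=1000, hidden=256)
enc_outputs, state = enc(torch.randint(0, 1000, (2, 7)))
sos = torch.ones(2, 1, dtype=torch.long)           # assume token id 1 is <sos>
logits, state = dec.step(sos, state, enc_outputs)
next_tokens = logits.argmax(dim=-1)                # (batch,)
```

    In practice the decoder loop is repeated until an end-of-sentence token is produced, feeding each predicted token back in as the next input.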