Discourse-aware neural machine translation

Abstract

Machine translation (MT) models usually translate a text sentence by sentence, under the strict assumption that the sentences in a text are independent of one another. However, it is a truism that texts have properties of connectedness that go beyond those of their individual sentences. Disregarding dependencies across sentences harms translation quality, especially in terms of coherence, cohesion, and consistency. Discourse-aware approaches have previously been investigated for conventional statistical machine translation (SMT), but discourse remains a serious challenge for state-of-the-art neural machine translation (NMT), which has recently surpassed SMT in performance. In this thesis, we incorporate useful discourse information to enhance NMT models. Specifically, we conduct research on two main parts: 1) exploring novel document-level NMT architectures; and 2) handling a specific discourse phenomenon in translation models.

First, we investigate the influence of historical contextual information on the performance of NMT models. We propose a cross-sentence context-aware NMT model that considers the influence of the previous sentences in the same document. Specifically, this history is summarized by an additional hierarchical encoder, and the resulting historical representations are integrated into the standard NMT model using different strategies. Experimental results on a Chinese–English document-level translation task show that the approach significantly improves upon a strong attention-based NMT system by up to +2.1 BLEU points. Further analysis and comparison provide insights and conclusions for this research direction.

Second, we explore the impact of discourse phenomena on MT performance. We focus on pronoun-dropping (pro-drop): in pro-drop languages, pronouns can be omitted when the referent is inferable from the context. As data for training a dropped pronoun (DP) generator are scarce, we propose to automatically annotate DPs using alignment information from a large parallel corpus. We then introduce a hybrid approach that builds a neural DP generator and integrates it into the SMT model. Experimental results on both Chinese–English and Japanese–English translation tasks demonstrate that our approach achieves a significant improvement of up to +1.58 BLEU points, with a 66% F-score for DP generation.

Motivated by these promising results, we further exploit DP translation for advanced NMT models. We propose a novel reconstruction-based model that reconstructs the DP-annotated source sentence from the hidden states of the encoder, the decoder, or both. Experimental results on the same translation tasks show that the proposed approach significantly and consistently improves translation performance over a strong NMT baseline trained on DP-annotated parallel data. Finally, to avoid errors propagated from an external DP prediction model, we investigate an end-to-end DP translation model, improving the reconstruction-based model from three perspectives: we employ a shared reconstructor to better exploit encoder and decoder representations; we jointly learn to translate and to predict DPs; and, to capture discourse information for DP prediction, we combine the hierarchical encoder with the DP translation model. Experimental results on the same translation tasks show that our approach significantly improves both translation performance and DP prediction accuracy.
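To make the hierarchical summarization concrete, the following is a minimal sketch (in PyTorch, with toy dimensions; the class and variable names are illustrative, not the thesis implementation) of a two-level encoder that compresses the previous sentences of a document into a single history vector:

    # Minimal sketch of a hierarchical history encoder, assuming
    # pre-embedded inputs; names and dimensions are hypothetical.
    import torch
    import torch.nn as nn

    class HierarchicalHistoryEncoder(nn.Module):
        """Summarizes the K previous sentences of a document into one vector."""

        def __init__(self, emb_dim=256, hid_dim=512):
            super().__init__()
            # Word-level RNN: one summary vector per historical sentence.
            self.word_rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
            # Sentence-level RNN: fuses the per-sentence summaries in order.
            self.sent_rnn = nn.GRU(hid_dim, hid_dim, batch_first=True)

        def forward(self, history):
            # history: list of K tensors, each (batch, seq_len, emb_dim),
            # ordered from the oldest to the most recent previous sentence.
            sent_vecs = []
            for sent in history:
                _, h_n = self.word_rnn(sent)        # h_n: (1, batch, hid_dim)
                sent_vecs.append(h_n.squeeze(0))    # (batch, hid_dim)
            stacked = torch.stack(sent_vecs, dim=1) # (batch, K, hid_dim)
            _, h_doc = self.sent_rnn(stacked)       # final state summarizes history
            return h_doc.squeeze(0)                 # (batch, hid_dim)

    # Usage: the history vector can, for instance, initialize the decoder or
    # be gated into the standard NMT hidden state (two integration strategies).
    enc = HierarchicalHistoryEncoder()
    hist = [torch.randn(2, 10, 256) for _ in range(3)]  # 3 previous sentences
    doc_context = enc(hist)  # shape: (2, 512)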
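The alignment-based DP annotation can likewise be sketched in a few lines. The heuristic below is a simplified stand-in for the thesis procedure, assuming word alignments (e.g., from GIZA++ or fast_align) are given as (target-index, source-index) pairs; the placeholder token and the insertion heuristic are illustrative assumptions:

    # Sketch of alignment-based DP annotation; function names, the #DP#
    # placeholder, and the insertion heuristic are all hypothetical.
    ENGLISH_PRONOUNS = {"i", "you", "he", "she", "it", "we", "they"}

    def annotate_dropped_pronouns(src_tokens, tgt_tokens, alignments):
        """Insert a #DP# placeholder into the pro-drop source sentence for
        every target-side pronoun that has no aligned source word."""
        aligned_tgt = {t for t, _ in alignments}
        annotated = list(src_tokens)
        inserts = []
        for t, tok in enumerate(tgt_tokens):
            if tok.lower() in ENGLISH_PRONOUNS and t not in aligned_tgt:
                # Crude heuristic: place the DP before the source word aligned
                # to the next aligned target token (subjects precede verbs).
                nxt = next((s for tt, s in sorted(alignments) if tt > t),
                           len(src_tokens))
                inserts.append(nxt)
        for pos in sorted(inserts, reverse=True):
            annotated.insert(pos, "#DP#")
        return annotated

    src = ["喜欢", "这", "本", "书"]     # "(I) like this book", with 我 dropped
    tgt = ["i", "like", "this", "book"]
    align = [(1, 0), (2, 1), (3, 3)]     # "i" at target position 0 is unaligned
    print(annotate_dropped_pronouns(src, tgt, align))
    # -> ['#DP#', '喜欢', '这', '本', '书']

Annotating a large parallel corpus this way yields the supervision needed to train a DP generator without manual labelling, at the cost of some alignment noise.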
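Finally, the reconstruction-based objective amounts to adding an auxiliary loss to standard NMT training. The sketch below only illustrates how the two losses might be combined, assuming hypothetical `nmt_model` and `reconstructor` callables; it is not the thesis implementation:

    # Sketch of the joint objective; `nmt_model` and `reconstructor`
    # are assumed interfaces, and `lam` is an assumed weighting knob.
    def joint_loss(nmt_model, reconstructor, src, tgt, src_with_dp, lam=1.0):
        # Standard translation loss; the model also exposes its encoder
        # and decoder hidden states so the reconstructor can read them.
        trans_loss, enc_states, dec_states = nmt_model(src, tgt)
        # Reconstruction loss: regenerate the DP-annotated source sentence
        # from the hidden states, pushing them to retain DP information.
        rec_loss = reconstructor(enc_states, dec_states, src_with_dp)
        # Training minimizes both; only the translation path runs at inference.
        return trans_loss + lam * rec_loss

Reconstructing from the encoder states, the decoder states, or both corresponds to choosing which hidden states the reconstructor reads.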
