Discourse-aware neural machine translation
Machine translation (MT) models usually translate a text by considering isolated sentences
based on a strict assumption that the sentences in a text are independent of one another.
However, it is a truism that texts have properties of connectedness that go beyond those of
their individual sentences. Disregarding dependencies across sentences harms translation quality, especially in terms of coherence, cohesion, and consistency. Previously, some discourse-aware approaches were investigated for conventional statistical machine translation (SMT). However, exploiting such discourse information remains a serious obstacle for state-of-the-art neural machine translation (NMT), which has recently surpassed the performance of SMT.
In this thesis, we try to incorporate useful discourse information for enhancing NMT
models. More specifically, we conduct research on two main parts: 1) exploring a novel
document-level NMT architecture; and 2) handling a specific discourse phenomenon
in translation models.
Firstly, we investigate the influence of historical contextual information on the perfor-
mance of NMT models. A cross-sentence context-aware NMT model is proposed to consider the influence of previous sentences in the same document. Specifically, this history
is summarized using an additional hierarchical encoder. The historical representations are
then integrated into the standard NMT model using different strategies. Experimental results
on a Chinese–English document-level translation task show that the approach significantly
improves upon a strong attention-based NMT system by up to +2.1 BLEU points. In addition, our analysis and comparison offer useful insights and conclusions for this
research direction.
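As a rough illustration of the cross-sentence model described above, the following sketch summarizes previous sentences into a single context vector and gates it into a decoder state. It is a simplified stand-in, not the thesis's implementation: the real model uses trained hierarchical RNN encoders, whereas here both levels are mean-pooled, and the function names and the weight matrix `W` are hypothetical.

```python
import numpy as np

def hierarchical_context(history):
    """Summarize previous sentences into one document-level context vector.

    history: list of (n_words, d) arrays, the word embeddings of each
    previous sentence. The thesis uses a hierarchical RNN encoder;
    mean-pooling at both levels is a simplified stand-in.
    """
    sent_vecs = np.stack([s.mean(axis=0) for s in history])  # (n_sents, d)
    return sent_vecs.mean(axis=0)                            # (d,)

def gated_integration(dec_state, context, W):
    """One possible integration strategy: a sigmoid gate decides, per
    dimension, how much historical context flows into the decoder state.

    W: (d, 2d) learned projection (random in this illustration).
    """
    g = 1.0 / (1.0 + np.exp(-(W @ np.concatenate([dec_state, context]))))
    return g * dec_state + (1.0 - g) * context
```

A usage example: with embedding size d = 4 and two previous sentences, `hierarchical_context` yields a length-4 vector that `gated_integration` mixes into the current decoder state.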
Secondly, we explore the impact of discourse phenomena on the performance of MT.
In this thesis, we focus on the phenomenon of pronoun-dropping (pro-drop), where, in pro-drop languages, pronouns can be omitted when it is possible to infer the referent from the
context. As the data for training a dropped pronoun (DP) generator is scarce, we propose to
automatically annotate DPs using alignment information from a large parallel corpus. We
then introduce a hybrid approach: building a neural-based DP generator and integrating it
into the SMT model. Experimental results on both Chinese–English and Japanese–English
translation tasks demonstrate that our approach achieves a significant improvement of up to
+1.58 BLEU points, with a 66% F-score for DP generation.
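The alignment-based DP annotation can be sketched roughly as follows. This is a hypothetical simplification: it marks any target-side pronoun left unaligned by the word alignments as a dropped pronoun and projects it back into the source at the position implied by the next aligned word. The actual pipeline works over a large parallel corpus with statistical aligners and language-specific pronoun inventories.

```python
# Toy pronoun list; the real method uses language-specific inventories.
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they"}

def annotate_dps(src_tokens, tgt_tokens, alignments):
    """Insert <DP:...> markers into the source using word alignments.

    alignments: set/list of (src_idx, tgt_idx) pairs. A target pronoun
    aligned to no source word is treated as dropped on the source side.
    """
    aligned_tgt = {t for _, t in alignments}
    out = list(src_tokens)
    # Process right-to-left so earlier insertions do not shift later ones.
    for j in sorted(range(len(tgt_tokens)), reverse=True):
        if tgt_tokens[j].lower() in PRONOUNS and j not in aligned_tgt:
            # Place the DP before the source word aligned to the next
            # aligned target word; fall back to the sentence end.
            nxt = [s for s, t in alignments if t > j]
            pos = min(nxt) if nxt else len(out)
            out.insert(pos, f"<DP:{tgt_tokens[j]}>")
    return out
```

For example, with the Chinese source "吃 了 吗" and English target "have you eaten" aligned only on 吃↔eaten, the unaligned pronoun "you" is projected back as a DP marker at the front of the source.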
Motivated by this promising result, we further exploit the DP translation approach for
advanced NMT models. A novel reconstruction-based model is proposed to reconstruct the
DP-annotated source sentence from the hidden states of the encoder, the decoder, or both. Experimental results on the same translation tasks show that the proposed approach significantly and consistently improves translation performance over a strong NMT
baseline trained on DP-annotated parallel data.
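The training objective of a reconstruction-based model of this kind can be sketched as a weighted sum of the translation loss and a reconstruction loss computed from hidden states. The function names and the interpolation weight `lam` below are illustrative assumptions, not the thesis's notation.

```python
import numpy as np

def softmax_nll(logits, target_ids):
    """Mean negative log-likelihood of target_ids under row-wise softmax."""
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(logp[np.arange(len(target_ids)), target_ids])

def reconstruction_objective(trans_logits, tgt_ids, rec_logits, dp_src_ids, lam=1.0):
    """J = translation loss + lam * reconstruction loss.

    rec_logits: scores produced by a reconstructor reading encoder and/or
    decoder hidden states; dp_src_ids: the DP-annotated source tokens it
    must reproduce; lam: hypothetical interpolation weight.
    """
    return softmax_nll(trans_logits, tgt_ids) + lam * softmax_nll(rec_logits, dp_src_ids)
```

The reconstruction term rewards hidden states that retain enough information to recover the dropped pronouns, which is what pushes the translation model to attend to them.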
To avoid the errors propagated from an external DP prediction model, we finally investigate an end-to-end DP translation model. Specifically, we improve the reconstruction-based
model from three perspectives. We first employ a shared reconstructor to better exploit encoder and decoder representations. Secondly, we propose to jointly learn to translate and
predict DPs. In order to capture discourse information for DP prediction, we finally combine the hierarchical encoder with the DP translation model. Experimental results on the
same translation tasks show that our approach significantly improves both translation performance and DP prediction accuracy.
New Trends in Machine Translation using Large Language Models: Case Examples with ChatGPT
Machine Translation (MT) has made significant progress in recent years using
deep learning, especially after the emergence of large language models (LLMs)
such as GPT-3 and ChatGPT. This brings new challenges and opportunities for MT
using LLMs. In this paper, we brainstorm some interesting directions for MT
using LLMs, including stylized MT, interactive MT, and Translation Memory-based
MT, as well as a new evaluation paradigm using LLMs. We also discuss the
privacy concerns in MT using LLMs and a basic privacy-preserving method to
mitigate such risks. To illustrate the potential of our proposed directions, we
present several examples of the new directions mentioned above, demonstrating
their feasibility and highlighting the opportunities and
challenges for future research in MT using LLMs.
ngram-OAXE: Phrase-Based Order-Agnostic Cross Entropy for Non-Autoregressive Machine Translation
Recently, a new training objective, the order-agnostic cross-entropy (oaxe) loss, has proven effective at ameliorating the
effect of multimodality in non-autoregressive translation (NAT) by removing
the penalty on word-order errors in the standard cross-entropy loss. Starting
from the intuition that reordering generally occurs between phrases, we extend
oaxe by only allowing reordering between ngram phrases and still requiring a
strict match of word order within the phrases. Extensive experiments on NAT
benchmarks across language pairs and data scales demonstrate the effectiveness
and universality of our approach. Further analyses show that ngram-oaxe indeed improves the
translation of ngram phrases, and produces more fluent translation with a
better modeling of sentence structure.
Comment: COLING 2022 Oral.
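A toy version of the ngram-level order-agnostic loss can be written as a brute-force search over reorderings of fixed n-gram chunks: word order must match inside each chunk, but the chunks may be permuted freely, and the loss is the best (lowest) cross-entropy over all orderings. This is an illustrative sketch only; the paper's loss uses an efficient matching rather than exponential enumeration, and it does not fix the segmentation in advance.

```python
import itertools
import math

def ngram_oaxe(log_probs, target, n=2):
    """Lowest cross-entropy over all reorderings of the target's n-gram chunks.

    log_probs: per-position dicts, log_probs[i][w] = model log-prob of
    word w at position i. Word order is fixed inside each chunk; chunk
    order is free. Brute force for clarity only.
    """
    chunks = [target[i:i + n] for i in range(0, len(target), n)]
    best = math.inf
    for perm in itertools.permutations(chunks):
        seq = [w for chunk in perm for w in chunk]
        loss = -sum(log_probs[i][w] for i, w in enumerate(seq))
        best = min(best, loss)
    return best
```

If the model predicts the target's bigrams in a different order than the reference gives them, the loss still credits the prediction, which is exactly how the penalty on cross-phrase reordering is removed.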
Alternate Diverse Teaching for Semi-supervised Medical Image Segmentation
Semi-supervised medical image segmentation studies have shown promise in
training models with limited labeled data. However, current dominant
teacher-student based approaches can suffer from confirmation bias. To
address this challenge, we propose AD-MT, an alternate diverse teaching
approach in a teacher-student framework. It involves a single student model and
two non-trainable teacher models that are momentum-updated periodically and
randomly in an alternate fashion. To mitigate the confirmation bias from the
diverse supervision, the core of AD-MT lies in two proposed modules: the Random
Periodic Alternate (RPA) Updating Module and the Conflict-Combating Module
(CCM). The RPA schedules the alternating diverse updating process with
complementary data batches, distinct data augmentation, and random switching
periods to encourage diverse reasoning from different teaching perspectives.
The CCM employs an entropy-based ensembling strategy to encourage the model to
learn from both the consistent and conflicting predictions between the
teachers. Experimental results demonstrate the effectiveness and superiority of
our AD-MT on the 2D and 3D medical segmentation benchmarks across various
semi-supervised settings.
Comment: code: https://github.com/ZhenZHAO/AD-M
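The abstract does not give the CCM's exact formula, so the following is a hypothetical sketch of one entropy-based ensembling strategy: each teacher's class distribution is weighted by the other teacher's entropy, so the more confident (lower-entropy) teacher dominates the combined pseudo-label while conflicting predictions are softly merged rather than discarded.

```python
import numpy as np

def entropy_ensemble(p1, p2, eps=1e-8):
    """Combine two teachers' predictions, favoring the more confident one.

    p1, p2: (n, C) class-probability maps from the two teachers
    (n positions/pixels, C classes). Illustrative stand-in for the CCM;
    the paper's exact weighting may differ.
    """
    h1 = -(p1 * np.log(p1 + eps)).sum(axis=1, keepdims=True)  # per-position entropy
    h2 = -(p2 * np.log(p2 + eps)).sum(axis=1, keepdims=True)
    w1 = h2 / (h1 + h2 + eps)          # lower own entropy -> higher weight
    return w1 * p1 + (1.0 - w1) * p2   # convex combination, rows still sum to 1
```

When one teacher is near-certain and the other is uniform, the ensembled distribution stays close to the confident teacher, which is the intended behavior for resolving conflicting supervision.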