Context-Aware Neural Machine Translation Learns Anaphora Resolution
Standard machine translation systems process sentences in isolation and hence
ignore extra-sentential information, even though extended context can both
prevent mistakes in ambiguous cases and improve translation coherence. We
introduce a context-aware neural machine translation model designed in such a way
that the flow of information from the extended context to the translation model
can be controlled and analyzed. We experiment with an English-Russian subtitles
dataset, and observe that much of what is captured by our model deals with
improving pronoun translation. We measure correspondences between induced
attention distributions and coreference relations and observe that the model
implicitly captures anaphora. This is consistent with gains for sentences where
pronouns need to be gendered in translation. Besides improvements in anaphoric
cases, the model also improves in overall BLEU, both over its context-agnostic
version (+0.7) and over simple concatenation of the context and source
sentences (+0.6).
Comment: ACL 201
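The "simple concatenation of the context and source sentences" baseline mentioned in this abstract can be illustrated with a minimal sketch. It assumes the context is the single preceding sentence and that a separator token marks the boundary; the token `<sep>` and the function name `concat_with_previous` are hypothetical and do not come from the paper.

```python
from typing import List

SEP = "<sep>"  # hypothetical separator token; the abstract does not specify one


def concat_with_previous(sentences: List[str], sep: str = SEP) -> List[str]:
    """Build model inputs by prepending the previous sentence to each source sentence.

    The first sentence has no preceding context, so it is passed through unchanged.
    """
    inputs = []
    for i, sent in enumerate(sentences):
        if i == 0:
            inputs.append(sent)
        else:
            inputs.append(f"{sentences[i - 1]} {sep} {sent}")
    return inputs
```

A context-agnostic system would translate each element of `sentences` directly, while this baseline translates each element of `concat_with_previous(sentences)`.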
Improving Context-aware Neural Machine Translation with Target-side Context
In recent years, several studies on neural machine translation (NMT) have
attempted to use document-level context by using a multi-encoder and two
attention mechanisms to read the current and previous sentences to incorporate
the context of the previous sentences. These studies concluded that the
target-side context is less useful than the source-side context. However, we
considered that the reason why the target-side context is less useful lies in
the architecture used to model these contexts.
Therefore, in this study, we investigate how the target-side context can
improve context-aware neural machine translation. We propose a weight sharing
method wherein NMT saves decoder states and calculates an attention vector
using the saved states when translating a current sentence. Our experiments
show that the target-side context is also useful if we plug it into NMT as the
decoder state when translating a previous sentence.
Comment: 12 pages; PACLING 201
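The idea of saving decoder states from the previous sentence and computing an attention vector over them can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the scoring function, so dot-product scoring with a softmax is assumed here, and the names `attend` and `saved_states` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def attend(query: Vector, saved_states: List[Vector]) -> Vector:
    """Return an attention vector over decoder states saved from a previous sentence.

    Assumption: dot-product scores normalised with a softmax. The paper's actual
    scoring function is not described in the abstract.
    """
    # One score per saved state: dot product with the current decoder state.
    scores = [sum(q * s for q, s in zip(query, state)) for state in saved_states]
    # Numerically stable softmax over the scores.
    max_score = max(scores)
    exps = [math.exp(x - max_score) for x in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the saved states, computed one dimension at a time.
    dim = len(query)
    return [
        sum(w * state[d] for w, state in zip(weights, saved_states))
        for d in range(dim)
    ]
```

Here `query` stands for the decoder state of the sentence currently being translated, and `saved_states` for the states stored while translating the previous one.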
Contextualized Translation of Automatically Segmented Speech
Direct speech-to-text translation (ST) models are usually trained on corpora
segmented at sentence level, but at inference time they are commonly fed with
audio split by a voice activity detector (VAD). Since VAD segmentation is not
syntax-informed, the resulting segments do not necessarily correspond to
well-formed sentences uttered by the speaker but, most likely, to fragments of
one or more sentences. This segmentation mismatch considerably degrades the
quality of ST models' output. So far, researchers have focused on improving
audio segmentation towards producing sentence-like splits. In this paper,
instead, we address the issue in the model, making it more robust to a
different, potentially sub-optimal segmentation. To this aim, we train our
models on randomly segmented data and compare two approaches: fine-tuning and
adding the previous segment as context. We show that our context-aware solution
is more robust to VAD-segmented input, outperforming a strong base model and
the fine-tuning on different VAD segmentations of an English-German test set by
up to 4.25 BLEU points.
Comment: Interspeech 202
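The data preparation described in this abstract (training on randomly segmented data and supplying the previous segment as context) might look roughly like the sketch below. It operates on a generic sequence of items, such as audio frames, and every name in it (`random_segments`, `with_previous_context`, `n_cuts`) is hypothetical; the paper's actual segmentation procedure is not given in the abstract.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_segments(items: Sequence[T], n_cuts: int, rng: random.Random) -> List[List[T]]:
    """Split a sequence into contiguous segments at randomly chosen cut points."""
    # There are at most len(items) - 1 positions where a cut can be placed.
    n_cuts = min(n_cuts, max(len(items) - 1, 0))
    cuts = sorted(rng.sample(range(1, len(items)), n_cuts))
    bounds = [0] + cuts + [len(items)]
    return [list(items[a:b]) for a, b in zip(bounds, bounds[1:])]


def with_previous_context(segments: List[List[T]]) -> List[Tuple[Optional[List[T]], List[T]]]:
    """Pair each segment with the segment before it (None for the first one)."""
    pairs: List[Tuple[Optional[List[T]], List[T]]] = []
    previous: Optional[List[T]] = None
    for seg in segments:
        pairs.append((previous, seg))
        previous = seg
    return pairs
```

Each `(previous, current)` pair would then form one training example, with `previous` playing the role of the context segment.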
Modeling Coherence for Discourse Neural Machine Translation
Discourse coherence plays an important role in the translation of a text.
However, previously reported models mostly focus on improving performance on
individual sentences while ignoring cross-sentence links and dependencies, which
affects the coherence of the text. In this paper, we propose to use discourse
context and reward to refine the translation quality from the discourse
perspective. In particular, we first generate the translation of individual
sentences. Next, we deliberate over the preliminary translations and train
the model to learn the policy that produces discourse coherent text by a reward
teacher. Practical results on multiple discourse test datasets indicate that
our model significantly improves the translation quality over the
state-of-the-art baseline system by +1.23 BLEU score. Moreover, our model
generates more discourse coherent text and obtains +2.2 BLEU improvements when
evaluated by discourse metrics.
Comment: Accepted by AAAI201