Evaluating Discourse Phenomena in Neural Machine Translation
For machine translation to tackle discourse phenomena, models must have
access to extra-sentential linguistic context. There has been recent interest
in modelling context in neural machine translation (NMT), but models have been
principally evaluated with standard automatic metrics, which are poorly adapted
to evaluating discourse phenomena. In this article, we present hand-crafted
discourse test sets designed to test the models' ability to exploit previous
source and target sentences. We investigate the performance of recently
proposed multi-encoder NMT models trained on subtitles for English-to-French translation.
We also explore a novel way of exploiting context from the previous sentence.
Despite gains using BLEU, multi-encoder models give limited improvement in the
handling of discourse phenomena: 50% accuracy on our coreference test set and
53.5% for coherence/cohesion (compared to a non-contextual baseline of 50%). A
simple strategy of decoding the concatenation of the previous and current
sentence leads to good performance, and our novel strategy of multi-encoding
and decoding of two sentences leads to the best performance (72.5% for
coreference and 57% for coherence/cohesion), highlighting the importance of
target-side context.
Comment: Final version of paper to appear in Proceedings of NAACL 2018.
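The contrastive evaluation behind these accuracy figures is simple to sketch. Below is a minimal illustration in Python, assuming a hypothetical `score(source, target)` function that returns a model's log-probability of a candidate translation (e.g. obtained by forced decoding) and a test set of (source, correct, contrastive) triples; both names are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of contrastive test-set evaluation, assuming a
# hypothetical score(source, target) that returns the model's
# log-probability of `target` given `source` (e.g. via forced decoding).

def contrastive_accuracy(test_set, score):
    """test_set: iterable of (source, good, bad) triples, where `source`
    already includes any context (e.g. the previous sentence concatenated
    to the current one) and `bad` is a discourse-incorrect variant."""
    correct = 0
    for source, good, bad in test_set:
        # The model passes an example when it prefers the reference
        # translation over the contrastive one.
        if score(source, good) > score(source, bad):
            correct += 1
    return correct / len(test_set)
```

Since each example pairs one correct and one incorrect translation, chance-level accuracy is 50%, which is why the non-contextual baseline sits at exactly that figure.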
On the Evaluation of Semantic Phenomena in Neural Machine Translation Using Natural Language Inference
We propose a process for investigating the extent to which sentence
representations arising from neural machine translation (NMT) systems encode
distinct semantic phenomena. We use these representations as features to train
a natural language inference (NLI) classifier based on datasets recast from
existing semantic annotations. In applying this process to a representative NMT
system, we find its encoder appears most suited to supporting inferences at the
syntax-semantics interface, as compared to anaphora resolution requiring
world knowledge. We conclude with a discussion of the merits and potential
deficiencies of the existing process, and how it may be improved and extended
as a broader framework for evaluating semantic coverage.
Comment: To be presented at NAACL 2018; 11 pages.
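As a rough illustration of the probing setup, the sketch below mean-pools NMT encoder states into sentence vectors and trains a simple NLI classifier on top; the `encode(sentence)` function, the pooling choice, and the feature scheme are assumptions for illustration, not the paper's exact recipe.

```python
# Illustrative probing sketch, assuming a hypothetical encode(sentence)
# that returns the NMT encoder's hidden states as a (length, dim) array.
import numpy as np
from sklearn.linear_model import LogisticRegression

def sentence_vector(sentence, encode):
    # Mean-pool the encoder states into one fixed-size vector.
    return encode(sentence).mean(axis=0)

def featurize(premise, hypothesis, encode):
    p = sentence_vector(premise, encode)
    h = sentence_vector(hypothesis, encode)
    # A standard NLI feature scheme: both vectors plus their
    # element-wise difference and product.
    return np.concatenate([p, h, np.abs(p - h), p * h])

def train_nli_probe(pairs, labels, encode):
    # labels: entailment / neutral / contradiction, recast from
    # existing semantic annotations.
    X = np.stack([featurize(p, h, encode) for p, h in pairs])
    return LogisticRegression(max_iter=1000).fit(X, labels)
```

The probe's accuracy on each recast dataset then serves as a proxy for how well the encoder captures the corresponding semantic phenomenon.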
Discourse Structure in Machine Translation Evaluation
In this article, we explore the potential of using sentence-level discourse
structure for machine translation evaluation. We first design discourse-aware
similarity measures, which use all-subtree kernels to compare discourse parse
trees in accordance with the Rhetorical Structure Theory (RST). Then, we show
that a simple linear combination with these measures can help improve various
existing machine translation evaluation metrics in terms of their correlation with
human judgments, both at the segment and at the system level. This suggests
that discourse information is complementary to the information used by many of
the existing evaluation metrics, and thus it could be taken into account when
developing richer evaluation metrics, such as the WMT-14 winning combined
metric DiscoTK-party. We also provide a detailed analysis of the relevance of
various discourse elements and relations from the RST parse trees for machine
translation evaluation. In particular, we show that: (i) all aspects of the RST
tree are relevant, (ii) nuclearity is more useful than relation type, and (iii)
the similarity of the translation RST tree to the reference tree is positively
correlated with translation quality.
Comment: machine translation, machine translation evaluation, discourse analysis. Computational Linguistics, 2017.
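To make the kernel idea concrete, here is a much-simplified all-subtrees (convolution) tree kernel in the spirit of Collins and Duffy (2001), applied to trees whose node labels stand in for RST relation and nuclearity tags. The `Node` class, the label-and-arity matching condition, the decay parameter `lam`, and the 0.1 combination weight are all illustrative assumptions; the paper's measures are richer.

```python
# Much-simplified convolution (all-subtrees) tree kernel over RST-style
# trees; node labels stand in for relation/nuclearity tags.

class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

def _common(n1, n2, lam=0.5):
    # Number of common subtree fragments rooted at n1 and n2 (matching
    # is simplified here to label-and-arity equality).
    if n1.label != n2.label or len(n1.children) != len(n2.children):
        return 0.0
    if not n1.children:          # matching leaves
        return lam
    prod = lam
    for c1, c2 in zip(n1.children, n2.children):
        prod *= 1.0 + _common(c1, c2, lam)
    return prod

def _nodes(tree):
    out, stack = [], [tree]
    while stack:
        n = stack.pop()
        out.append(n)
        stack.extend(n.children)
    return out

def tree_kernel(t1, t2, lam=0.5):
    return sum(_common(a, b, lam) for a in _nodes(t1) for b in _nodes(t2))

def similarity(t1, t2, lam=0.5):
    # Normalised kernel, so identical trees score 1.0.
    return tree_kernel(t1, t2, lam) / (
        tree_kernel(t1, t1, lam) * tree_kernel(t2, t2, lam)) ** 0.5

def combined_metric(base_score, hyp_tree, ref_tree, weight=0.1):
    # Linear combination of an existing metric score (e.g. BLEU) with
    # the discourse similarity, mirroring the combination strategy above.
    return base_score + weight * similarity(hyp_tree, ref_tree)
```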
Selective Attention for Context-aware Neural Machine Translation
Despite the progress made in sentence-level NMT, current systems still fall
short of achieving fluent, good-quality translation of a full document. Recent
works in context-aware NMT consider only a few previous sentences as context
and may not scale to entire documents. To address this, we propose a novel and
scalable top-down approach to hierarchical attention for context-aware NMT,
which uses sparse attention to selectively focus on relevant sentences in the
document context and then attends to key words in those sentences. We also
propose single-level attention approaches based on sentence- or word-level
information in the context. The document-level context representation, produced
from these attention modules, is integrated into the encoder or decoder of the
Transformer model depending on whether we use monolingual or bilingual context.
Our experiments and evaluation on English-German datasets in different document
MT settings show that our selective attention approach not only significantly
outperforms context-agnostic baselines but also surpasses context-aware
baselines in most cases.
Comment: Accepted at NAACL-HLT 2019.
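The top-down selection can be caricatured in a few lines of NumPy: first attend sparsely over sentence summaries to pick the most relevant context sentences, then attend over the words of just those sentences. The hard top-k used here is a stand-in for the paper's sparse attention, and all shapes and names are assumptions for illustration.

```python
# Toy sketch of top-down hierarchical attention over document context,
# using a hard top-k in place of sparse attention.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def hierarchical_context(query, sent_keys, word_keys, word_vals, k=2):
    """query: (d,) current encoder/decoder state;
    sent_keys: (S, d) one summary vector per context sentence;
    word_keys, word_vals: (S, W, d) word-level states per sentence."""
    scores = sent_keys @ query            # sentence-level relevance
    top = np.argsort(scores)[-k:]         # keep only the top-k sentences
    masked = np.full_like(scores, -np.inf)
    masked[top] = scores[top]
    sent_attn = softmax(masked)           # zero weight off the top-k
    # Word-level attention inside each selected sentence, combined with
    # the sentence-level weights into a single document context vector.
    ctx = np.zeros_like(query)
    for s in top:
        word_attn = softmax(word_keys[s] @ query)
        ctx += sent_attn[s] * (word_attn @ word_vals[s])
    return ctx
```

The resulting context vector would then be integrated into the encoder or the decoder, depending on whether monolingual or bilingual context is used.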