Discourse Structure in Machine Translation Evaluation
In this article, we explore the potential of using sentence-level discourse
structure for machine translation evaluation. We first design discourse-aware
similarity measures, which use all-subtree kernels to compare discourse parse
trees in accordance with Rhetorical Structure Theory (RST). Then, we show
that a simple linear combination with these measures can improve various
existing machine translation evaluation metrics in terms of correlation with
human judgments at both the segment and the system level. This suggests
that discourse information is complementary to the information used by many of
the existing evaluation metrics, and thus it could be taken into account when
developing richer evaluation metrics, such as the WMT-14 winning combined
metric DiscoTK-party. We also provide a detailed analysis of the relevance of
various discourse elements and relations from the RST parse trees for machine
translation evaluation. In particular, we show that: (i) all aspects of the RST
tree are relevant, (ii) nuclearity is more useful than relation type, and (iii)
the similarity of the translation's RST tree to the reference tree is positively
correlated with translation quality.

Comment: machine translation, machine translation evaluation, discourse
analysis. Computational Linguistics, 201
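The core similarity computation can be illustrated with a toy tree kernel (a simplified stand-in for the all-subtree convolution kernel described above; the RST-like trees, labels, and serialization here are illustrative assumptions, not the paper's parser output):

```python
from collections import Counter

def subtrees(tree):
    """Enumerate serialized subtrees rooted at each node of a nested-tuple
    tree; a tree is (label, child, child, ...) or a leaf string."""
    if isinstance(tree, str):
        return [tree]
    out = [repr(tree)]          # the full subtree rooted here
    for child in tree[1:]:
        out.extend(subtrees(child))
    return out

def tree_kernel_similarity(t1, t2):
    """Normalized multiset overlap of subtrees (a crude proxy for an
    all-subtree convolution kernel)."""
    c1, c2 = Counter(subtrees(t1)), Counter(subtrees(t2))
    overlap = sum((c1 & c2).values())
    norm = (sum(c1.values()) * sum(c2.values())) ** 0.5
    return overlap / norm if norm else 0.0

# Hypothetical RST-like trees for a translation and its reference
hyp = ("Elaboration", ("Nucleus", "span A"), ("Satellite", "span B"))
ref = ("Elaboration", ("Nucleus", "span A"), ("Satellite", "span C"))
print(tree_kernel_similarity(hyp, ref))  # → 0.4
```

Identical trees score 1.0, so the measure behaves like a similarity suitable for linear combination with other metrics.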
MuLER: Detailed and Scalable Reference-based Evaluation
We propose a novel methodology (namely, MuLER) that transforms any
reference-based evaluation metric for text generation, such as machine
translation (MT), into a fine-grained analysis tool.
Given a system and a metric, MuLER quantifies how much the chosen metric
penalizes specific error types (e.g., errors in translating names of
locations). MuLER thus enables a detailed error analysis which can lead to
targeted improvement efforts for specific phenomena.
We perform experiments in both synthetic and naturalistic settings to support
MuLER's validity and showcase its usability in MT evaluation and other tasks,
such as summarization. Analyzing all submissions to WMT in 2014-2020, we find
consistent trends. For example, nouns and verbs are among the most frequent POS
tags, yet they are among the hardest to translate. Performance on most POS
tags improves with overall system performance, but a few do not correlate in
this way (their identity changes from language to language). Preliminary
experiments with summarization reveal similar trends.
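A per-category breakdown of this kind can be approximated with a toy recall computation (the function name, the token-matching heuristic, and the data below are illustrative assumptions, not MuLER's actual procedure):

```python
from collections import Counter

def per_category_recall(refs, hyps, tags):
    """For each POS category, the fraction of tagged reference tokens that
    reappear in the hypothesis -- a crude proxy for how severely a metric
    penalizes errors of that type."""
    hits, totals = Counter(), Counter()
    for ref, hyp, tagseq in zip(refs, hyps, tags):
        hyp_tokens = set(hyp.split())
        for tok, tag in zip(ref.split(), tagseq):
            totals[tag] += 1
            hits[tag] += tok in hyp_tokens
    return {t: hits[t] / totals[t] for t in totals}

# Hypothetical one-sentence corpus: the noun is mistranslated
refs = ["the cat sat"]
hyps = ["the dog sat"]
tags = [["DET", "NOUN", "VERB"]]
print(per_category_recall(refs, hyps, tags))
# the determiner and verb are recovered; the noun is not
```

Aggregated over a test set, such per-category scores expose which phenomena drag a system's overall metric score down.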
Dynamic Context-guided Capsule Network for Multimodal Machine Translation
Multimodal machine translation (MMT), which mainly focuses on enhancing
text-only translation with visual features, has attracted considerable
attention from both computer vision and natural language processing
communities. Most current MMT models resort to attention mechanisms, global
context modeling, or multimodal joint representation learning to utilize visual
features. However, the attention mechanism lacks sufficient semantic
interaction between modalities, while the other two provide a fixed visual
context, which is unsuitable for modeling the variability observed when
generating a translation. To address these issues, in this paper we propose
a novel Dynamic Context-guided Capsule Network (DCCN) for MMT. Specifically, at
each timestep of decoding, we first employ the conventional source-target
attention to produce a timestep-specific source-side context vector. Next, DCCN
takes this vector as input and uses it to guide the iterative extraction of
related visual features via a context-guided dynamic routing mechanism.
Particularly, since we represent the input image with both global and regional
visual features, we introduce two parallel DCCNs to model multimodal context
vectors with visual features at different granularities. Finally, we obtain two
multimodal context vectors, which are fused and incorporated into the decoder
for the prediction of the target word. Experimental results on the Multi30K
dataset of English-to-German and English-to-French translation demonstrate the
superiority of DCCN. Our code is available on
https://github.com/DeepLearnXMU/MM-DCCN
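The context-guided dynamic routing step can be sketched roughly as follows (a pure-Python toy; the update rule, shapes, and names are illustrative assumptions rather than DCCN's exact formulation):

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def context_guided_routing(visual_feats, context, iters=3):
    """Toy context-guided dynamic routing: coupling logits are refined by the
    agreement between each visual capsule and the pooled visual context plus
    the timestep-specific source-side context vector."""
    logits = [0.0] * len(visual_feats)
    pooled = [0.0] * len(context)
    for _ in range(iters):
        w = softmax(logits)                       # coupling coefficients
        pooled = [sum(wi * f[d] for wi, f in zip(w, visual_feats))
                  for d in range(len(context))]   # weighted visual context
        guide = [p + c for p, c in zip(pooled, context)]
        logits = [l + dot(f, guide)               # agreement update
                  for l, f in zip(logits, visual_feats)]
    return pooled

# Hypothetical example: 3 regional capsules of dimension 2
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ctx = [0.5, 0.5]                                  # decoder-side context vector
print(context_guided_routing(feats, ctx))
```

The returned vector plays the role of one of the two multimodal context vectors that are fused and fed to the decoder.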
Phrase-based Machine Translation is State-of-the-Art for Automatic Grammatical Error Correction
In this work, we study parameter tuning towards the M^2 metric, the standard
metric for automatic grammatical error correction (GEC) tasks. After implementing
M^2 as a scorer in the Moses tuning framework, we investigate interactions of
dense and sparse features, different optimizers, and tuning strategies for the
CoNLL-2014 shared task. We notice erratic behavior when optimizing sparse
feature weights with M^2 and offer partial solutions. We find that a bare-bones
phrase-based SMT setup with task-specific parameter-tuning outperforms all
previously published results for the CoNLL-2014 test set by a large margin
(46.37% M^2 over previously 41.75%, by an SMT system with neural features)
while being trained on the same, publicly available data. Our newly introduced
dense and sparse features widen that gap, and we improve the state-of-the-art
to 49.49% M^2.

Comment: Accepted for publication at EMNLP 201
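At its core, M^2 is an F_0.5 score over system edits matched against gold edits, which can be sketched as follows (this toy skips M^2's lattice-based maximal edit matching, and the edit representation is an assumption):

```python
def f_beta(proposed, gold, beta=0.5):
    """F_beta over edit sets; M^2 uses beta=0.5, weighting precision
    twice as much as recall."""
    tp = len(proposed & gold)
    p = tp / len(proposed) if proposed else 1.0
    r = tp / len(gold) if gold else 1.0
    if p == 0 and r == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)

# Hypothetical edits as (original, corrected) pairs
proposed = {("a", "an"), ("is", "are")}
gold = {("a", "an"), ("run", "runs")}
print(round(f_beta(proposed, gold), 4))  # → 0.5
```

The precision bias of beta=0.5 is one reason tuning directly towards M^2, as the paper does, behaves differently from tuning towards BLEU.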
Deep learning based semantic textual similarity for applications in translation technology
A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy.

Semantic Textual Similarity (STS) measures the equivalence of meanings
between two textual segments. It is a fundamental task for many natural
language processing applications. In this study, we focus on employing STS in
the context of translation technology. We start by developing models to estimate
STS. We propose a new unsupervised vector aggregation-based STS method
which relies on contextual word embeddings. We also propose a novel Siamese
neural network based on efficient recurrent neural network units. We empirically
evaluate various unsupervised and supervised STS methods, including the
newly proposed ones, on three different English STS datasets, two
non-English datasets, and a biomedical STS dataset to identify the best
supervised and unsupervised STS methods.
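The unsupervised vector-aggregation idea can be sketched as mean-pooled contextual embeddings compared by cosine similarity (a minimal illustration with made-up vectors; the thesis's actual aggregation may weight tokens differently):

```python
import math

def aggregate(token_vecs):
    """Mean-pool per-token contextual vectors into one sentence vector."""
    dim = len(token_vecs[0])
    return [sum(v[d] for v in token_vecs) / len(token_vecs)
            for d in range(dim)]

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

# Stand-in 2-d contextual embeddings for two short sentences
s1 = [[1.0, 0.0], [0.0, 1.0]]
s2 = [[1.0, 0.0], [1.0, 0.0]]
print(cosine(aggregate(s1), aggregate(s2)))  # ≈ 0.707
```

In practice the token vectors would come from a contextual embedding model, and the resulting score serves directly as the STS estimate.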
We then embed these STS methods in translation technology applications.
Firstly, we experiment with Translation Memory (TM) systems. We propose a
novel STS-based TM matching and retrieval method that outperforms
current TM systems. We then utilise the developed STS architectures in
translation Quality Estimation (QE). We show that the proposed methods are
simple but outperform complex QE architectures and improve the
state-of-the-art results. The implementations of these methods have been
released as open source.