217 research outputs found
Results of the WMT14 Metrics Shared Task
This paper presents the results of the
WMT14 Metrics Shared Task. We asked
participants of this task to score the
outputs of the MT systems involved in
WMT14 Shared Translation Task. We col-
lected scores of 23 metrics from 12 re-
search groups. In addition to that we com-
puted scores of 6 standard metrics (BLEU,
NIST, WER, PER, TER and CDER) as
baselines. The collected scores were eval-
uated in terms of system level correlation
(how well each metric’s scores correlate
with WMT14 official manual ranking of
systems) and in terms of segment level
correlation (how often a metric agrees with
humans in comparing two translations of a
particular sentence)
Discourse Structure in Machine Translation Evaluation
In this article, we explore the potential of using sentence-level discourse
structure for machine translation evaluation. We first design discourse-aware
similarity measures, which use all-subtree kernels to compare discourse parse
trees in accordance with the Rhetorical Structure Theory (RST). Then, we show
that a simple linear combination with these measures can help improve various
existing machine translation evaluation metrics regarding correlation with
human judgments both at the segment- and at the system-level. This suggests
that discourse information is complementary to the information used by many of
the existing evaluation metrics, and thus it could be taken into account when
developing richer evaluation metrics, such as the WMT-14 winning combined
metric DiscoTKparty. We also provide a detailed analysis of the relevance of
various discourse elements and relations from the RST parse trees for machine
translation evaluation. In particular we show that: (i) all aspects of the RST
tree are relevant, (ii) nuclearity is more useful than relation type, and (iii)
the similarity of the translation RST tree to the reference tree is positively
correlated with translation quality.Comment: machine translation, machine translation evaluation, discourse
analysis. Computational Linguistics, 201
Exploring Prediction Uncertainty in Machine Translation Quality Estimation
Machine Translation Quality Estimation is a notoriously difficult task, which
lessens its usefulness in real-world translation environments. Such scenarios
can be improved if quality predictions are accompanied by a measure of
uncertainty. However, models in this task are traditionally evaluated only in
terms of point estimate metrics, which do not take prediction uncertainty into
account. We investigate probabilistic methods for Quality Estimation that can
provide well-calibrated uncertainty estimates and evaluate them in terms of
their full posterior predictive distributions. We also show how this posterior
information can be useful in an asymmetric risk scenario, which aims to capture
typical situations in translation workflows.Comment: Proceedings of CoNLL 201
Findings of the 2014 Workshop on Statistical Machine Translation
This paper presents the results of the
WMT14 shared tasks, which included a
standard news translation task, a separate
medical translation task, a task for
run-time estimation of machine translation
quality, and a metrics task. This year, 143
machine translation systems from 23 institutions
were submitted to the ten translation
directions in the standard translation
task. An additional 6 anonymized systems
were included, and were then evaluated
both automatically and manually. The
quality estimation task had four subtasks,
with a total of 10 teams, submitting 57 entries
- …