NRC Russian-English Machine Translation System for WMT 2016
We describe the statistical machine translation system developed at the National Research Council of Canada (NRC) for the Russian-English news translation task of the First Conference on Machine Translation (WMT 2016). Our submission is a phrase-based SMT system that tackles the morphological complexity of Russian through comprehensive use of lemmatization. The core of our lemmatization strategy is to use different views of Russian for different SMT components: word alignment and bilingual neural network language models use lemmas, while sparse features and reordering models use fully inflected forms. Some components, such as the phrase table, use both views of the source. Russian words that remain out-of-vocabulary (OOV) after lemmatization are transliterated into English using a statistical model trained on examples mined from the parallel training corpus. The NRC Russian-English MT system achieved the highest uncased BLEU and the lowest TER scores among the eight participants in WMT 2016.
Findings of the 2019 Conference on Machine Translation (WMT19)
This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2019.
Participants were asked to build machine translation systems for any of 18 language pairs, to be evaluated on a test set of news stories. The main metric for this task is human judgment of translation quality. The task was also opened up to additional test suites to probe specific aspects of translation.
Results of the WMT16 Metrics Shared Task
This paper presents the results of the WMT16 Metrics Shared Task. We asked participants of this task to score the outputs of the MT systems involved in the WMT16 Shared Translation Task. We collected scores of 16 metrics from 9 research groups. In addition, we computed scores of 9 standard metrics (BLEU, SentBLEU, NIST, WER, PER, TER and CDER) as baselines. The collected scores were evaluated in terms of system-level correlation (how well each metric's scores correlate with the WMT16 official manual ranking of systems) and in terms of segment-level correlation (how often a metric agrees with humans in comparing two translations of a particular sentence). This year there are several additions to the setup: a large number of language pairs (18 in total), datasets from different domains (news, IT and medical), and different kinds of judgments: relative ranking (RR), direct assessment (DA) and HUME manual semantic judgments. Finally, the generation of a large number of hybrid systems was trialled to provide more conclusive system-level metric rankings.
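The segment-level evaluation described above, under relative ranking (RR), amounts to counting how often a metric orders a pair of translations the same way the human judge did, in the style of Kendall's tau. A minimal sketch, with invented preference data (the `+1`/`-1` encoding and the example judgements are illustrative assumptions, not data from the task):

```python
# Hedged sketch of segment-level metric evaluation under relative
# ranking (RR): count concordant vs. discordant pairwise preferences,
# Kendall-tau style. All data below is invented for illustration.

def kendall_tau_like(pairs):
    """pairs: list of (human_pref, metric_pref), each +1 if translation A
    is judged better, -1 if translation B is judged better.
    Returns (concordant - discordant) / total, in [-1, 1]."""
    concordant = sum(1 for h, m in pairs if h == m)
    discordant = len(pairs) - concordant
    return (concordant - discordant) / len(pairs)

# Hypothetical human vs. metric preferences over 5 sentence pairs.
judgements = [(+1, +1), (+1, -1), (-1, -1), (+1, +1), (-1, +1)]
print(kendall_tau_like(judgements))  # → 0.2 (3 concordant, 2 discordant)
```

A value of 1 would mean the metric always agrees with the human preference, and -1 that it always disagrees.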
Findings of the 2016 Conference on Machine Translation (WMT16)
This paper presents the results of the WMT16 shared tasks, which included five machine translation (MT) tasks (standard news, IT-domain, biomedical, multimodal, pronoun), three evaluation tasks (metrics, tuning, run-time estimation of MT quality), an automatic post-editing task and a bilingual document alignment task. This year, 102 MT systems from 24 institutions (plus 36 anonymized online systems) were submitted to the 12 translation directions in the news translation task. The IT-domain task received 31 submissions from 12 institutions in 7 directions, and the biomedical task received 15 submissions from 5 institutions. Evaluation was both automatic and manual (relative ranking and 100-point scale assessments). The quality estimation task had three subtasks, with a total of 14 teams submitting 39 entries. The automatic post-editing task had a total of 6 teams submitting 11 entries.
Results of the WMT17 Metrics Shared Task
This paper presents the results of the WMT17 Metrics Shared Task. We asked participants of this task to score the outputs of the MT systems involved in the WMT17 news translation task and the Neural MT training task. We collected scores of 14 metrics from 8 research groups. In addition, we computed scores of 7 standard metrics (BLEU, SentBLEU, NIST, WER, PER, TER and CDER) as baselines. The collected scores were evaluated in terms of system-level correlation (how well each metric's scores correlate with the WMT17 official manual ranking of systems) and in terms of segment-level correlation (how often a metric agrees with humans in judging the quality of a particular sentence). This year, we build upon two types of manual judgements: direct assessment (DA) and HUME manual semantic judgements.
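The system-level evaluation these metrics-task abstracts refer to reduces, in its simplest form, to a Pearson correlation between a metric's per-system scores and the official human scores. A minimal sketch with invented scores (the system scores below are illustrative assumptions, not task data):

```python
# Hedged sketch: system-level metric evaluation as a Pearson correlation
# between a metric's scores and human assessment scores, one value per
# MT system. All scores below are invented for illustration.
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical scores for 4 MT systems.
metric_scores = [0.31, 0.28, 0.35, 0.22]   # e.g. an automatic metric
human_scores  = [0.10, 0.02, 0.25, -0.15]  # e.g. standardized DA scores

r = pearson(metric_scores, human_scores)
print(f"system-level Pearson r = {r:.3f}")
```

A metric whose system ranking tracks the human ranking closely yields r near 1; the shared tasks rank metrics by exactly this kind of correlation.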