7 research outputs found
Findings of the 2017 Conference on Machine Translation (WMT17)
This paper presents the results of the WMT17 shared tasks, which included three machine translation (MT) tasks (news, biomedical, and multimodal), two evaluation tasks (metrics and run-time estimation of MT quality), an automatic post-editing task, a neural MT training task, and a bandit learning task.
Results of the WMT17 metrics shared task
This paper presents the results of the WMT17 Metrics Shared Task. We asked participants of this task to score the outputs of the MT systems involved in the WMT17 news translation task and Neural MT training task. We collected scores of 14 metrics from 8 research groups. In addition to that, we computed scores of 7 standard metrics (BLEU, SentBLEU, NIST, WER, PER, TER and CDER) as baselines. The collected scores were evaluated in terms of system-level correlation (how well each metric's scores correlate with WMT17 official manual ranking of systems) and in terms of segment-level correlation (how often a metric agrees with humans in judging the quality of a particular sentence). This year, we build upon two types of manual judgements: direct assessment (DA) and HUME manual semantic judgements.
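The two evaluation views described in this abstract can be illustrated with a minimal sketch. It assumes Pearson correlation for the system-level view and a Kendall's-tau-like pairwise agreement rate for the segment-level view; the abstract does not spell out the exact formulas, so both choices, the function names, and the toy scores are assumptions for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between metric scores and human scores (one value per system)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def pairwise_agreement(metric_scores, human_scores):
    """Kendall's-tau-like score: (concordant - discordant) / (concordant + discordant).

    Only pairs of segments that humans scored differently are counted.
    A metric tie on such a pair is counted as discordant (an assumption).
    """
    concordant = 0
    discordant = 0
    n = len(metric_scores)
    for i in range(n):
        for j in range(i + 1, n):
            human_diff = human_scores[i] - human_scores[j]
            if human_diff == 0:
                continue  # humans express no preference for this pair
            metric_diff = metric_scores[i] - metric_scores[j]
            if metric_diff * human_diff > 0:
                concordant += 1
            else:
                discordant += 1
    if concordant + discordant == 0:
        raise ValueError("no segment pairs with a human preference")
    return (concordant - discordant) / (concordant + discordant)


# Toy data (hypothetical): one score per MT system, then one score per segment.
system_metric = [0.21, 0.25, 0.30]
system_human = [55.0, 61.0, 70.0]
segment_metric = [0.1, 0.5, 0.9, 0.4]
segment_human = [20, 60, 90, 70]

print(round(pearson(system_metric, system_human), 3))
print(pairwise_agreement(segment_metric, segment_human))
```

A metric that ranks every segment pair the same way as the human judges scores 1.0 on the pairwise measure; one that reverses every pair scores -1.0.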
Findings of the 2017 Conference on Machine Translation
This paper presents the results of the WMT17 shared tasks, which included three machine translation (MT) tasks (news, biomedical, and multimodal), two evaluation tasks (metrics and run-time estimation of MT quality), an automatic post-editing task, a neural MT training task, and a bandit learning task.
Findings of the WMT 2018 shared task on quality estimation
© 2018 The Authors. Published by Association for Computational Linguistics. This is an open access article available under a Creative Commons licence.
The published version can be accessed at the following link on the publisher's website: http://dx.doi.org/10.18653/v1/W18-6451
We report the results of the WMT18 shared task on Quality Estimation, i.e. the task of predicting the quality of the output of machine translation systems at various granularity levels: word, phrase, sentence and document. This year we include four language pairs, three text domains, and translations produced by both statistical and neural machine translation systems. Participating teams from ten institutions submitted a variety of systems to different task variants and language pairs.
The collection of data and annotations for Tasks 1, 2 and 3 was supported by the EC H2020 QT21 project (grant agreement no. 645452). The shared task organisation was also supported by the QT21 project, national funds through Fundacao para a Ciencia e Tecnologia (FCT), with references UID/CEC/50021/2013 and UID/EEA/50008/2013, and by the European Research Council (ERC StG DeepSPIN 758969). We would also like to thank Julie Beliao and the Unbabel Quality Team for coordinating the annotation of the dataset used in Task 4.
Sentence Similarity and Machine Translation
Neural machine translation (NMT) systems encode an input sentence into an intermediate representation and then decode that representation into the output sentence. Translation requires deep understanding of language; as a result, NMT models trained on large amounts of data develop a semantically rich intermediate representation.
We leverage this rich intermediate representation of NMT systems (in particular, multilingual NMT systems, which learn to map many languages into and out of a joint space) for bitext curation, paraphrasing, and automatic machine translation (MT) evaluation. At a high level, all of these tasks are rooted in similarity: sentence and document alignment requires measuring similarity of sentences and documents, respectively; paraphrasing requires producing output which is similar to an input; and automatic MT evaluation requires measuring the similarity between MT system outputs and corresponding human reference translations.
We use multilingual NMT for similarity in two ways: First, we use a multilingual NMT model with a fixed-size intermediate representation (Artetxe and Schwenk, 2018) to produce multilingual sentence embeddings, which we use in both sentence and document alignment. Second, we train a multilingual NMT model and show that it generalizes to the task of generative paraphrasing (i.e., "translating" from Russian to Russian), when used in conjunction with a simple generation algorithm to discourage copying from the input to the output. We also use this model for automatic MT evaluation, to force decode and score MT system outputs conditioned on their respective human reference translations. Since we leverage multilingual NMT models, each method works in many languages using a single model.
We show that simple methods, which leverage the intermediate representation of multilingual NMT models trained on large amounts of bitext, outperform prior work in paraphrasing, sentence alignment, document alignment, and automatic MT evaluation. This finding is consistent with recent trends in the natural language processing community, where large language models trained on huge amounts of unlabeled text have achieved state-of-the-art results on tasks such as question answering, named entity recognition, and parsing.
Deep learning based semantic textual similarity for applications in translation technology
A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy.
Semantic Textual Similarity (STS) measures the equivalence of meanings between two textual segments. It is a fundamental task for many natural language processing applications. In this study, we focus on employing STS in the context of translation technology. We start by developing models to estimate STS. We propose a new unsupervised vector aggregation-based STS method which relies on contextual word embeddings. We also propose a novel Siamese neural network based on efficient recurrent neural network units. We empirically evaluate various unsupervised and supervised STS methods, including these newly proposed methods, on three different English STS datasets, two non-English datasets and a bio-medical STS dataset to list the best supervised and unsupervised STS methods.
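As a rough illustration of what an unsupervised, vector aggregation-based STS score can look like, the sketch below averages word vectors for each sentence and compares the two averages with cosine similarity. This is not the thesis's method: the thesis relies on contextual word embeddings from a trained model, whereas the small static lookup table `TOY_VECTORS`, the mean-pooling choice, and all function names here are hypothetical stand-ins.

```python
from math import sqrt

# Hypothetical 3-dimensional word vectors; a real system would obtain
# contextual embeddings from a trained model instead of a fixed table.
TOY_VECTORS = {
    "the": [0.1, 0.1, 0.1],
    "cat": [1.0, 0.1, 0.0],
    "dog": [0.8, 0.3, 0.0],
    "car": [0.0, 0.1, 1.0],
    "sleeps": [0.2, 1.0, 0.0],
    "runs": [0.3, 0.9, 0.0],
    "drives": [0.0, 0.3, 0.9],
}


def average_vector(tokens, vectors):
    """Aggregate word vectors by taking their element-wise mean."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no known tokens in sentence")
    dim = len(known[0])
    return [sum(v[k] for v in known) / len(known) for k in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def sts_score(sentence_1, sentence_2, vectors=TOY_VECTORS):
    """Similarity of two sentences via mean-pooled word vectors."""
    v1 = average_vector(sentence_1.lower().split(), vectors)
    v2 = average_vector(sentence_2.lower().split(), vectors)
    return cosine(v1, v2)


print(sts_score("the cat sleeps", "the dog runs"))
print(sts_score("the cat sleeps", "the car drives"))
```

With these toy vectors the first pair scores noticeably higher than the second, which is the behaviour an STS score is meant to capture.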
We then embed these STS methods in translation technology applications. Firstly, we experiment with Translation Memory (TM) systems. We propose a novel TM matching and retrieval method based on STS methods that outperforms current TM systems. We then utilise the developed STS architectures in translation Quality Estimation (QE). We show that the proposed methods are simple but outperform complex QE architectures and improve the state-of-the-art results. The implementations of these methods have been released as open source.