Human Feedback in Statistical Machine Translation
The thesis addresses the challenge of improving Statistical Machine Translation (SMT) systems via human feedback on translation quality.
The amount of human feedback available to systems is inherently low due to cost and time limitations. One of our goals is to simulate such information by automatically generating pseudo-human feedback.
This is performed using Quality Estimation (QE) models. QE is a technique for predicting the quality of automatic translations without comparing them to oracle (human) translations, traditionally at the sentence or word levels.
QE models are trained on a small collection of automatic translations manually labelled for quality, and then can predict the quality of any number of unseen translations.
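To make this setup concrete, the following is a minimal sketch of sentence-level QE framed as supervised regression. The features, quality labels and model choice (an SVR) are illustrative assumptions, not the specific models developed in the thesis.

```python
# Hedged sketch: sentence-level QE as supervised regression.
# Feature set, labels and model choice are illustrative only.
from sklearn.svm import SVR

def extract_features(source, translation):
    """Toy feature extractor; real QE systems use many more features
    (language-model scores, alignment statistics, etc.)."""
    src_len = len(source.split())
    tgt_len = len(translation.split())
    return [src_len, tgt_len, tgt_len / max(src_len, 1)]

# Small manually labelled training set (quality given as, e.g., HTER).
train_pairs = [("the house is red", "das haus ist rot"),
               ("he goes home", "er geht geht nach hause")]
train_labels = [0.0, 0.25]

model = SVR()
model.fit([extract_features(s, t) for s, t in train_pairs], train_labels)

# Once trained, the model can score any number of unseen translations.
score = model.predict([extract_features("a red house", "ein rotes haus")])[0]
print(f"predicted quality: {score:.2f}")
```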
We propose a number of improvements for QE models in order to increase the reliability of pseudo-human feedback.
These include strategies to artificially generate instances for settings where QE training data is scarce.
We also introduce a new level of granularity for QE: the level of phrases. This level aims to improve the quality of QE predictions by better modelling inter-dependencies among word-level errors, in ways tailored to phrase-based SMT, where the basic unit of translation is a phrase. It can thus facilitate work on incorporating human feedback during the translation process.
Finally, we introduce approaches to incorporate pseudo-human feedback in the form of QE predictions into SMT systems. More specifically, we use quality predictions to select the best translation from a number of alternative suggestions produced by SMT systems, and integrate QE predictions into an SMT system's decoder in order to guide the translation generation process.
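A hedged sketch of the first of these two uses, selecting among alternative translations by predicted quality, reusing the toy `extract_features`/`model` interface from the earlier sketch (names are assumptions, not the thesis code):

```python
# Hedged sketch: pick the best candidate from an n-best list using
# QE predictions. `extract_features` and the trained model follow the
# earlier sketch; both are illustrative stand-ins.
def select_best(source, candidates, qe_model):
    """Return the candidate with the best predicted quality.
    For an error score such as HTER, lower is better (min);
    flip to max for a goodness-style score."""
    scored = [(qe_model.predict([extract_features(source, c)])[0], c)
              for c in candidates]
    return min(scored)[1]

best = select_best("the house is red",
                   ["das haus ist rot", "der haus ist rot"], model)
print(best)
```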
Phrase level segmentation and labelling of machine translation errors
© 2016 The Authors. Published by European Language Resources Association (ELRA). This is an open access article available under a Creative Commons licence.
The published version can be accessed at the following link on the publisher's website: https://www.aclweb.org/anthology/L16-1356/

This paper presents our work towards a novel approach for Quality Estimation (QE) of machine translation based on sequences of adjacent words, so-called phrases. This new level of QE aims to provide a natural balance between word- and sentence-level QE, which are either too fine-grained or too coarse for some applications. However, phrase-level QE poses an intrinsic challenge: how to segment a machine translation into sequences of words (contiguous or not) that represent an error. We discuss three possible segmentation strategies for automatically extracting erroneous phrases. We evaluate these strategies against phrase-level annotations produced by humans, using a new dataset collected for this purpose.

The authors would like to thank all the annotators who helped to create the first version of gold-standard phrase-level annotations. This work was supported by the QT21 (H2020 No. 645452, Lucia Specia, Frédéric Blain) and EXPERT (EU FP7 Marie Curie ITN No. 317471, Varvara Logacheva) projects.
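One plausible segmentation strategy (not necessarily one of the three evaluated in the paper) is to take maximal runs of adjacent words labelled BAD at the word level as erroneous phrases; the sketch below illustrates that assumption:

```python
# Hedged sketch: derive erroneous phrases from word-level OK/BAD
# labels by grouping maximal runs of adjacent BAD words. This is one
# plausible strategy, not necessarily one from the paper.
from itertools import groupby

def bad_phrase_spans(labels):
    """Return (start, end) spans of contiguous BAD-labelled words."""
    spans, i = [], 0
    for label, group in groupby(labels):
        n = len(list(group))
        if label == "BAD":
            spans.append((i, i + n))
        i += n
    return spans

labels = ["OK", "BAD", "BAD", "OK", "OK", "BAD"]
print(bad_phrase_spans(labels))  # [(1, 3), (5, 6)]
```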
USFD's phrase-level quality estimation systems
© 2016 The Authors. Published by Association for Computational Linguistics. This is an open access article available under a Creative Commons licence.
The published version can be accessed at the following link on the publisher's website: http://dx.doi.org/10.18653/v1/W16-2386

Logacheva, V., Blain, F. and Specia, L. (2016) USFD's phrase-level quality estimation systems. In: Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers, Bojar, O., Buck, C., Chatterjee, R., Federmann, C. et al. (eds.) Stroudsburg, PA: Association for Computational Linguistics, pp. 800-805.

This work was supported by the EXPERT (EU FP7 Marie Curie ITN No. 317471, Varvara Logacheva) and the QT21 (H2020 No. 645452, Lucia Specia, Frédéric Blain) projects.
Findings of the WMT 2018 shared task on quality estimation
© 2018 The Authors. Published by Association for Computational Linguistics. This is an open access article available under a Creative Commons licence.
The published version can be accessed at the following link on the publisher's website: http://dx.doi.org/10.18653/v1/W18-6451

We report the results of the WMT18 shared task on Quality Estimation, i.e. the task of predicting the quality of the output of machine translation systems at various granularity levels: word, phrase, sentence and document. This year we include four language pairs, three text domains, and translations produced by both statistical and neural machine translation systems. Participating teams from ten institutions submitted a variety of systems to different task variants and language pairs.

The data and annotations collected for Tasks 1, 2 and 3 were supported by the EC H2020 QT21 project (grant agreement no. 645452). The shared task organisation was also supported by the QT21 project, national funds through Fundação para a Ciência e a Tecnologia (FCT), with references UID/CEC/50021/2013 and UID/EEA/50008/2013, and by the European Research Council (ERC StG DeepSPIN 758969). We would also like to thank Julie Beliao and the Unbabel Quality Team for coordinating the annotation of the dataset used in Task 4.
SHEF-NN: translation quality estimation with neural networks
© 2015 The Authors. Published by Association for Computational Linguistics. This is an open access article available under a Creative Commons licence.
The published version can be accessed at the following link on the publisher's website: https://www.aclweb.org/anthology/W15-3041

We describe our systems for Tasks 1 and 2 of the WMT15 Shared Task on Quality Estimation. Our submissions use (i) a continuous space language model to extract additional features for Task 1 (SHEF-GP, SHEF-SVM), (ii) a continuous bag-of-words model to produce word embeddings as features for Task 2 (SHEF-W2V) and (iii) a combination of features produced by QuEst++ and a feature produced with word embedding models (SHEF-QuEst++). Our systems outperform the baseline as well as many other submissions. The results are especially encouraging for Task 2, where our best performing system (SHEF-W2V) only uses features learned in an unsupervised fashion.
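To make the embedding-as-features idea concrete, here is a hedged sketch loosely in the spirit of SHEF-W2V: each target word is represented by its embedding concatenated with those of its neighbours. The random vectors stand in for trained continuous bag-of-words embeddings, and all names are illustrative:

```python
# Hedged sketch: word embeddings as word-level QE features. Random
# vectors stand in for trained word2vec/CBOW embeddings.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["er", "geht", "nach", "hause"]
embeddings = {w: rng.normal(size=8) for w in vocab}
UNK = np.zeros(8)  # fallback for out-of-vocabulary or boundary slots

def word_features(words, i):
    """Concatenate embeddings of the word and its immediate neighbours."""
    window = [words[j] if 0 <= j < len(words) else None
              for j in (i - 1, i, i + 1)]
    return np.concatenate([embeddings.get(w, UNK) if w is not None else UNK
                           for w in window])

feats = word_features(["er", "geht", "nach", "hause"], 1)
print(feats.shape)  # (24,) -- input for a word-level OK/BAD classifier
```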
Findings of the 2017 Conference on Machine Translation
This paper presents the results of the WMT17 shared tasks, which included three machine translation (MT) tasks (news, biomedical, and multimodal), two evaluation tasks (metrics and run-time estimation of MT quality), an automatic post-editing task, a neural MT training task, and a bandit learning task.
Findings of the 2015 Workshop on Statistical Machine Translation
This paper presents the results of the WMT15 shared tasks, which included a standard news translation task, a metrics task, a tuning task, a task for run-time estimation of machine translation quality, and an automatic post-editing task. This year, 68 machine translation systems from 24 institutions were submitted to the ten translation directions in the standard translation task. An additional 7 anonymized systems were included, and were then evaluated both automatically and manually. The quality estimation task had three subtasks, with a total of 10 teams, submitting 34 entries. The pilot automatic post-editing task had a total of 4 teams, submitting 7 entries.