Overview of the IWSLT 2012 Evaluation Campaign
We report on the ninth evaluation campaign organized by the IWSLT workshop. This year, the evaluation offered multiple tracks on lecture translation based on the TED corpus, and one track on dialog translation from Chinese to English based on the Olympic trilingual corpus. In particular, the TED tracks included a speech transcription track in English, a speech translation track from English to French, and text translation tracks from English to French and from Arabic to English. In addition to the official tracks, ten unofficial MT tracks were offered that required translating TED talks into English from either Chinese, Dutch, German, Polish, Portuguese (Brazilian), Romanian, Russian, Slovak, Slovene, or Turkish. 16 teams participated in the evaluation and submitted a total of 48 primary runs. All runs were evaluated with objective metrics, while runs of the official translation tracks were also ranked by crowd-sourced judges. In particular, subjective ranking for the TED task was performed on a progress test, which permitted direct comparison of this year's results against the best results from the 2011 round of the evaluation campaign.
Marcello Federico; Mauro Cettolo; Luisa Bentivogli; Michael Paul; Sebastian Stüker
Human Feedback in Statistical Machine Translation
The thesis addresses the challenge of improving Statistical Machine Translation (SMT) systems via feedback given by humans on translation quality.
The amount of human feedback available to systems is inherently low due to cost and time limitations. One of our goals is to simulate such information by automatically generating pseudo-human feedback.
This is performed using Quality Estimation (QE) models. QE is a technique for predicting the quality of automatic translations without comparing them to oracle (human) translations, traditionally at the sentence or word levels.
QE models are trained on a small collection of automatic translations manually labelled for quality, and then can predict the quality of any number of unseen translations.
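The abstract leaves the QE model itself unspecified. As a minimal illustrative sketch only (the two features, the toy training pairs, and the least-squares fit below are all invented for illustration, not the thesis's actual features or data), sentence-level QE can be framed as regression from features of a (source, translation) pair to a quality score:

```python
# Minimal sketch of sentence-level quality estimation (QE) as regression.
# Features, data, and model choice are illustrative assumptions only.
import numpy as np

def features(source: str, translation: str) -> list:
    # Two toy features plus a bias term: source length in words,
    # and the target/source length ratio.
    src_len = len(source.split())
    tgt_len = len(translation.split())
    return [src_len, tgt_len / max(src_len, 1), 1.0]

# A small manually labelled training set: (source, MT output, quality score).
train = [
    ("the cat sat", "le chat est assis", 0.9),
    ("a very long and convoluted sentence", "une phrase", 0.3),
    ("good morning", "bonjour", 0.8),
    ("this is fine", "c'est vraiment beaucoup trop long", 0.4),
]

X = np.array([features(s, t) for s, t, _ in train])
y = np.array([q for _, _, q in train])
# Fit a least-squares linear regressor: this plays the role of the QE model.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

def predict_quality(source: str, translation: str) -> float:
    # Predict a quality score for an unseen translation.
    return float(np.array(features(source, translation)) @ w)
```

Once trained on the small labelled set, such a model can score any number of unseen translations, which is what makes QE usable as pseudo-human feedback.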
We propose a number of improvements for QE models in order to increase the reliability of pseudo-human feedback.
These include strategies to artificially generate instances for settings where QE training data is scarce.
We also introduce a new level of granularity for QE: the level of phrases. This level aims to improve the quality of QE predictions by better modelling inter-dependencies among errors at word level, and in ways that are tailored to phrase-based SMT, where the basic unit of translation is a phrase. This can thus facilitate work on incorporating human feedback during the translation process.
Finally, we introduce approaches to incorporate pseudo-human feedback, in the form of QE predictions, into SMT systems. More specifically, we use quality predictions to select the best translation from a number of alternative suggestions produced by SMT systems, and we integrate QE predictions into an SMT decoder in order to guide the translation generation process.
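The selection idea can be sketched in a few lines. The scoring function below is an invented stand-in (a real system would plug in a trained QE model); the point is only the argmax over an n-best list:

```python
# Sketch: use QE predictions to pick the best hypothesis from an n-best list.
# predict_quality is a toy stand-in for a trained QE model (assumption).
def predict_quality(source: str, hypothesis: str) -> float:
    # Toy heuristic: prefer hypotheses whose word count is close to the source's.
    return -abs(len(hypothesis.split()) - len(source.split()))

def rerank(source: str, nbest: list) -> str:
    # Return the hypothesis the QE model scores highest.
    return max(nbest, key=lambda hyp: predict_quality(source, hyp))

best = rerank("the cat sat on the mat",
              ["chat tapis",
               "le chat s'est assis sur le tapis",
               "le chat est assis sur le tapis la la la la"])
```

Integrating QE into the decoder itself works on the same principle, but scores partial hypotheses during search rather than complete ones afterwards.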
Findings of the 2015 Workshop on Statistical Machine Translation
This paper presents the results of the WMT15 shared tasks, which included a standard news translation task, a metrics task, a tuning task, a task for run-time estimation of machine translation quality, and an automatic post-editing task. This year, 68 machine translation systems from 24 institutions were submitted to the ten translation directions in the standard translation task. An additional 7 anonymized systems were included, and were then evaluated both automatically and manually. The quality estimation task had three subtasks, with a total of 10 teams submitting 34 entries. The pilot automatic post-editing task had a total of 4 teams, submitting 7 entries.
Findings of the 2014 Workshop on Statistical Machine Translation
This paper presents the results of the WMT14 shared tasks, which included a standard news translation task, a separate medical translation task, a task for run-time estimation of machine translation quality, and a metrics task. This year, 143 machine translation systems from 23 institutions were submitted to the ten translation directions in the standard translation task. An additional 6 anonymized systems were included, and were then evaluated both automatically and manually. The quality estimation task had four subtasks, with a total of 10 teams submitting 57 entries.
POSTECH Machine Translation System for IWSLT 2008 Evaluation Campaign
In this paper, we describe the POSTECH system for the IWSLT 2008 evaluation campaign. The system is based on phrase-based statistical machine translation. We set up a baseline system using well-known, freely available software, and applied a preprocessing method and a language modeling method to the baseline in order to improve translation quality. The preprocessing method identifies and removes useless tokens in source texts, while the language modeling method models phrase-level n-grams. We participated in the BTEC tasks to assess the effects of our methods.
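The abstract does not detail the phrase-level n-gram model. As a minimal sketch, assuming sentences already segmented into phrases and add-alpha smoothing (both are illustrative assumptions, not the paper's method), a phrase-level bigram model counts n-grams whose units are phrases rather than words:

```python
# Sketch of a phrase-level bigram language model: the n-gram units are
# multi-word phrases. Segmentation is given by hand here (assumption);
# smoothing is simple add-alpha (assumption).
from collections import Counter

def phrase_bigram_counts(segmented_sentences):
    # Count phrase unigrams and adjacent phrase pairs.
    unigrams, bigrams = Counter(), Counter()
    for phrases in segmented_sentences:
        unigrams.update(phrases)
        bigrams.update(zip(phrases, phrases[1:]))
    return unigrams, bigrams

def bigram_prob(prev, cur, unigrams, bigrams, alpha=1.0):
    # Add-alpha smoothed conditional probability P(cur | prev).
    vocab = len(unigrams)
    return (bigrams[(prev, cur)] + alpha) / (unigrams[prev] + alpha * vocab)

corpus = [
    ["i would like", "a ticket", "to tokyo"],
    ["i would like", "a room", "for tonight"],
]
uni, bi = phrase_bigram_counts(corpus)
```

With this corpus, P("a ticket" | "i would like") = (1 + 1) / (2 + 5) = 2/7, since "i would like" occurs twice and the phrase vocabulary has five entries.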