6 research outputs found
Explicit length modelling for statistical machine translation
Explicit length modelling has previously been explored in statistical pattern recognition with successful results. In this paper, two length models, together with two parameter estimation methods and two alternative parametrisations, are presented for statistical machine translation (SMT). More precisely, we incorporate explicit bilingual length modelling into a state-of-the-art log-linear SMT system as an additional feature function, in order to assess the contribution of length information. Finally, a systematic evaluation on reference SMT tasks covering different language pairs demonstrates the benefits of explicit length modelling.
Work supported by the EC (FEDER/FSE) under the transLectures project (FP7-ICT-2011-7-287755) and the Spanish MEC/MICINN under the MIPRCV "Consolider Ingenio 2010" program (CSD2007-00018) and iTrans2 (TIN2009-14511) projects and FPU grant (AP2010-4349). Also supported by the Spanish MITyC under the erudito.com (TSI-020110-2009-439) project, by the Generalitat Valenciana under grants Prometeo/2009/014 and GV/2010/067, and by the UPV under the AdInTAO (20091027) project. The authors wish to thank the anonymous reviewers for their criticisms and suggestions.
Silvestre Cerdà, JA.; Andrés Ferrer, J.; Civera Saiz, J. (2012). Explicit length modelling for statistical machine translation. Pattern Recognition. 45(9):3183-3192. https://doi.org/10.1016/j.patcog.2012.01.006
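The idea of adding a length model as an extra feature function in a log-linear system can be sketched as follows. This is a minimal illustration, not the paper's actual model: the Poisson length model, the feature weights, and the translation/language-model scores are all invented stand-ins.

```python
import math

# Hypothetical sketch: a log-linear SMT system scores a candidate
# translation e of a source f as a weighted sum of feature functions
# h_k(e, f); a length model is added as one more feature. The Poisson
# length model below is an illustrative stand-in, not the paper's
# parametrisation; all weights and scores are invented.

def poisson_length_logprob(target_len, source_len, ratio=1.2):
    """log P(target_len) under a Poisson whose mean scales with the
    source length; `ratio` is an assumed, illustrative parameter."""
    lam = ratio * source_len
    return target_len * math.log(lam) - lam - math.lgamma(target_len + 1)

def loglinear_score(features, weights):
    """Standard log-linear combination: sum_k lambda_k * h_k(e, f)."""
    return sum(weights[name] * value for name, value in features.items())

# Toy example: two candidate translations of a 5-word source sentence,
# with made-up translation-model (tm) and language-model (lm) log scores.
source_len = 5
candidates = {
    "short hypothesis (3 words)": {
        "tm": -4.1, "lm": -6.0,
        "length": poisson_length_logprob(3, source_len),
    },
    "long hypothesis (6 words)": {
        "tm": -4.5, "lm": -6.2,
        "length": poisson_length_logprob(6, source_len),
    },
}
weights = {"tm": 1.0, "lm": 0.6, "length": 0.3}

best = max(candidates, key=lambda c: loglinear_score(candidates[c], weights))
```

Because the length feature enters the score with its own weight, its contribution can be tuned (or ablated) independently, which is what makes the feature-function formulation convenient for measuring the effect of length information.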
Overview of the IWSLT 2017 Evaluation Campaign
The IWSLT 2017 evaluation campaign organised three tasks. The Multilingual task is about training machine translation systems that handle many-to-many language directions, including so-called zero-shot directions. The Dialogue task calls for the integration of context information in machine translation, in order to resolve anaphoric references that typically occur in human-human dialogue turns. Finally, the Lecture task offers the challenge of automatically transcribing and translating real-life university lectures. Following the tradition of these reports, we describe all tasks in detail and present the results of all runs submitted by the participants.
A Text Rewriting Decoder with Application to Machine Translation
Ph.D. (Doctor of Philosophy)
Discourse Cohesion in Chinese-English Statistical Machine Translation
In discourse, cohesion is a required component of meaningful and well organised text.
It establishes the relationship between different elements in the text using a number of
devices such as pronouns, determiners, and conjunctions.
In translation, a well-translated document will display the correct cohesion and use of the cohesive devices that are pertinent to the target language. However, not all languages have the same cohesive devices or use them in the same way. In statistical machine translation this is a particular barrier to generating smooth translations, especially when sentences in parallel corpora are treated in isolation and no extra meaning or cohesive context is provided beyond the sentential level.
In this thesis, focussing on Chinese and English as the language pair, we examine discourse cohesion in statistical machine translation, looking at ways that systems can leverage discourse cues and signals in order to produce smoother translations. We also provide a statistical model that improves translation output by adding additional tokens within the text that can be used to leverage extra information.
A significant part of this research involved visualising many of the results and system outputs, and so an overview of two important pieces of visualisation software that we developed is also included.
Human Feedback in Statistical Machine Translation
The thesis addresses the challenge of improving Statistical Machine Translation (SMT) systems via feedback given by humans on translation quality.
The amount of human feedback available to systems is inherently low due to cost and time limitations. One of our goals is to simulate such information by automatically generating pseudo-human feedback.
This is performed using Quality Estimation (QE) models. QE is a technique for predicting the quality of automatic translations without comparing them to oracle (human) translations, traditionally at the sentence or word levels.
QE models are trained on a small collection of automatic translations manually labelled for quality, and then can predict the quality of any number of unseen translations.
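The train-on-labels, predict-without-references workflow described above can be sketched as a small regression problem. Everything in this sketch is an assumption for illustration: the two features (length ratio and out-of-vocabulary rate), the toy vocabulary, and the tiny labelled set are not the thesis's actual feature set or data.

```python
# Hypothetical sketch of sentence-level Quality Estimation (QE): fit a
# linear regressor on a few translations labelled with a quality score
# in [0, 1], then predict the quality of unseen translations without any
# reference. Features, vocabulary, and data are illustrative assumptions.

VOCAB = {"the", "cat", "sat", "on", "mat", "a", "dog"}

def qe_features(source, translation):
    src, tgt = source.split(), translation.split()
    length_ratio = len(tgt) / len(src)
    oov_rate = sum(w not in VOCAB for w in tgt) / len(tgt)
    return [1.0, length_ratio, oov_rate]  # leading 1.0 is the bias term

def fit_sgd(X, y, lr=0.05, epochs=5000):
    """Least-squares fit by stochastic gradient descent (tiny sketch)."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
    return w

def predict(w, source, translation):
    return sum(wj * xj for wj, xj in zip(w, qe_features(source, translation)))

# Tiny labelled set: (source, machine translation, human quality label).
train = [
    ("le chat", "the cat", 1.0),  # fluent, fully in-vocabulary
    ("le chat", "the zzz", 0.5),  # half the output words are unknown
    ("le", "the cat", 1.0),       # longer output, still in-vocabulary
]
X = [qe_features(s, t) for s, t, _ in train]
y = [q for _, _, q in train]
w = fit_sgd(X, y)

# Predict the quality of an unseen translation (no reference needed).
score = predict(w, "un chien", "a dog qq zzz")
```

Once fitted, the model can score any number of unseen translations at negligible cost, which is what makes QE predictions usable as pseudo-human feedback at scale.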
We propose a number of improvements for QE models in order to increase the reliability of pseudo-human feedback.
These include strategies to artificially generate instances for settings where QE training data is scarce.
We also introduce a new level of granularity for QE: the level of phrases. This level aims to improve the quality of QE predictions by better modelling inter-dependencies among errors at word level, and in ways that are tailored to phrase-based SMT, where the basic unit of translation is a phrase. This can thus facilitate work on incorporating human feedback during the translation process.
Finally, we introduce approaches to incorporating pseudo-human feedback, in the form of QE predictions, into SMT systems. More specifically, we use quality predictions to select the best translation from a number of alternative suggestions produced by SMT systems, and we integrate QE predictions into an SMT decoder in order to guide the translation generation process.
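The selection step can be sketched as reranking: given alternative hypotheses and a QE prediction for each, keep the one predicted to be best. The hypotheses and scores below are invented for illustration; in the thesis the scores would come from a trained QE model, not a lookup table.

```python
# Hypothetical sketch of using QE predictions as pseudo-human feedback to
# choose among alternative SMT outputs: select the hypothesis that the QE
# model rates highest. Candidate strings and scores are invented.

def select_best(hypotheses, qe_predict):
    """Return the hypothesis with the highest predicted quality."""
    return max(hypotheses, key=qe_predict)

# Stand-in QE predictor: a lookup of precomputed predictions; a real
# system would call a trained QE model here.
predicted_quality = {
    "the cat sat on the mat": 0.92,
    "the cat sat in mat": 0.61,
    "cat the sat mat on": 0.18,
}

best = select_best(predicted_quality, predicted_quality.get)
```

The same predicted-quality signal can also be pushed inside the decoder, so that partial hypotheses are pruned or promoted during search rather than only reranked afterwards.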