Natural languages display a great variety of word orders, and one of the
major challenges facing statistical machine translation is modelling these differences.
This thesis is motivated by a survey of 110 different language pairs drawn
from the Europarl project, which shows that word order differences account for more
variation in translation performance than any other factor. This wide-ranging analysis
provides compelling evidence for the importance of research into reordering.

There has already been a great deal of research into improving the quality of the
word order in machine translation output. However, there has been very little analysis
of how best to evaluate this research. Current machine translation metrics are largely
focused on evaluating the words used in translations, and their ability to measure the
quality of word order has not been demonstrated. In this thesis we introduce novel
metrics for quantitatively evaluating reordering.

Our approach isolates the word order in translations by using word alignments.
We reduce alignment information to permutations and apply standard distance metrics
to compare the word order in the reference to that of the translation. We show
that our metrics correlate more strongly with human judgements of word order quality
than current machine translation metrics. We also show that a combined lexical and
reordering metric, the LRscore, is useful for training translation model parameters.
Human judges prefer the output of models trained using the LRscore as the objective
function over that of models trained with the de facto standard translation metric, the BLEU score.
The LRscore thus provides researchers with a reliable metric for evaluating the impact
of their research on the quality of word order.
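
The idea of reducing alignments to permutations and comparing them with a distance metric can be illustrated with a minimal sketch. This is not the implementation used in the thesis: it assumes one-to-one word alignments given as (source index, target index) pairs, and it uses the normalised Hamming distance and the normalised Kendall's tau distance as two examples of standard permutation distances. All function names are hypothetical and chosen for illustration only.

```python
from itertools import combinations


def alignment_to_permutation(alignment):
    """Reduce a one-to-one alignment to a permutation of 0..n-1.

    `alignment` is a list of (source_index, target_index) pairs.
    Assumption: every source word aligns to exactly one distinct target word.
    """
    # Target positions listed in source-word order.
    ordered = [t for _, t in sorted(alignment)]
    # Re-rank target positions so the result uses 0..n-1 with no gaps.
    rank = {t: r for r, t in enumerate(sorted(ordered))}
    return [rank[t] for t in ordered]


def hamming_distance(p, q):
    """Fraction of positions at which the two permutations differ."""
    return sum(a != b for a, b in zip(p, q)) / len(p)


def kendall_tau_distance(p, q):
    """Fraction of element pairs whose relative order differs in p and q."""
    n = len(p)
    if n < 2:
        return 0.0
    # Express p in terms of positions in q, then count inversions.
    pos_in_q = {v: i for i, v in enumerate(q)}
    mapped = [pos_in_q[v] for v in p]
    discordant = sum(
        1 for i, j in combinations(range(n), 2) if mapped[i] > mapped[j]
    )
    return discordant / (n * (n - 1) / 2)


# Toy example: the reference keeps the source order, while the
# translation swaps the two middle words.
reference = alignment_to_permutation([(0, 0), (1, 1), (2, 2), (3, 3)])
translation = alignment_to_permutation([(0, 0), (1, 2), (2, 1), (3, 3)])

print(hamming_distance(reference, translation))      # 0.5
print(kendall_tau_distance(reference, translation))  # 1/6, about 0.167
```

In this toy case the two distances behave differently: the Hamming distance penalises both displaced words equally, whereas Kendall's tau counts only the single pair whose relative order changed. Both values are distances, so a score that rewards good word order would be obtained by subtracting them from 1.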