2 research outputs found

Proceedings of the 17th Annual Conference of the European Association for Machine Translation

Publication venue: Hrvatsko društvo za jezične tehnologije
Publication date: 01/01/2014

Proceedings of the 17th Annual Conference of the European Association for Machine Translation (EAMT

Repozitorij Filozofskog fakulteta u Zagrebu at University of Zagreb

Length-incremental Phrase Training for SMT

Authors: Hermann Ney, Joern Wuebker
Publication date: 01/01/2013

We present an iterative technique for generating phrase tables for SMT, based on force-aligning the training data with a modified translation decoder. In contrast to previous work, we completely avoid word alignments and phrase extraction heuristics, moving towards more principled phrase generation and probability estimation. During training, we allow the decoder to generate new phrases on the fly and increment the maximum phrase length in each iteration. Experiments are carried out on the IWSLT 2011 Arabic-English task, where our training method yields moderate improvements over a state-of-the-art baseline. The resulting phrase table shows only a small overlap with the heuristically extracted one, which demonstrates how restrictive it is to limit phrase selection by a word alignment or heuristics. By interpolating the heuristic and the trained phrase tables, we improve over the baseline by 0.5% BLEU and 0.5% TER.
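
The abstract does not say how the heuristic and the trained phrase tables are interpolated. As an illustration only, the sketch below assumes plain linear interpolation of phrase translation probabilities with a single weight. The function name `interpolate_phrase_tables`, the `weight` parameter, the dictionary layout, and the toy probabilities are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Tuple

# A phrase table is modelled here as a mapping from
# (source phrase, target phrase) to a translation probability.
# This layout is an assumption made for the sketch.
PhraseTable = Dict[Tuple[str, str], float]


def interpolate_phrase_tables(
    heuristic: PhraseTable,
    trained: PhraseTable,
    weight: float = 0.5,
) -> PhraseTable:
    """Linearly interpolate two phrase tables.

    `weight` is applied to the heuristic table and `1 - weight` to the
    trained table. A phrase pair missing from one table contributes a
    probability of 0.0 from that table.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")

    merged: PhraseTable = {}
    # Take the union of phrase pairs, since the abstract reports that
    # the two tables overlap only slightly.
    for pair in heuristic.keys() | trained.keys():
        p_heuristic = heuristic.get(pair, 0.0)
        p_trained = trained.get(pair, 0.0)
        merged[pair] = weight * p_heuristic + (1.0 - weight) * p_trained
    return merged


if __name__ == "__main__":
    # Toy tables with made-up entries, for illustration only.
    heuristic_table = {("a", "x"): 0.5, ("b", "y"): 1.0}
    trained_table = {("a", "x"): 0.25, ("c", "z"): 1.0}
    for pair, prob in sorted(
        interpolate_phrase_tables(heuristic_table, trained_table).items()
    ):
        print(pair, prob)
```

Taking the union of the two tables keeps phrase pairs that only one method found, which matters when the overlap is small. A real system might instead interpolate in log space or tune the weight on held-out data; the abstract does not specify either choice.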

CiteSeerX

Publikationsserver der RWTH Aachen University