N-gram-based statistical machine translation versus syntax augmented machine translation: comparison and system combination

Khalilov, Maxim; Rodríguez Fonollosa, José Adrián

Publication date: 30 March 2009

Abstract

In this paper we compare and contrast two approaches to Machine Translation (MT): the CMU-UKA Syntax Augmented Machine Translation system (SAMT) and the UPC-TALP N-gram-based Statistical Machine Translation (SMT) system. SAMT is a hierarchical, syntax-driven translation system built on a phrase-based model and target-side parse trees. In N-gram-based SMT, translation relies on bilingual units derived from word-to-word alignments and on statistical modeling of the bilingual context within a maximum-entropy framework. We provide a step-by-step comparison of the two systems and report results in terms of automatic evaluation metrics and required computational resources for a small Arabic-to-English translation task (1.5M tokens in the training corpus). A human error analysis clarifies the advantages and disadvantages of each system. Finally, we combine the outputs of both systems, which yields significant improvements in translation quality.

Postprint (published version)
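The N-gram-based idea summarized above (bilingual units extracted from word alignments, an n-gram model over those units, and a maximum-entropy, i.e. log-linear, combination of features) can be illustrated with a minimal Python sketch. It assumes a simplified monotone tuple extraction (each tuple is the smallest left-to-right unit closed under the alignment), a bigram tuple model with add-one smoothing, and a log-linear score with hypothetical feature names (`tuple_lm`, `word_bonus`) and hypothetical weights. The abstract does not describe the UPC-TALP implementation in this detail, so this is not the authors' code. The toy Spanish-English pair is illustrative only and has nothing to do with the Arabic-to-English data.

```python
import math
from collections import Counter


def extract_tuples(src, tgt, alignment):
    """Segment a word-aligned sentence pair into monotone bilingual tuples.

    `alignment` is a set of (i, j) links between src[i] and tgt[j]. Each tuple
    is the smallest left-to-right unit closed under the alignment, so no word
    inside a tuple is linked to a word outside it (simplified extraction).
    """
    s2t, t2s = {}, {}
    for i, j in alignment:
        s2t.setdefault(i, []).append(j)
        t2s.setdefault(j, []).append(i)

    tuples = []
    s_start, t_start = 0, 0
    while s_start < len(src):
        s_end, t_end = s_start, t_start - 1  # target side starts empty
        while True:  # grow both spans until they are closed under the alignment
            new_t = max([t_end] + [j for i in range(s_start, s_end + 1) for j in s2t.get(i, [])])
            new_s = max([s_end] + [i for j in range(t_start, new_t + 1) for i in t2s.get(j, [])])
            if (new_s, new_t) == (s_end, t_end):
                break
            s_end, t_end = new_s, new_t
        # An unaligned source word yields a tuple with an empty (NULL) target side.
        tuples.append((src[s_start:s_end + 1], tgt[t_start:t_end + 1]))
        s_start, t_start = s_end + 1, t_end + 1

    # Attach trailing unaligned target words to the last tuple.
    if tuples and t_start < len(tgt):
        last_src, last_tgt = tuples[-1]
        tuples[-1] = (last_src, last_tgt + tgt[t_start:])
    return tuples


def tuple_token(tup):
    """Render a tuple as one vocabulary item, e.g. 'casa verde|||green house'."""
    src_part, tgt_part = tup
    return " ".join(src_part) + "|||" + (" ".join(tgt_part) or "NULL")


class TupleBigramLM:
    """Bigram model over tuple tokens with add-one smoothing (a simplification)."""

    def __init__(self, tuple_sequences):
        self.bigrams = Counter()
        self.history = Counter()
        vocab = set()
        for seq in tuple_sequences:
            tokens = ["<s>"] + [tuple_token(t) for t in seq] + ["</s>"]
            vocab.update(tokens[1:])  # every token that can be predicted
            for prev, cur in zip(tokens, tokens[1:]):
                self.bigrams[(prev, cur)] += 1
                self.history[prev] += 1
        self.vocab_size = len(vocab)

    def logprob(self, seq):
        """Natural-log probability of a tuple sequence, including </s>."""
        tokens = ["<s>"] + [tuple_token(t) for t in seq] + ["</s>"]
        total = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            num = self.bigrams[(prev, cur)] + 1
            den = self.history[prev] + self.vocab_size
            total += math.log(num / den)
        return total


def log_linear_score(features, weights):
    """Log-linear (maximum-entropy) combination: sum over k of lambda_k * h_k."""
    return sum(weights[name] * value for name, value in features.items())


# Usage on a toy sentence pair (hypothetical data).
src = "la casa verde".split()
tgt = "the green house".split()
links = {(0, 0), (1, 2), (2, 1)}
tuples = extract_tuples(src, tgt, links)
print([tuple_token(t) for t in tuples])  # ['la|||the', 'casa verde|||green house']

lm = TupleBigramLM([tuples])
features = {"tuple_lm": lm.logprob(tuples), "word_bonus": len(tgt)}
weights = {"tuple_lm": 1.0, "word_bonus": 0.2}  # hypothetical weights
print(log_linear_score(features, weights))
```

Because the reordering between "casa verde" and "green house" falls inside a single tuple, the bilingual n-gram model can capture it without a separate reordering step. A real system would typically use longer n-gram histories, stronger smoothing, more feature functions, and tuned weights. The add-one bigram is used here only to keep the example short.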

Available Versions

UPCommons. Portal del coneixement obert de la UPC

oai:upcommons.upc.edu:2117/751...

Last time updated on 16/06/2016