Experiments on domain adaptation for English-Hindi SMT

Haque, Rejwanul; Naskar, Sudip Kumar; van Genabith, Josef; Way, Andy

Publication date: 1 January 2009

Abstract

Statistical Machine Translation (SMT) systems are usually trained on large amounts of bilingual text and monolingual target-language text. If a significant amount of out-of-domain data is added to the training data, translation quality can drop. On the other hand, training an SMT system on only a small amount of in-domain material leads to narrow lexical coverage, which again results in low translation quality. In this paper, (i) we explore domain-adaptation techniques to combine large out-of-domain training data with small-scale in-domain training data for English-Hindi statistical machine translation, and (ii) we cluster large out-of-domain training data to extract sentences similar to in-domain sentences and apply adaptation techniques to combine the clustered sub-corpora with in-domain training data in a unified framework, achieving an absolute improvement of 0.44 BLEU points (4.03% relative) over the baseline.
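The abstract does not say how the out-of-domain sentences are clustered or compared with the in-domain data. As a rough illustration of the general idea of keeping only out-of-domain sentences that resemble the in-domain text, here is a minimal sketch that swaps clustering for a simpler technique: bag-of-words cosine similarity against an in-domain centroid. It is not the authors' method. The function names, the whitespace tokenizer, the example sentences and the threshold of 0.2 are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(sentence):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return sentence.lower().split()


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_similar(in_domain, out_of_domain, threshold=0.2):
    """Keep out-of-domain sentences whose cosine similarity to the
    in-domain centroid (summed term counts) is at least `threshold`.

    The threshold value is an illustrative assumption, not a tuned setting.
    """
    centroid = Counter()
    for sentence in in_domain:
        centroid.update(tokenize(sentence))
    return [
        sentence
        for sentence in out_of_domain
        if cosine(Counter(tokenize(sentence)), centroid) >= threshold
    ]


# Hypothetical toy data: a travel-like in-domain set and mixed out-of-domain text.
in_domain = [
    "book a hotel room near the station",
    "is the hotel room available tonight",
]
out_of_domain = [
    "parliament adopted a new resolution",
    "i need a hotel room for two nights",
    "stock prices fell sharply",
]

print(select_similar(in_domain, out_of_domain))
```

For a parallel corpus, a filter like this would typically be applied to one side (for example the English side), with the aligned Hindi sentence kept or dropped along with it. Raw term counts let frequent function words inflate the similarity, so a practical variant would likely add stop-word removal or TF-IDF weighting.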

DCU Online Research Access Service

oai:doras.dcu.ie:15175
