Combining multi-domain statistical machine translation models using automatic classifiers

Banerjee, Pratyush; Du, Jinhua; Kumar Naskar, Sudip; Li, Baoli; van Genabith, Josef; Way, Andy

Authors: Pratyush Banerjee
Jinhua Du
Sudip Kumar Naskar
Baoli Li
Josef van Genabith
Andy Way
Publication date: 1 January 2010
Publisher: Association for Machine Translation in the Americas

Abstract

This paper presents a set of experiments on Domain Adaptation of Statistical Machine Translation systems. The experiments focus on Chinese-English translation and two domain-specific corpora. The paper presents a novel approach for combining multiple domain-trained translation models to achieve improved translation quality for both domain-specific and combined sets of sentences. We train a statistical classifier to assign each sentence to the appropriate domain and use the corresponding domain-specific MT model to translate it. On the combined-domain test set, experimental results show that the method achieves a statistically significant absolute improvement of 1.58 BLEU points (2.86% relative) over a translation model trained on combined data, and considerable improvements over a model using multiple decoding paths of the Moses decoder. Furthermore, even for domain-specific test sets, our approach works almost as well as dedicated domain-specific models and perfect classification.
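The classify-then-translate idea in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the add-one-smoothed Naive Bayes classifier, the domain labels `domain_a` and `domain_b`, the toy English training sentences, and the stub translator functions that stand in for real domain-specific MT systems.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesDomainClassifier:
    """Toy multinomial Naive Bayes over whitespace tokens (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # domain -> token counts
        self.doc_counts = Counter()              # domain -> number of training sentences
        self.vocab = set()

    def train(self, labelled_sentences):
        for sentence, domain in labelled_sentences:
            tokens = sentence.lower().split()
            self.word_counts[domain].update(tokens)
            self.doc_counts[domain] += 1
            self.vocab.update(tokens)

    def classify(self, sentence):
        tokens = sentence.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_domain, best_score = None, -math.inf
        for domain in self.doc_counts:
            # log prior + sum of add-one-smoothed log likelihoods
            score = math.log(self.doc_counts[domain] / total_docs)
            domain_total = sum(self.word_counts[domain].values())
            for tok in tokens:
                count = self.word_counts[domain][tok]
                score += math.log((count + 1) / (domain_total + len(self.vocab)))
            if score > best_score:
                best_domain, best_score = domain, score
        return best_domain


def route_and_translate(sentences, classifier, translators):
    """Send each sentence to the translator of its predicted domain."""
    return [translators[classifier.classify(s)](s) for s in sentences]


# Hypothetical toy training data: (source sentence, domain label).
training_data = [
    ("install the security patch and scan for virus threats", "domain_a"),
    ("the firewall blocked a virus attack", "domain_a"),
    ("restore the backup from the storage server", "domain_b"),
    ("configure the backup schedule for the storage cluster", "domain_b"),
]

classifier = NaiveBayesDomainClassifier()
classifier.train(training_data)

# Stub "MT systems": real ones would be separately trained domain-specific models.
translators = {
    "domain_a": lambda s: f"[domain_a MT] {s}",
    "domain_b": lambda s: f"[domain_b MT] {s}",
}

mixed_test_set = ["scan the firewall for virus", "restore the storage backup"]
outputs = route_and_translate(mixed_test_set, classifier, translators)
for line in outputs:
    print(line)
```

In this arrangement a misclassified sentence is translated by the wrong domain model, so classifier accuracy bounds how closely the combined system can approach dedicated domain-specific models.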

Available Versions

DCU Online Research Access Service

oai:doras.dcu.ie:15804
