Sentence similarity-based source context modelling in PBSMT

Banchs, Rafael E.; Costa-Jussá, Marta; Haque, Rejwanul; Kumar Naskar, Sudip; Way, Andy

Authors: Rafael E. Banchs, Marta Costa-Jussá, Rejwanul Haque, Sudip Kumar Naskar, Andy Way
Publication date: 1 December 2010
Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Abstract

Target phrase selection, a crucial component of the state-of-the-art phrase-based statistical machine translation (PBSMT) model, plays a key role in generating accurate translation hypotheses. Inspired by context-rich word-sense disambiguation techniques, machine translation (MT) researchers have successfully integrated various types of source-language context into the PBSMT model to improve target phrase selection. Among the various types of lexical and syntactic features, lexical syntactic descriptions in the form of supertags, which preserve long-range word-to-word dependencies in a sentence, have proven to be effective. These rich contextual features can disambiguate a source phrase on the basis of the local syntactic behaviour of that phrase. In addition to local contextual information, global contextual information such as the grammatical structure of a sentence, sentence length and n-gram word sequences could provide further information to enhance this phrase-sense disambiguation. In this work, we explore various sentence-similarity features, which measure the similarity between a source sentence to be translated and the source side of the bilingual training sentences, and integrate them directly into the PBSMT model. We performed experiments on an English-to-Chinese translation task, applying sentence-similarity features both individually and in combination with supertag-based features. We evaluate the performance of our approach and report a statistically significant relative improvement of 5.25% in BLEU score when adding a sentence-similarity feature together with a supertag-based feature.
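The abstract does not specify which similarity measures are used. As a minimal illustration of the general idea, the sketch below scores an input sentence against the source side of a set of training sentences using a Dice coefficient over word n-grams. The choice of the Dice coefficient, whitespace tokenisation, lowercasing, and the function names `ngrams`, `dice_similarity` and `most_similar` are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the word n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def dice_similarity(sent_a, sent_b, n=2):
    """Dice coefficient over word n-gram multisets, in [0, 1].

    Assumption: whitespace tokenisation and lowercasing; a real system
    would use the same tokeniser as the rest of the MT pipeline.
    """
    grams_a = ngrams(sent_a.lower().split(), n)
    grams_b = ngrams(sent_b.lower().split(), n)
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        # Neither sentence is long enough to contain an n-gram.
        return 0.0
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((grams_a & grams_b).values())
    return 2.0 * overlap / total


def most_similar(source, training_sources, n=2):
    """Return (index, score) of the training source sentence closest to `source`.

    Assumes `training_sources` is non-empty.
    """
    scores = [dice_similarity(source, t, n) for t in training_sources]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    training = ["a dog ran", "the cat sat down"]
    print(most_similar("the cat sat", training))
```

In a PBSMT setting, a score of this kind could serve as one source-context feature attached to the phrase pairs extracted from each training sentence. How the paper actually integrates its features is not described in the abstract.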