Pushdown automata in statistical machine translation

Adrià de Gispert; Aho Alfred V.; Bar-Hillel Y.; Bill Byrne; Blackwood Graeme; Brants Thorsten; Chang Yin-Wen; Chelba Ciprian; Cyril Allauzen; Dyer Chris; Gonzalo Iglesias; Hopkins M.; Huang Liang; Koo Terry; Kumar Shankar; Ljolje Andrej; Michael Riley; Mohri Mehryar; Nederhof Mark-Jan; Roark Brian; Rush Alexander M.; Stolcke Andreas; Wu Dekai; Zens Richard

Publication date: 1 January 2013
Publisher: Computational Linguistics

Abstract

This article describes the use of pushdown automata (PDAs) in the context of statistical machine translation and alignment under a synchronous context-free grammar. We use PDAs to compactly represent the space of candidate translations generated by the grammar when applied to an input sentence. General-purpose PDA algorithms for replacement, composition, shortest path, and expansion are presented. We describe HiPDT, a hierarchical phrase-based decoder using the PDA representation and these algorithms. We contrast the complexity of this decoder with a decoder based on a finite-state automata representation, showing that PDAs provide a more suitable framework to achieve exact decoding for larger synchronous context-free grammars and smaller language models. We assess this experimentally on a large-scale Chinese-to-English alignment and translation task. In translation, we propose a two-pass decoding strategy involving a weaker language model in the first pass to address the results of PDA complexity analysis. We study in depth the experimental conditions and tradeoffs in which HiPDT can achieve state-of-the-art performance for large-scale SMT.