Search CORE

2 research outputs found

Algebraic decoder specification: coupling formal-language theory and statistical machine translation: Algebraic decoder specification: coupling formal-language theory and statistical machine translation

Author: Büchse Matthias
Publication venue
Publication date: 18/12/2014
Field of study

The specification of a decoder, i.e., a program that translates sentences from one natural language into another, is an intricate process, driven by the application and lacking a canonical methodology. The practical nature of decoder development inhibits the transfer of knowledge between theory and application, which is unfortunate because many contemporary decoders are in fact related to formal-language theory. This thesis proposes an algebraic framework where a decoder is specified by an expression built from a fixed set of operations. As yet, this framework accommodates contemporary syntax-based decoders, it spans two levels of abstraction, and, primarily, it encourages mutual stimulation between the theory of weighted tree automata and the application

Technische Universität Dresden: Qucosa

Asynchronous Binarization for Synchronous Grammars

Author: Adam Pauls
Dan Klein
John Denero
Publication venue
Publication date: 01/01/2009
Field of study

Binarization of n-ary rules is critical for the efficiency of syntactic machine translation decoding. Because the target side of a rule will generally reorder the source side, it is complex (and sometimes impossible) to find synchronous rule binarizations. However, we show that synchronous binarizations are not necessary in a two-stage decoder. Instead, the grammar can be binarized one way for the parsing stage, then rebinarized in a different way for the reranking stage. Each individual binarization considers only one monolingual projection of the grammar, entirely avoiding the constraints of synchronous binarization and allowing binarizations that are separately optimized for each stage. Compared to n-ary forest reranking, even simple target-side binarization schemes improve overall decoding accuracy.

CiteSeerX

Crossref