Forest-based translation rule extraction

Abstract

Translation rule extraction is a fundamental problem in machine translation, especially for linguistically syntax-based systems that need parse trees from either or both sides of the bi-text. The current dominant practice only uses 1-best trees, which adversely affects the rule set quality due to parsing errors. So we pro-pose a novel approach which extracts rules from a packed forest that compactly encodes exponentially many parses. Experiments show that this method improves translation quality by over 1 BLEU point on a state-of-the-art tree-to-string system, and is 0.5 points better than (and twice as fast as) extracting on 30-best parses. When combined with our previous work on forest-based decoding, it achieves a 2.5 BLEU points improvement over the base-line, and even outperforms the hierarchical system of Hiero by 0.7 points.

Similar works

Full text

thumbnail-image

CiteSeerX

redirect
Last time updated on 28/10/2017

This paper was published in CiteSeerX.

Having an issue?

Is data on this page outdated, violates copyrights or anything else? Report the problem now and we will take corresponding actions after reviewing your request.