
Disambiguation strategies for data-oriented translation

Abstract

The Data-Oriented Translation (DOT) model, originally proposed in Poutsma (1998, 2003) and based on Data-Oriented Parsing (DOP) (e.g. Bod, Scha, & Sima'an, 2003), is best described as a hybrid model of translation, as it combines examples, linguistic information and a statistical translation model. Although theoretically interesting, it inherits the computational complexity associated with DOP. In this paper, we focus on one computational challenge for this model: efficiently selecting the 'best' translation to output. We present four different disambiguation strategies in terms of how they are implemented in our DOT system, along with experiments which investigate how they compare in terms of accuracy and efficiency.
