Improving Statistical Machine Translation using Morpho-syntactic Information

Abstract

In statistical machine translation, correspondences between words in the source and target languages are learned from bilingual corpora, and often little or no linguistic knowledge is used to structure the underlying models. The work presented in this thesis is motivated by the well-known observation that training data typically do not sufficiently represent the range of phenomena in natural languages. Various methods of incorporating morphological and syntactic information into statistical machine translation systems are proposed and systematically assessed. The overall goal is to improve translation quality and to reduce the amount of parallel text needed to train the model parameters. The development of the suggested methods is guided by the analysis of important causes of errors
