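Traducción automática estadística: modelos de traducción basados en máxima entropía y algoritmos de búsqueda [Statistical machine translation: translation models based on maximum entropy, and search algorithms]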

Abstract

Thesis in Spanish.

This thesis is devoted entirely to machine translation, a discipline that belongs to the fields of artificial intelligence and computational linguistics, and more specifically to the statistical approach to machine translation. The work presented here is based solely on the approach to translation that follows from Bayes' decision rule. It can therefore be classified as empiricist, in contrast to a more rationalistic approach based on linguistic techniques. This approach has three main ingredients: a language model, a translation model, and a search (decoding) problem. The last of these can be considered the machine translation problem proper. The first ingredient, the language model, is not treated here in full, mainly because it has been widely studied and used in the field of automatic speech recognition. We therefore rely on techniques already developed in language modelling and, more specifically, on the well-known n-gram models. Nonetheless, because of their relevance to the problem addressed in this thesis, we do introduce the basic concepts of language models. With respect to translation models, this work is based on single-word statistical translation models. We introduce context-dependent lexicon models based on maximum entropy techniques. In particular, we show how to develop maximum entropy translation models, how to integrate them into the training algorithms of conventional translation models, and how to use them to improve the performance of statistical machine translation systems. Regarding the search (decoding) problem, we propose, design, and study several search algorithms that follow three classical problem-solving paradigms: dynamic programming, branch and bound, and greedy algorithms.
For all of these algorithms we carry out a detailed study of efficiency and translation quality, together with a study of their computational and empirical complexity. The work is supported by a large number of experiments carried out on three well-known tasks in the field of machine translation: the Spanish-English tourist task (better known as EuTrans-I), the French-English Hansards task, and the German-English Verbmobil task.
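
For orientation, the three ingredients named above can be written in the form usually given in the statistical machine translation literature. This is a sketch under assumptions, not the thesis's own notation: the symbols f (source sentence), e (target sentence), x (context of a lexicon decision), the feature functions h_k, and the weights \lambda_k are chosen here for illustration only, and the block assumes the amsmath package.

```latex
% Bayes' decision rule: choose the target sentence e that maximises the
% posterior probability given the source sentence f.
\begin{align}
  \hat{e} &= \operatorname*{arg\,max}_{e} \Pr(e \mid f)
           = \operatorname*{arg\,max}_{e} \Pr(e)\,\Pr(f \mid e) \\
% Language model Pr(e): n-gram approximation over the target words e_1 ... e_I.
  \Pr(e_1^I) &\approx \prod_{i=1}^{I} p\bigl(e_i \mid e_{i-n+1}^{i-1}\bigr) \\
% A context-dependent lexicon model in maximum entropy (log-linear) form,
% normalised over candidate source words f'.
  p(f \mid e, x) &=
    \frac{\exp\bigl(\sum_{k} \lambda_k\, h_k(f, e, x)\bigr)}
         {\sum_{f'} \exp\bigl(\sum_{k} \lambda_k\, h_k(f', e, x)\bigr)}
\end{align}
```

In this reading, Pr(e) is the language model, Pr(f | e) is the translation model, and the arg max over e is the search (decoding) problem that the dynamic programming, branch and bound, and greedy algorithms address.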
