4 research outputs found

    Syntax-to-morphology alignment and constituent reordering in factored phrase-based statistical machine translation from English to Turkish

    Get PDF
    English is a moderately analytic language in which the meaning is conveyed with function words and the order of constituents. On the other hand, Turkish is an agglutinative language with free constituent order. These differences together with the lack of large scale English-Turkish parallel corpora turn Statistical Machine Translation (SMT) between these languages into a challenging problem. SMT between these two languages, especially from English to Turkish has been worked on for several years. The initial findings [El-Kahlout and Of lazer, 2006] strongly support the idea of representing both Turkish and English at the morpheme-level. Furthermore, several representations and groupings for the morphological structure have been tried on the Turkish side. In contrast to these, this thesis mostly focuses on the experiments on the English side rather than Turkish. In this work we firstly introduce a new way to align the English syntax with the Turkish morphology by associating function words to their related content words. This transformation solely depends on the dependency relations between these words. In addition to this improved alignment, a syntactic reordering is performed to get a more monotonic word alignment. Here, we again use dependencies to identify the sentence constituents and perform reordering between them so that the word order of the source side will be close to the target language. We report our results with BLEU which is a measure that is widely used by the MT community to report research results. With improvements in the alignment and the ordering, we have increased our BLEU score from a baseline score of 17.08 to 23.78, which is an improvement of 6.7 BLEU points, or about 39% relative
    corecore