Initial explorations in English to Turkish statistical machine translation

Durgar El-Kahlout, İlknur; Oflazer, Kemal

Authors: İlknur Durgar El-Kahlout, Kemal Oflazer
Publication date: 1 January 2006

Abstract

This paper presents preliminary results from, and problems encountered in, developing a statistical machine translation system from English to Turkish. Starting with a baseline word model trained on about 20K aligned sentences, we explore various ways of exploiting morphological structure to improve upon the baseline system. As Turkish is a language with complex agglutinative word structures, we experiment with morphologically segmented and disambiguated versions of the parallel texts in order to uncover relations between morphemes and function words in one language and morphemes and function words in the other, in addition to relations between open-class content words. Morphological segmentation on the Turkish side also conflates the statistics from allomorphs, so that sparseness can be alleviated to a certain extent. We find that this approach, coupled with a simple grouping of the most frequent morphemes and function words on both sides, improves the BLEU score from the baseline of 0.0752 to 0.0913 with the small training data. We close with a discussion of why one should not expect distortion parameters to model word-local morpheme ordering, and argue that a new approach to handling complex morphotactics is needed.
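The point about allomorph conflation can be illustrated with a toy sketch. It assumes hand-segmented input words and a hypothetical tag set (`+PLU`, `+LOC`); neither is taken from the paper, which relies on a morphological analyzer and disambiguator. The sketch only shows how mapping surface allomorphs to one abstract morpheme pools their counts.

```python
from collections import Counter

# Hypothetical mapping from surface allomorphs to abstract morpheme tags.
# The tag names are illustrative only, not the paper's tag set.
ALLOMORPH_TO_TAG = {
    "lar": "+PLU", "ler": "+PLU",                        # plural
    "da": "+LOC", "de": "+LOC", "ta": "+LOC", "te": "+LOC",  # locative
}

def conflate(segmented_word):
    """Turn a hand-segmented word such as 'ev-ler-de' into root plus tags."""
    root, *suffixes = segmented_word.split("-")
    # Unknown suffixes are kept as-is, prefixed with '+'.
    return [root] + [ALLOMORPH_TO_TAG.get(s, "+" + s) for s in suffixes]

# Toy input, segmented by hand (assumption: a real system would obtain
# these segmentations from a morphological analyzer).
words = ["ev-ler-de", "kitap-lar-da", "okul-lar", "sokak-ta"]

# Counts over surface suffix forms versus counts over conflated tags.
surface_counts = Counter(s for w in words for s in w.split("-")[1:])
tag_counts = Counter(t for w in words for t in conflate(w)[1:])

print(len(surface_counts), "surface suffix types:", dict(surface_counts))
print(len(tag_counts), "conflated morpheme types:", dict(tag_counts))
```

In this toy data, five distinct surface suffixes collapse into two morpheme types, so each type is observed more often than any single allomorph. That is the sense in which segmentation can ease data sparseness.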

Available Versions

CiteSeerX

oai:CiteSeerX.psu:10.1.1.61.64...

Last time updated on 22/10/2014

Crossref

Last time updated on 01/04/2019

Sabanci University Research Database

oai:research.sabanciuniv.edu:1...

Last time updated on 12/07/2013