9 research outputs found
Rule-Based Machine Translation From Kazakh To Turkish
This paper presents a shallow-transfer machine translation (MT) system for translating from Kazakh to Turkish. Background on the differences between the languages is presented, followed by how the system was designed to handle some of these differences. The system is based on the Apertium free/open-source machine translation platform. The structure of the system and how it works is described, along with an evaluation against two competing systems. Linguistic components were developed, including a Kazakh-Turkish bilingual dictionary, Constraint Grammar disambiguation rules, lexical selection rules, and structural transfer rules. With many known issues yet to be addressed, our RBMT system has reached performance comparable to publicly-available corpus-based MT systems between the languages
Initial explorations in kazakh to english statistical machine translation
The availability of considerable amounts of parallel texts in Kazakh and English has
motivated us to apply statistical machine translation (SMT) paradigm for building a Kazakh-to-English
machine translation system using publicly available data and open-source tools
A free/open-source hybrid morphological disambiguation tool for Kazakh
This paper presents the results of developing a
morphological disambiguation tool for Kazakh. Starting with a
previously developed rule-based approach, we tried to cope with
the complex morphology of Kazakh by breaking up lexical forms
across their derivational boundaries into inflectional groups
and modeling their behavior with statistical methods. A hybrid
rule-based/statistical approach appears to benefit morphological
disambiguation demonstrating a per-token accuracy of 91% in
running text
Rule-based machine translation from Kazakh to Turkish
This paper presents a shallow-transfer machine translation (MT) system for translating from Kazakh to Turkish. Background on the differences between the languages is presented, followed by how the system was designed to handle some of these differences. The system is based on the Apertium free/open-source machine translation platform. The structure of the system and how it works is described, along with an evaluation against two competing systems. Linguistic components were developed, including a Kazakh-Turkish bilingual dictionary, Constraint Grammar disambiguation rules, lexical selection rules, and structural transfer rules. With many known issues yet to be addressed, our RBMT system has reached performance comparable to publicly-available corpus-based MT systems between the languages
Experiments with Russian to Kazakh sentence alignment
Sentence alignment is the final step in building parallel corpora, which arguably has the greatest impact on the quality of a resulting corpus and the accuracy of machine translation systems that use it for training. However, the quality of sentence alignment itself depends on a number of factors. In this paper we investigate the impact of several data processing techniques on the quality of sentence alignment. We develop and use a number of automatic evaluation metrics, and provide empirical evidence that application of all of the considered data processing techniques yields bitexts with the lowest ratio of noise and the highest ratio of parallel sentences
Machine Translation for Crimean Tatar to Turkish
In this paper a machine translation system for Crimean Tatar to Turkish is presented. To our knowledge this is the first Machine Translation system made available for public use for Crimean Tatar, and the first such system released as free and open source software. The system was built using Apertium, a free and open source machine translation system, and is currently unidirectional from Crimean Tatar to Turkish. We describe our translation system, evaluate it on parallel corpora and compare its performance with a Neural Machine Translation system, trained on the limited amount of corpora available
A free/open-source hybrid morphological disambiguation tool for Kazakh
This paper presents the results of developing a
morphological disambiguation tool for Kazakh. Starting with a
previously developed rule-based approach, we tried to cope with
the complex morphology of Kazakh by breaking up lexical forms
across their derivational boundaries into inflectional groups
and modeling their behavior with statistical methods. A hybrid
rule-based/statistical approach appears to benefit morphological
disambiguation demonstrating a per-token accuracy of 91% in
running text