English-Latvian SMT: the challenge of translating into a free word order language
This paper presents a comparative study of two approaches to
statistical machine translation (SMT) and their application to
the task of English-to-Latvian translation, which remains an open
research problem in the field of automatic translation.
We consider a state-of-the-art phrase-based SMT system and an
alternative N-gram-based SMT system. The major differences
between these two approaches lie in the distinct representations
of bilingual units, which are the components of the
bilingual model driving the translation process, and in the statistical
modeling of the translation context.
Because Latvian is a rather free word order language, it poses
additional difficulties for the translation process. We contrast
different reordering models and investigate how well they
deal with the word ordering issue.
Moving beyond the automatic scores of translation quality
that are classically presented in MT research papers, we contribute
a manual error analysis of the MT systems' output
that helps to shed light on the advantages and disadvantages
of the SMT systems under consideration and to identify the most
prominent sources of errors typical of both SMT systems.
Postprint (published version)
An English-to-Turkish interlingual MT system
This paper describes the integration of a Turkish generation system with the KANT knowledge-based machine translation system to produce a prototype English-Turkish interlingua-based machine translation system. These two independently constructed systems were successfully integrated within a period of two months, through development of a module which maps KANT interlingua expressions to Turkish syntactic structures. The combined system is able to translate completely and correctly 44 of 52 benchmark sentences in the domain of broadcast news captions. This study is the first known application of knowledge-based machine translation from English to Turkish, and our initial results show promise for future development. © Springer-Verlag Berlin Heidelberg 1998
Design and Implementation of a Tactical Generator for Turkish, a Free Constituent Order Language
This thesis describes a tactical generator for Turkish, a free constituent
order language, in which the order of the constituents may change according to
the information structure of the sentences to be generated. In the absence of
any information regarding the information structure of a sentence (i.e., topic,
focus, background, etc.), the constituents of the sentence obey a default
order, but the order is almost freely changeable, depending on the constraints
of the text flow or discourse. We have used a recursively structured finite
state machine for handling the changes in constituent order, implemented as a
right-linear grammar backbone. Our implementation environment is the GenKit
system, developed at Carnegie Mellon University--Center for Machine
Translation. Morphological realization has been implemented using an external
morphological analysis/generation component which performs concrete morpheme
selection and handles morphographemic processes.
Comment: M.Sc. Thesis submitted to the Department of Computer Engineering and
Information Science, Bilkent University, Ankara, Turkey. 146 pages (including
title pages). Also available as:
ftp://ftp.cs.bilkent.edu.tr/pub/tech-reports/1996/BU-CEIS-9614.ps.