638 research outputs found
A prototype machine translation system between Turkmen and Turkish
In this work, we present a prototype system for translation of Turkmen texts into Turkish. Although machine translation (MT) is a very hard task, it is easier to implement a MT system between very close language pairs which have similar syntactic structure and word order. We implement a direct translation system between Turkmen and Turkish which performs a word-to-word transfer. We also use a Turkish Language Model to find the most probable Turkish sentence among all possible candidate translations generated by our system
A MT System from Turkmen to Turkish employing finite state and statistical methods
In this work, we present a MT system from Turkmen to Turkish. Our system exploits the similarity of the languages by using a modified version of direct translation method. However, the complex inflectional and derivational morphology of the Turkic languages necessitate special treatment for word-by-word translation model. We also employ morphology-aware multi-word processing and statistical disambiguation processes in our system. We believe that this approach is valid for most of the Turkic languages and the architecture implemented using FSTs can be easily extended to those languages
Evaluation of Hindi to Punjabi Machine Translation System
Machine Translation in India is relatively young. The earliest efforts date from the late 80s and early 90s. The success of every system is judged from its evaluation experimental results. Number of machine translation systems has been started for development but to the best of author knowledge, no high quality system has been completed which can be used in real applications. Recently, Punjabi University, Patiala, India has developed Punjabi to Hindi Machine translation system with high accuracy of about 92%. Both the systems i.e. system under question and developed system are between same closely related languages. Thus, this paper presents the evaluation results of Hindi to Punjabi machine translation system. It makes sense to use same evaluation criteria as that of Punjabi to Hindi Punjabi Machine Translation System. After evaluation, the accuracy of the system is found to be about 95%
A retrospective view on the promise on machine translation for Bahasa Melayu-English
Research and development activities for machine translation systems from English language to others are more progressive than vice versa. It has been more than 30 years since the machine translation was introduced and yet a Malay language or Bahasa Melayu (BM) to English machine translation engine is not available. Consequently, many translation systems have been developed for the world's top 10 languages in terms of native speakers, but none for BM, although the language is used by more than 200 million speakers around the world. This paper attempts to seek possible reasons as why such situation occurs. A summative overview to show progress, challenges as well as future works on MT is presented. Issues faced by researchers and system developers in modeling and developing a machine translation engine are also discussed. The study of the previous translation systems (from other languages to English) reveals that the accuracy level can be achieved up to 85 %. The figure suggests that the translation system is not reliable if it is to be utilized in a serious translation activity. The most prominent difficulties are the complexity of grammar rules and ambiguity problems of the source language. Thus, we hypothesize that the inclusion of ‘semantic’ property in the translation rules may produce a better quality BM-English MT engine
One-Shot Neural Cross-Lingual Transfer for Paradigm Completion
We present a novel cross-lingual transfer method for paradigm completion, the
task of mapping a lemma to its inflected forms, using a neural encoder-decoder
model, the state of the art for the monolingual task. We use labeled data from
a high-resource language to increase performance on a low-resource language. In
experiments on 21 language pairs from four different language families, we
obtain up to 58% higher accuracy than without transfer and show that even
zero-shot and one-shot learning are possible. We further find that the degree
of language relatedness strongly influences the ability to transfer
morphological knowledge.Comment: Accepted at ACL 201
Cross-lingual Argumentation Mining: Machine Translation (and a bit of Projection) is All You Need!
Argumentation mining (AM) requires the identification of complex discourse
structures and has lately been applied with success monolingually. In this
work, we show that the existing resources are, however, not adequate for
assessing cross-lingual AM, due to their heterogeneity or lack of complexity.
We therefore create suitable parallel corpora by (human and machine)
translating a popular AM dataset consisting of persuasive student essays into
German, French, Spanish, and Chinese. We then compare (i) annotation projection
and (ii) bilingual word embeddings based direct transfer strategies for
cross-lingual AM, finding that the former performs considerably better and
almost eliminates the loss from cross-lingual transfer. Moreover, we find that
annotation projection works equally well when using either costly human or
cheap machine translations. Our code and data are available at
\url{http://github.com/UKPLab/coling2018-xling_argument_mining}.Comment: Accepted at Coling 201
Learning Bilingual Word Representations by Marginalizing Alignments
We present a probabilistic model that simultaneously learns alignments and
distributed representations for bilingual data. By marginalizing over word
alignments the model captures a larger semantic context than prior work relying
on hard alignments. The advantage of this approach is demonstrated in a
cross-lingual classification task, where we outperform the prior published
state of the art.Comment: Proceedings of ACL 2014 (Short Papers
- …