Search CORE

38 research outputs found

Angļu-latviešu statistiskās mašīntulkošanas sistēmas izveide: metodes, resursi un pirmie rezultāti

Author: Inguna Skadiņa
Lauma Pretkalniņa
Madars Virza
Publication venue: Vilnius University
Publication date: 01/01/2012

DEVELOPMENT OF ENGLISH-LATVIAN STATISTICAL MACHINE TRANSLATION SYSTEM: METHODS, RESOURCES AND FIRST RESULTS. Summary: This paper presents the research and development of English-Latvian Statistical Machine Translation (SMT) prototypes for the legal domain. Several methods have been investigated, namely phrase-based models and factored models. Translation quality has been evaluated using an automated metric (BLEU score) and human evaluation. In the automatic evaluation, the best score (46.44 BLEU points) was achieved by the factored model trained on the JRC Acquis corpus (version 3.0), which human evaluators also rated as the best. In addition, an error analysis of the SMT output was performed. It showed that, although the output of the best prototype was of reasonable quality, it contained several frequent errors, namely incorrect word forms, missing words and wrong word order. Future work on tree-based SMT and hybrid systems is proposed.

Directory of Open Access Journals

Machine Translation Using Automatically Inferred Construction-Based Correspondence and Language Models

Author: Edelman Shimon
Solan Zach
Publication venue: City University of Hong Kong
Publication date: 01/01/2009

PACLIC 23 / City University of Hong Kong / 3-5 December 2009

Waseda University Repository

A Machine Translation Approach for Chinese Whole-Sentence Pinyin-to-Character Conversion

Author: Lu Bao-Liang
Yang Shaohua
Zhao Hai
Publication venue: Faculty of Computer Science, Universitas Indonesia
Publication date: 01/01/2012

Waseda University Repository

A tree-based approach for English-to-Turkish translation

Author: Avar Begüm
Bakay Özge
Yıldız Olcay Taner
Publication venue: The Scientific and Technological Research Council of Turkey
Publication date: 01/01/2019

In this paper, we present our English-to-Turkish translation methodology, which adopts a tree-based approach. Our approach relies on tree analysis and the application of structural modification rules to derive the target-side (Turkish) trees from the source-side (English) ones. We also use morphological analysis to obtain candidate root words and apply tree-based rules to produce the agglutinated target words. Compared to earlier work on English-to-Turkish translation using phrase-based models, we have been able to obtain higher BLEU scores in our current study. Our syntactic subtree permutation strategy, combined with a word replacement algorithm, provides a 67% relative improvement from a baseline of 12.8 to 21.4 BLEU, all averaged over 10-fold cross-validation. As future work, improvements in choosing the correct senses and structural rules are needed. This work was supported by TUBITAK project 116E104. Publisher's Version
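The relative improvement quoted in the abstract above can be checked with a few lines of arithmetic. The sketch below uses only the two BLEU figures stated in the abstract; the variable names are illustrative and not taken from the paper.

```python
# BLEU figures quoted in the abstract; variable names are illustrative.
baseline_bleu = 12.8
improved_bleu = 21.4

# Absolute gain in BLEU points, and the same gain relative to the baseline.
absolute_gain = improved_bleu - baseline_bleu
relative_gain = absolute_gain / baseline_bleu

print(f"absolute gain: {absolute_gain:.1f} BLEU")
print(f"relative gain: {relative_gain:.0%}")
```

The relative figure divides the gain by the baseline score, which is why an 8.6-point gain over a low baseline comes out at roughly 67%.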

Isik University Academic Open Access

Stochastic Modelling: From Pattern Classification to Speech Recognition and Language Translation

Author: AJ Robinson
AP Dempster
B Efron
F Jelinek
H Bourlard
H Ney
L Breiman
LE Baum
LR Bahl
PF Brown
RO Duda
S Ortmanns
S Pietra Della
W Wahlster
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/1999

This paper gives an overview of the stochastic modelling approach to machine translation. Starting with the Bayes decision rule as in pattern classification and speech recognition, we show how the resulting system architecture can be structured into three parts: the language model probability, the string translation model probability and the search procedure that generates the word sequence in the target language. We discuss the properties of the system components and report results on the translation of spoken dialogues in the VERBMOBIL project. The experience obtained in the VERBMOBIL project, in particular a large-scale end-to-end evaluation, showed that the stochastic modelling approach resulted in significantly lower error rates than three competing translation approaches: the sentence error rate was 29% in comparison with 52% to 62% for the other translation approaches.
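The three-part architecture described in the abstract above follows from the Bayes decision rule. As a sketch, under the common convention (an assumption here, since the abstract gives no notation) that f denotes the source-language word sequence and e a candidate target-language word sequence, the rule can be written as:

```latex
\[
  \hat{e} \;=\; \arg\max_{e} \Pr(e \mid f)
          \;=\; \arg\max_{e} \bigl\{ \Pr(e) \cdot \Pr(f \mid e) \bigr\}
\]
```

In this form, Pr(e) plays the role of the language model probability, Pr(f | e) the string translation model probability, and the maximisation over e corresponds to the search procedure. The denominator Pr(f) is dropped because it does not depend on e.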

CiteSeerX

Crossref

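Η Αυτοματοποιημένη και μη-αυτοματοποιημένη αξιολόγηση συστήματος Στατιστικής Μηχανικής Μετάφρασης για το γλωσσικό ζεύγος Ελληνικά - Ιταλικά [Automated and non-automated evaluation of a Statistical Machine Translation system for the Greek-Italian language pair]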

Author: Χατζηθεοδώρου Κωνσταντίνος
Κανελλιάδου Πολυξένη
Publication venue: Université Aristote de Thessalonique, Département de Langue et de Littérature Francaises
Publication date: 09/09/2012

Machine Translation (MT) evaluation is a hard task, considering the difficulties that arise from the translation process itself. In this paper we present the results of the evaluation of a Statistical Machine Translation (SMT) system in which the Moses decoder was trained for the language pair Greek-Italian. The evaluation was both automatic and non-automatic (human). For the automatic evaluation, the metrics BLEU and NIST were used, while for the human evaluation, the adequacy and the fluency of the translated texts were assessed. A corpus of 120 individual sentences (e.g. EU texts, scientific and technical texts, subtitles, proverbs) was evaluated by postgraduate students in the Translation, Interpretation and Communication specialization of the Department of Italian Language and Literature of the Aristotle University of Thessaloniki. The first results show that SMT performs well when translating text of this type.

Aristotle University of Thessaloniki: Open Journals / ΑΡΙΣΤΟΤΕΛΕΙΟ ΠΑΝΕΠΙΣΤΗΜΙΟ ΘΕΣΣΑΛΟΝΙΚΗΣ

A corpus for interstellar communication

Author: Atwell ES
Elliott JR
Publication venue: UCREL, Lancaster University
Publication date: 01/01/2001

Introduction: SETI, the Search for Extra-Terrestrial Intelligence. Many researchers in Astronomy and Astronautics believe the Search for Extra-Terrestrial Intelligence is a serious academic enterprise, worthy of scholarly research and publication (e.g. Burke-Ward 2000, Couper and Henbest 1998, Day 1998, McDonough 1987, Sivier 2000, Norris 1999), with large-scale research sponsorship attracted by the SETI Institute in California. Most of this research community is focussed on techniques for detection of possible incoming signals from extraterrestrial intelligent sources (e.g. Turnbull et al 1999), and algorithms for analysis of these signals to identify intelligent language-like characteristics (e.g. Elliott and Atwell 1999, 2000). However, debate has recently turned to the nature of our response, should a signal arrive and be detected. For example, the 50th International Astronautical Congress devoted a full afternoon session to the question of whether and how we should respond

CiteSeerX

White Rose Research Online

University of St. Andrews - Pure

Genetic-based Decoder for Statistical Machine Translation

Author: Douib Ameur
Langlois David
Smaili Kamel
Publication venue: HAL CCSD
Publication date: 08/12/2016

We propose a new algorithm for decoding in the machine translation process. This approach is based on an evolutionary algorithm. We hope that this new method will constitute an alternative to the Moses decoder: Moses is based on a beam search algorithm, while the one we propose is based on the optimisation of a complete solution. The results achieved are very encouraging in terms of evaluation measures, and the proposed translations themselves are well formed.
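To illustrate what optimising complete solutions with an evolutionary algorithm can look like, here is a minimal toy sketch. It is not the authors' decoder: the bigram table, the fitness function, the operators and all names are hypothetical stand-ins for a real translation and language model, and the sketch assumes every word in the candidate is unique.

```python
import random

# Hypothetical toy bigram scores standing in for real model scores.
BIGRAM_SCORE = {
    ("<s>", "the"): 2.0,
    ("the", "cat"): 2.0,
    ("cat", "sleeps"): 2.0,
    ("sleeps", "</s>"): 2.0,
}

def fitness(words):
    """Score a complete candidate by summing toy bigram scores."""
    padded = ["<s>", *words, "</s>"]
    return sum(BIGRAM_SCORE.get(pair, 0.0) for pair in zip(padded, padded[1:]))

def mutate(words, rng):
    """Swap two positions to get a neighbouring complete candidate."""
    child = list(words)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child

def crossover(a, b, rng):
    """Keep a prefix of `a`, then append the remaining words in `b`'s order."""
    cut = rng.randrange(1, len(a))
    head = a[:cut]
    return head + [w for w in b if w not in head]

def evolve(words, generations=50, pop_size=20, seed=0):
    """Evolve a population of complete candidates; return the fittest one."""
    rng = random.Random(seed)
    population = [rng.sample(words, len(words)) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # elitist selection keeps the best half
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)

best = evolve(["sleeps", "the", "cat"])
print(best, fitness(best))
```

Unlike beam search, which extends partial hypotheses left to right, every individual in the population here is a full candidate that is scored as a whole and then recombined or perturbed.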

INRIA a CCSD electronic archive server

A new model for Persian multi-part words edition based on statistical machine translation

Author: A. Arjomandzadeh
M. Zahedi
Publication venue: International Digital Organization for Scientific Information (IDOSI)
Publication date: 01/01/2016

In English, multi-part words are hyphenated, with a hyphen separating the parts. Persian has multi-part words as well. According to Persian morphology, a half-space character is needed to separate the parts of a multi-part word, but in many cases people incorrectly use the space character instead. This common misuse of the space character leads to serious issues in Persian text processing and text readability. To address these issues, this work proposes a new model to correct spacing in multi-part words. The proposed method is based on the statistical machine translation paradigm, in which text in a source language is translated into text in a destination language on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora. The proposed method applies statistical machine translation techniques, treating unedited multi-part words as the source language and space-edited multi-part words as the destination language. The results show that the proposed method can edit and improve the spacing correction of Persian multi-part words with a statistically significant accuracy rate.

Directory of Open Access Journals

REVIEW OF THE EVOLUTION OF THE TECHNOLOGY OF AUTOMATIC MACHINE TRANSLATION

Author: Dunđer Ivan
Publication date: 01/01/2021

Automatic machine translation has become an irreplaceable part of a large number of organizations that operate in an international environment and need to generate large amounts of translations for their documentation. Today, machine translation is considered one of the indispensable disruptive technologies that greatly contribute to the complete transformation of business processes in the segment of translating texts written in natural language. The idea behind machine translation is to enable the automation of at least part of the translation process, especially when it comes to a large amount of data, in order to speed up the overall business of an organization and thus gain a competitive advantage in a rapidly changing market, to which one needs to adapt quickly. But the development of automatic machine translation technology has not gone smoothly. It has been accompanied by a series of ups and downs, and the aim of this research paper is to give a critical and systematic overview of all key stages of development of this technology, in the context of both global and domestic research in this area.

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia