38 research outputs found

    Angļu-latviešu statistiskās mašīntulkošanas sistēmas izveide: metodes, resursi un pirmie rezultāti

    Development of English-Latvian Statistical Machine Translation System: Methods, Resources and First Results. Summary: This paper presents research and development of English-Latvian Statistical Machine Translation (SMT) prototypes for the legal domain. Several methods have been investigated, namely phrase-based models and factored models. Translation quality has been evaluated using an automated metric (BLEU score) and human evaluation. In the automatic evaluation, the best score (46.44 BLEU points) was achieved by the factored model trained on the JRC Acquis corpus (version 3.0), which human evaluators also rated best. In addition, an error analysis of the SMT output was performed. This analysis showed that although the output of the best prototype was of reasonable quality, it contained several frequent errors, namely incorrect word forms, missing words and wrong word order. For the future, work on tree-based SMT and hybrid systems is proposed.
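The BLEU score reported above combines clipped n-gram precisions with a brevity penalty. The following is a minimal sketch of that computation, assuming whitespace tokenization, one reference per sentence and no smoothing; the function names `ngrams` and `bleu` are illustrative, and the abstract does not say which BLEU implementation the authors actually used.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidates, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per candidate.

    Simplified sketch: whitespace tokenization, no smoothing.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    cand_len = ref_len = 0
    for cand, ref in zip(candidates, references):
        c, r = cand.split(), ref.split()
        cand_len += len(c)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            c_counts, r_counts = ngrams(c, n), ngrams(r, n)
            # Clip each candidate n-gram count by its count in the reference.
            matches[n - 1] += sum(min(cnt, r_counts[g]) for g, cnt in c_counts.items())
            totals[n - 1] += sum(c_counts.values())
    # Without smoothing, any n-gram order with zero matches gives a score of 0.
    if cand_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # The brevity penalty punishes output shorter than the reference.
    brevity_penalty = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


score = bleu(["the cat sat on a mat"], ["the cat sat on the mat"])
print(f"BLEU = {score:.2f}")
```

Published scores such as 46.44 also depend on tokenization and on the number of references, so figures from this sketch are not directly comparable with them.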

    Machine Translation Using Automatically Inferred Construction-Based Correspondence and Language Models

    PACLIC 23 / City University of Hong Kong / 3-5 December 200

    A Machine Translation Approach for Chinese Whole-Sentence Pinyin-to-Character Conversion


    A tree-based approach for English-to-Turkish translation

    In this paper, we present our English-to-Turkish translation methodology, which adopts a tree-based approach. Our approach relies on tree analysis and the application of structural modification rules to get the target-side (Turkish) trees from the source-side (English) ones. We also use morphological analysis to get candidate root words and apply tree-based rules to obtain the agglutinated target words. Compared to earlier work on English-to-Turkish translation using phrase-based models, we have been able to obtain higher BLEU scores in our current study. Our syntactic subtree permutation strategy, combined with a word replacement algorithm, provides a 67% relative improvement from a baseline of 12.8 to 21.4 BLEU, all averaged over 10-fold cross-validation. As future work, improvements in choosing the correct senses and structural rules are needed. This work was supported by TUBITAK project 116E104.
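The 67% figure quoted above is a relative gain, not an absolute difference in BLEU points. A small sketch of the arithmetic follows; the helper name `relative_improvement` is hypothetical and only illustrates the calculation.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the gain of `improved` over `baseline` as a percentage of `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100.0


# Figures taken from the abstract: baseline 12.8 BLEU, improved 21.4 BLEU.
gain = relative_improvement(12.8, 21.4)
print(f"absolute gain: {21.4 - 12.8:.1f} BLEU points")
print(f"relative gain: {gain:.1f}%")  # prints "relative gain: 67.2%"
```

The absolute gain is 8.6 BLEU points, which is about 67% of the 12.8 baseline.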

    Stochastic Modelling: From Pattern Classification to Speech Recognition and Language Translation

    This paper gives an overview of the stochastic modelling approach to machine translation. Starting with the Bayes decision rule as in pattern classification and speech recognition, we show how the resulting system architecture can be structured into three parts: the language model probability, the string translation model probability and the search procedure that generates the word sequence in the target language. We discuss the properties of the system components and report results on the translation of spoken dialogues in the VERBMOBIL project. The experience obtained in the VERBMOBIL project, in particular a large-scale end-to-end evaluation, showed that the stochastic modelling approach resulted in significantly lower error rates than three competing translation approaches: the sentence error rate was 29% in comparison with 52% to 62% for the other translation approaches.
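The three-part architecture described above corresponds to the usual source-channel form of the Bayes decision rule. The notation below is an assumption, since the abstract gives no symbols: $f_1^J$ is the source sentence of $J$ words and $e_1^I$ a candidate target sentence of $I$ words.

```latex
\hat{e}_1^I
  = \operatorname*{arg\,max}_{e_1^I} \; \Pr(e_1^I \mid f_1^J)
  = \operatorname*{arg\,max}_{e_1^I} \;
    \underbrace{\Pr(e_1^I)}_{\text{language model}}
    \cdot
    \underbrace{\Pr(f_1^J \mid e_1^I)}_{\text{translation model}}
```

The second equality follows from Bayes' theorem, because the denominator $\Pr(f_1^J)$ does not depend on $e_1^I$. The $\arg\max$ itself is the search procedure, the third component named in the abstract.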

    Η Αυτοματοποιημένη και μη-αυτοματοποιημένη αξιολόγηση συστήματος Στατιστικής Μηχανικής Μετάφρασης για το γλωσσικό ζεύγος Ελληνικά - Ιταλικά

    Machine Translation (MT) evaluation is a hard task, given the difficulties that arise from the translation process itself. In this paper we present the results of the evaluation of a Statistical Machine Translation (SMT) system in which the Moses decoder was trained for the language pair Greek-Italian. The evaluation was both automatic and non-automatic (human). For the automatic evaluation, the metrics BLEU and NIST were used, while for the human evaluation, the adequacy and the fluency of the translated texts were assessed. A corpus of 120 individual sentences (e.g. EU texts, scientific technical texts, subtitles, proverbs etc.) was evaluated by postgraduate students of the direction of Translation, Interpretation and Communication of the Department of Italian Language and Literature of the Aristotle University of Thessaloniki. The first results show that SMT performs well when translating text of this typ

    A corpus for interstellar communication

    Introduction: SETI, the Search for Extra-Terrestrial Intelligence. Many researchers in Astronomy and Astronautics believe the Search for Extra-Terrestrial Intelligence is a serious academic enterprise, worthy of scholarly research and publication (e.g. Burke-Ward 2000, Couper and Henbest 1998, Day 1998, McDonough 1987, Sivier 2000, Norris 1999), and the SETI Institute in California has attracted large-scale research sponsorship. Most of this research community is focussed on techniques for detection of possible incoming signals from extraterrestrial intelligent sources (e.g. Turnbull et al 1999), and algorithms for analysis of these signals to identify intelligent language-like characteristics (e.g. Elliott and Atwell 1999, 2000). However, debate has recently turned to the nature of our response, should a signal arrive and be detected. For example, the 50th International Astronautical Congress devoted a full afternoon session to the question of whether and how we should respon

    Genetic-based Decoder for Statistical Machine Translation

    We propose a new algorithm for decoding in the machine translation process. This approach is based on an evolutionary algorithm. We hope that this new method will constitute an alternative to the Moses decoder, which is based on a beam search algorithm, whereas the one we propose is based on the optimisation of a complete solution. The results achieved are very encouraging in terms of evaluation measures, and the proposed translations themselves are well formed

    A new model for Persian multi-part words edition based on statistical machine translation

    In English, multi-part words are hyphenated: a hyphen separates the different parts. Persian has multi-part words as well. According to Persian morphology, a half-space character is needed to separate the parts of a multi-part word, but in many cases people incorrectly use a space character instead. This common incorrect use of the space character leads to serious issues in Persian text processing and text readability. To address these issues, this work proposes a new model to correct spacing in multi-part words. The proposed method is based on the statistical machine translation paradigm, in which text in a source language is translated into text in a destination language on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora. The proposed method applies statistical machine translation techniques, treating unedited multi-part words as the source language and space-edited multi-part words as the destination language. The results show that the proposed method can edit and improve the spacing of Persian multi-part words with a statistically significant accuracy rate


    Automatic machine translation has become an irreplaceable part of a large number of organizations that operate in an international environment and need to generate large amounts of translations for their documentation. Today, machine translation is considered one of the indispensable disruptive technologies that greatly contribute to the complete transformation of business processes in the segment of translating texts written in natural language. The idea behind machine translation is to enable the automation of at least part of the translation process, especially when it comes to a large amount of data, in order to speed up the overall business of an organization and thus gain a competitive advantage in a rapidly changing market, to which one needs to adapt quickly. But the development of automatic machine translation technology has not gone smoothly. It has been accompanied by a series of ups and downs, and the aim of this research paper is to give a critical and systematic overview of all key stages of development of this technology, in the context of global as well as domestic research in this area