
    Recycling Lingware in a Multilingual MT System

    We describe two methods relevant to multi-lingual machine translation systems, which can be used to port linguistic data (grammars, lexicons and transfer rules) between systems used for processing related languages. The methods are fully implemented within the Spoken Language Translator system, and were used to create versions of the system for two new language pairs using only a month of expert effort. Comment: 6 pages, needs aclap.sty. To appear in "From Research to Commercial Applications" workshop at ACL-97, see also http://www.cam.sri.co

    Lost in translation: the problems of using mainstream MT evaluation metrics for sign language translation

    In this paper we consider the problems of applying corpus-based techniques to minority languages that are neither politically recognised nor have a formally accepted writing system, namely sign languages. We discuss the adoption of an annotated form of sign language data as a suitable corpus for the development of a data-driven machine translation (MT) system, and deal with issues that arise from its use. Useful software tools that facilitate easy annotation of video data are also discussed. Furthermore, we address the problems of using traditional MT evaluation metrics for sign language translation. Based on the candidate translations produced from our example-based machine translation system, we discuss why standard metrics fall short of providing an accurate evaluation and suggest more suitable evaluation methods

    On context span needed for machine translation evaluation

    Despite increasing efforts to improve evaluation of machine translation (MT) by going beyond the sentence level to the document level, the definition of what exactly constitutes a "document level" is still not clear. This work deals with the context span necessary for a more reliable MT evaluation. We report results from a series of surveys involving three domains and 18 target languages designed to identify the necessary context span as well as issues related to it. Our findings indicate that, despite the fact that some issues and spans are strongly dependent on domain and on the target language, a number of common patterns can be observed so that general guidelines for context-aware MT evaluation can be drawn.

    A retrospective view on the promise on machine translation for Bahasa Melayu-English

    Research and development on machine translation systems from English into other languages has progressed further than work in the opposite direction. Machine translation was introduced more than 30 years ago, yet no Malay, or Bahasa Melayu (BM), to English machine translation engine is available. Many translation systems have been developed for the world's top 10 languages by number of native speakers, but none for BM, although the language is used by more than 200 million speakers around the world. This paper seeks possible reasons for this situation. A summative overview of progress, challenges and future work on MT is presented. Issues faced by researchers and system developers in modeling and developing a machine translation engine are also discussed. A study of previous translation systems (from other languages to English) reveals that accuracy of up to 85% can be achieved. This figure suggests that such systems are not reliable enough for serious translation work. The most prominent difficulties are the complexity of grammar rules and ambiguity in the source language. We therefore hypothesize that including a ‘semantic’ property in the translation rules may produce a better-quality BM-English MT engine.

    A prototype machine translation system between Turkmen and Turkish

    In this work, we present a prototype system for translation of Turkmen texts into Turkish. Although machine translation (MT) is a very hard task, it is easier to implement an MT system between very close language pairs which have similar syntactic structure and word order. We implement a direct translation system between Turkmen and Turkish which performs a word-to-word transfer. We also use a Turkish Language Model to find the most probable Turkish sentence among all possible candidate translations generated by our system.
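
    The pipeline this abstract describes (word-to-word transfer, then a target-side language model choosing among the candidates) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the lexicon, the training sentences and the names `LEXICON`, `TARGET_CORPUS`, `train_bigram_lm`, `log_prob` and `translate` are all hypothetical, the tokens are placeholders rather than real Turkmen or Turkish words, and the add-one-smoothed bigram model is an assumption, since the abstract does not say which kind of language model is used.

```python
import itertools
import math
from collections import Counter

# Hypothetical toy lexicon (placeholder tokens, not real Turkmen or Turkish).
LEXICON = {
    "src1": ["tgt_a"],
    "src2": ["tgt_b", "tgt_c"],  # ambiguous word: two candidate transfers
    "src3": ["tgt_d"],
}

# Hypothetical target-side sentences used to train the language model.
TARGET_CORPUS = [
    ["tgt_a", "tgt_c", "tgt_d"],
    ["tgt_a", "tgt_c"],
    ["tgt_b", "tgt_d"],
]


def train_bigram_lm(corpus):
    """Count bigrams and their left contexts over boundary-padded sentences."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]
        vocab.update(tokens)
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, len(vocab)


def log_prob(sentence, lm):
    """Log-probability of a sentence under an add-one-smoothed bigram model."""
    contexts, bigrams, vocab_size = lm
    tokens = ["<s>"] + sentence + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )


def translate(source, lexicon, lm):
    """Word-to-word transfer, then keep the candidate the model scores highest."""
    # Words missing from the lexicon are passed through unchanged.
    options = [lexicon.get(word, [word]) for word in source]
    candidates = [list(c) for c in itertools.product(*options)]
    return max(candidates, key=lambda c: log_prob(c, lm))


lm = train_bigram_lm(TARGET_CORPUS)
best = translate(["src1", "src2", "src3"], LEXICON, lm)
print(best)
```

    Enumerating every combination grows exponentially with the number of ambiguous words, so a sketch like this only suits short inputs; a real system would more likely search the candidate lattice with dynamic programming.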

    In no uncertain terms : a dataset for monolingual and multilingual automatic term extraction from comparable corpora

    Automatic term extraction is a productive field of research within natural language processing, but it still faces significant obstacles regarding datasets and evaluation, which require manual term annotation. This is an arduous task, made even more difficult by the lack of a clear distinction between terms and general language, which results in low inter-annotator agreement. There is a large need for well-documented, manually validated datasets, especially in the rising field of multilingual term extraction from comparable corpora, which presents a unique new set of challenges. In this paper, a new approach is presented for both monolingual and multilingual term annotation in comparable corpora. The detailed guidelines with different term labels, the domain- and language-independent methodology and the large volumes annotated in three different languages and four different domains make this a rich resource. The resulting datasets are not just suited for evaluation purposes but can also serve as a general source of information about terms and even as training data for supervised methods. Moreover, the gold standard for multilingual term extraction from comparable corpora contains information about term variants and translation equivalents, which allows an in-depth, nuanced evaluation

    Motivation and Electronic Media in Teaching Specialised Translation (MOTYWACJA I MEDIA ELEKTRONICZNE W NAUCZANIU TŁUMACZEŃ SPECJALISTYCZNYCH)

    The expansion of IT media into every field of human activity is one of the essential characteristics of modern times. This paper presents the role of electronic media in teaching legal translation at the University of Osijek, Croatia, and analyses their impact on the motivation of the target group of students in the teaching process. The paper endeavours to provide some insight into modern teaching practice and to analyse the interconnectedness of the use of electronic media and student motivation rather than to present empirical research in the field. In the first part of the paper, a theoretical approach to teaching legal translation today is offered. In the main part, teaching legal translation with modern media is presented using the examples of the Lifelong Learning Programme for Lawyer-Linguists at the Faculty of Law Osijek, and the course on legal translation within the German Language and Literature Studies at the Faculty of Humanities and Social Sciences of Osijek. The use of electronic media in translation teaching is discussed with reference to the courses Introduction to the Theory of Legal Translation and Online Translation Tools and EU Vocabulary. Specific types of online materials, translation tools and sources are discussed from the point of view of student motivation. New media are also discussed from the perspective of their efficiency at different stages of translation teaching. In the concluding part, the application of modern technologies in teaching legal translation is compared with other teaching methods, approaches and techniques. Finally, the author questions the use of IT as a motivation tool in the higher education teaching discourse and argues for a “moderate approach” in the teaching of legal translation.

    A MT System from Turkmen to Turkish employing finite state and statistical methods

    In this work, we present an MT system from Turkmen to Turkish. Our system exploits the similarity of the languages by using a modified version of the direct translation method. However, the complex inflectional and derivational morphology of the Turkic languages necessitates special treatment for a word-by-word translation model. We also employ morphology-aware multi-word processing and statistical disambiguation processes in our system. We believe that this approach is valid for most of the Turkic languages and that the architecture, implemented using FSTs, can be easily extended to those languages.