61,089 research outputs found

    MaTrEx: the DCU machine translation system for IWSLT 2007

    Get PDF
    In this paper, we give a description of the machine translation system developed at DCU that was used for our second participation in the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT 2007). In this participation, we focus on some new methods to improve system quality. Specifically, we try our word packing technique for different language pairs, we smooth our translation tables with out-of-domain word translations for the Arabic–English and Chinese–English tasks in order to solve the high number of out of vocabulary items, and finally we deploy a translation-based model for case and punctuation restoration

    A retrospective view on the promise on machine translation for Bahasa Melayu-English

    Get PDF
    Research and development activities for machine translation systems from English language to others are more progressive than vice versa. It has been more than 30 years since the machine translation was introduced and yet a Malay language or Bahasa Melayu (BM) to English machine translation engine is not available. Consequently, many translation systems have been developed for the world's top 10 languages in terms of native speakers, but none for BM, although the language is used by more than 200 million speakers around the world. This paper attempts to seek possible reasons as why such situation occurs. A summative overview to show progress, challenges as well as future works on MT is presented. Issues faced by researchers and system developers in modeling and developing a machine translation engine are also discussed. The study of the previous translation systems (from other languages to English) reveals that the accuracy level can be achieved up to 85 %. The figure suggests that the translation system is not reliable if it is to be utilized in a serious translation activity. The most prominent difficulties are the complexity of grammar rules and ambiguity problems of the source language. Thus, we hypothesize that the inclusion of ‘semantic’ property in the translation rules may produce a better quality BM-English MT engine

    Dynamic second language support for Web-based information systems

    Get PDF
    Non-native speakers of English are faced with the 'second language problem' when required to interact with English-based information systems. This paper describes strategies for addressing the problem, which arises through difficulties in comprehension, in the context of Web-based information systems, by means of a dynamic annotation of English language Web pages

    Digital libraries and minority languages

    Get PDF
    Digital libraries have a pivotal role to play in the preservation and maintenance of international cultures in general and minority languages in particular. This paper outlines a software tool for building digital libraries that is well adapted for creating and distributing local information collections in minority languages, and describes some contexts in which it is used. The system can make multilingual documents available in structured collections and allows them to be accessed via multilingual interfaces. It is issued under a free open-source licence, which encourages participatory design of the software, and an end-user interface allows community-based localization of the various language interfaces - of which there are many

    Low-resource machine translation using MATREX: The DCU machine translation system for IWSLT 2009

    Get PDF
    In this paper, we give a description of the Machine Translation (MT) system developed at DCU that was used for our fourth participation in the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT 2009). Two techniques are deployed in our system in order to improve the translation quality in a low-resource scenario. The first technique is to use multiple segmentations in MT training and to utilise word lattices in decoding stage. The second technique is used to select the optimal training data that can be used to build MT systems. In this year’s participation, we use three different prototype SMT systems, and the output from each system are combined using standard system combination method. Our system is the top system for Chinese–English CHALLENGE task in terms of BLEU score

    Bias and Equivalence in Cross-Cultural Research

    Get PDF
    Bias and equivalence are key concepts in the methodology of cross-cultural studies. Bias is a generic term for any challenge of the comparability of cross-cultural data; bias leads to invalid conclusions. The demonstration of equivalence (lack of bias) is a prerequisite for any cross-cultural comparison. we first describe considerations that are relevant when choosing instruments in a cross-cultural study, notably the question of whether an existing or new instrument is to be preferred.We then describe the definition, manifestation, and sources of three types of bias (construct, method, and item bias), and three levels of equivalence (construct, measurement unit, and full score equivalence). We provide strategies to minimize bias and achieve equivalence that apply either to the design, implementation, or statistical analysis phase of a study. The need to integrate these strategies in cross-cultural studies is emphasized so as to increase the validity of conclusions regarding cross-cultural similarities and differences and rule out alternative explanations of cross-cultural differences

    Exploiting alignment techniques in MATREX: the DCU machine translation system for IWSLT 2008

    Get PDF
    In this paper, we give a description of the machine translation (MT) system developed at DCU that was used for our third participation in the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT 2008). In this participation, we focus on various techniques for word and phrase alignment to improve system quality. Specifically, we try out our word packing and syntax-enhanced word alignment techniques for the Chinese–English task and for the English–Chinese task for the first time. For all translation tasks except Arabic–English, we exploit linguistically motivated bilingual phrase pairs extracted from parallel treebanks. We smooth our translation tables with out-of-domain word translations for the Arabic–English and Chinese–English tasks in order to solve the problem of the high number of out of vocabulary items. We also carried out experiments combining both in-domain and out-of-domain data to improve system performance and, finally, we deploy a majority voting procedure combining a language model based method and a translation-based method for case and punctuation restoration. We participated in all the translation tasks and translated both the single-best ASR hypotheses and the correct recognition results. The translation results confirm that our new word and phrase alignment techniques are often helpful in improving translation quality, and the data combination method we proposed can significantly improve system performance

    Translation Memory Retrieval Methods

    Get PDF
    Translation Memory (TM) systems are one of the most widely used translation technologies. An important part of TM systems is the matching algorithm that determines what translations get retrieved from the bank of available translations to assist the human translator. Although detailed accounts of the matching algorithms used in commercial systems can't be found in the literature, it is widely believed that edit distance algorithms are used. This paper investigates and evaluates the use of several matching algorithms, including the edit distance algorithm that is believed to be at the heart of most modern commercial TM systems. This paper presents results showing how well various matching algorithms correlate with human judgments of helpfulness (collected via crowdsourcing with Amazon's Mechanical Turk). A new algorithm based on weighted n-gram precision that can be adjusted for translator length preferences consistently returns translations judged to be most helpful by translators for multiple domains and language pairs.Comment: 9 pages, 6 tables, 3 figures; appeared in Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, April 201
    corecore