
    English/Arabic/English Machine Translation: A Historical Perspective

    This paper examines the history and development of Machine Translation (MT) applications for the Arabic language in the context of the history of machine translation in general. It starts with a discussion of the beginnings of MT in the US and then, drawing on the work of MT historians, surveys the decline of work on MT and the drying up of funding; then the revival that came with globalization, the development of information technology and the rising need to break down language barriers in the world; and finally the dramatic developments that came with advances in computer technology. The paper also examines some of the major approaches to MT within a historical perspective. The case of Arabic is treated along the same lines, focusing on the work done on Arabic by Western research institutes and Western profit-motivated companies. Special attention is given to the work of the one Arab company, Sakr of Al-Alamiyya Group, which was established in 1982 and has since worked seriously on developing software applications for Arabic under the umbrella of natural language processing for the Arabic language. Major available software applications for Arabic/English/Arabic MT, as well as MT-related software, are surveyed within a historical framework.

    A review of EBMT using proportional analogies

    Some years ago, a number of papers reported an experimental implementation of Example-Based Machine Translation (EBMT) using Proportional Analogy. This approach, a type of analogical learning, was attractive because of its simplicity, and the papers reported considerable success with the method. This paper reviews what we believe to be the totality of research reported using this method, as an introduction to our own experiments in this framework, reported in a companion paper. We first report some lack of clarity in the previously published work, and then report our finding that the purity of the proportional analogy approach imposes huge run-time complexity for the EBMT task, even when heuristics hinted at in the original literature are applied to reduce the amount of computation.

    Hybrid Arabic–French machine translation using syntactic re-ordering and morphological pre-processing

    This is an accepted manuscript of an article published by Elsevier BV in Computer Speech & Language on 08/11/2014, available online: https://doi.org/10.1016/j.csl.2014.10.007 The accepted version of the publication may differ from the final published version.
    Arabic is a highly inflected and morpho-syntactically complex language that differs in many ways from several heavily studied languages. It may thus require good pre-processing, as it presents significant challenges for Natural Language Processing (NLP), specifically for Machine Translation (MT). This paper aims to examine how Statistical Machine Translation (SMT) can be improved using rule-based pre-processing and language analysis. We describe a hybrid translation approach coupling an Arabic–French statistical machine translation system using the Moses decoder with additional morphological rules that reduce the morphology of the source language (Arabic) to a level that makes it closer to that of the target language (French). Moreover, we introduce additional swapping rules for a structural matching between the source language and the target language. Two structural changes involving the positions of the pronouns and verbs in both the source and target languages have been attempted. The results show an improvement in the quality of translation and a gain in terms of BLEU score after introducing a pre-processing scheme for Arabic and applying these rules based on morphological variations and verb re-ordering (VS into SV constructions) in the source language (Arabic) according to their positions in the target language (French). Furthermore, a learning curve shows the improvement in terms of BLEU score under scarce- and large-resource conditions. The proposed approach is completed without increasing the amount of training data or radically changing the algorithms that can affect the translation or training engines.
    This paper is based upon work supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant number 356097-08. Published version.

    TectoMT – a deep-linguistic core of the combined Chimera MT system

    Chimera is a machine translation system that combines the TectoMT deep-linguistic core with the phrase-based MT system Moses. For the English–Czech pair it also uses the Depfix post-correction system. All the components run on Unix/Linux platforms and are open source (available from the Perl repository CPAN and the LINDAT/CLARIN repository). The main website is https://ufal.mff.cuni.cz/tectomt. The development is currently supported by the QTLeap 7th FP project (http://qtleap.eu).

    Searching to Translate and Translating to Search: When Information Retrieval Meets Machine Translation

    With the adoption of web services in daily life, people have access to tremendous amounts of information, beyond any human's reading and comprehension capabilities. As a result, search technologies have become a fundamental tool for accessing information. Furthermore, the web contains information in multiple languages, introducing another barrier between people and information. Therefore, search technologies need to handle content written in multiple languages, which requires techniques to account for the linguistic differences. Information Retrieval (IR) is the study of search techniques, in which the task is to find material relevant to a given information need. Cross-Language Information Retrieval (CLIR) is a special case of IR when the search takes place in a multi-lingual collection. Of course, it is not helpful to retrieve content in languages the user cannot understand. Machine Translation (MT) studies the translation of text from one language into another efficiently (within a reasonable amount of time) and effectively (fluent and retaining the original meaning), which helps people understand what is being written, regardless of the source language. Putting these together, we observe that search and translation technologies are part of an important user application, calling for a better integration of search (IR) and translation (MT), since these two technologies need to work together to produce high-quality output. In this dissertation, the main goal is to build better connections between IR and MT, for which we present solutions to two problems: Searching to translate explores approximate search techniques for extracting bilingual data from multilingual Wikipedia collections to train better translation models. Translating to search explores the integration of a modern statistical MT system into the cross-language search processes. In both cases, our best-performing approach yielded improvements over strong baselines for a variety of language pairs. 
Finally, we propose a general architecture in which various components of IR and MT systems can be connected together into a feedback loop, with potential improvements to both search and translation tasks. We hope that the ideas presented in this dissertation will spur more interest in the integration of search and translation technologies.

    Digital libraries and minority languages

    Digital libraries have a pivotal role to play in the preservation and maintenance of international cultures in general and minority languages in particular. This paper outlines a software tool for building digital libraries that is well adapted for creating and distributing local information collections in minority languages, and describes some contexts in which it is used. The system can make multilingual documents available in structured collections and allows them to be accessed via multilingual interfaces. It is issued under a free open-source licence, which encourages participatory design of the software, and an end-user interface allows community-based localization of the various language interfaces, of which there are many.

    “Leave no one behind”: linguistic and digital barriers to the dissemination and implementation of the United Nations’ Sustainable Development Goals

    In September 2015 the United Nations (UN) adopted 17 Sustainable Development Goals (SDGs), offering an internationally agreed blueprint for economic, environmental and social development. However, those most in need and specifically targeted by the SDGs face significant barriers in accessing information and knowledge about the goals and sustainability in a language or medium that they can understand. Drawing on previous research on the UN’s language policy and practice (McEntee-Atalianis, 2006, 2015, 2016) and analyses of recent UN reports and resolutions on multilingualism, information policy and practice and the SDGs, this paper examines the current status of multilingualism and information transfer within the Organisation. Significant linguistic and digital barriers are identified. It is argued that the UN must plan in more linguistically plural and inclusive ways by developing a tri-sectoral communication network strategy involving civil society, public and private sectors in order to facilitate knowledge transfer and participation, thereby ensuring that ‘no one is left behind’.

    Embedding Web-based Statistical Translation Models in Cross-Language Information Retrieval

    Although more and more language pairs are covered by machine translation services, there are still many pairs that lack translation resources. Cross-language information retrieval (CLIR) is an application that needs translation functionality of a relatively low level of sophistication, since current models for information retrieval (IR) are still based on a bag-of-words. The Web provides a vast resource for the automatic construction of parallel corpora, which can be used to train statistical translation models automatically. The resulting translation models can be embedded in several ways in a retrieval model. In this paper, we investigate the problem of automatically mining parallel texts from the Web and different ways of integrating the translation models within the retrieval process. Our experiments on standard test collections for CLIR show that the Web-based translation models can surpass commercial MT systems in CLIR tasks. These results open the perspective of constructing a fully automatic query translation device for CLIR at a very low cost. Comment: 37 pages.