7 research outputs found

    Amazigh Representation in the UNL Framework: Resource Implementation

    This paper discusses the first steps taken to create the linguistic resources needed to incorporate the Amazigh language into the Universal Networking Language (UNL) framework for machine translation. This universal interlanguage allows any source text to be translated into the other languages connected to UNL by converting the meaning of the source text into a semantic graph. This encoding serves as a pivot interlanguage in translation systems. In this work, we present the morphological, syntactic and lexical mapping stages needed to build an “Amazigh dictionary” according to the UNL framework and the “UNL-Amazigh Dictionary”, both of which take part in the enconversion and deconversion processes.

    Multilinguisation d'ontologies dans le cadre de la recherche d'information translingue dans des collections d'images accompagnées de textes spontanés

    The World Wide Web is a proliferating source of multimedia objects described in various natural languages. In order to use Semantic Web techniques for the retrieval of such objects (images, videos, etc.), we propose a content extraction method for multilingual text collections that takes one or several ontologies as parameters. The content extraction process is used both to index multimedia objects from their textual content and to build formal queries from spontaneous user requests. The process is based on an interlingual annotation of texts that keeps ambiguities (polysemy and segmentation) in graphs. This first step allows shared disambiguation processes to be applied at the level of a pivot language (interlingual lexemes). An ontology is passed to the system as a parameter by automatically aligning its elements with the interlingual lexemes of the pivot language. It is thus possible to use ontologies that were not built for use in a multilingual context, and to extend the set of languages and their lexical coverage without modifying the ontologies. A demonstrator for multilingual image retrieval, developed within the ANR OMNIA project, made it possible to implement the proposed approaches. The scalability of the system and the quality of the annotations produced could thus be evaluated.

    Outils et environnements pour l'amélioration incrémentale, la post-édition contributive et l'évaluation continue de systèmes de TA. Application à la TA français-chinois.

    The thesis, conducted as part of a CIFRE grant and extending one aspect of the ANR project Traouiero, first addresses the production, extension and improvement of multilingual corpora by machine translation (MT) and contributory post-editing (PE). Functional and technical improvements have been made to the SECTra and iMAG software produced in previous PhD theses (P.C. Huynh, H.T. Nguyen), and progress has been made toward a generic definition of the structure of a multilingual, multi-annotated, multimedia corpus that may contain ordinary documents as well as pseudo-documents (such as Web pages) and meta-segments. This part has been validated by the creation of good French-Chinese bilingual corpora, one of them resulting from the first application to literary translation (a Jules Verne novel). A second part, initially motivated by an industrial need, consisted in building Moses-type MT systems, specialized for sublanguages, for French↔Chinese, and in studying how to improve them in the context of continuous use with the possibility of PE. As part of an internal project on the LIG website and of a project (TABE-FC) in cooperation with Xiamen University, it has been possible to demonstrate the value of incremental learning in statistical MT, under certain conditions, through an experiment that ran throughout the thesis. The third part of the thesis is devoted to contributing and making available computer tools and resources. The main ones are related to the EU COST project MUMIA and result from the exploitation of the CLEF-2011 collection of 1.5 million partially multilingual patents. Large translation memories have been extracted from it (17.5 million segments), 3 MT systems have been produced (de-fr, en-fr, fr-de), and a website supporting multilingual IR on patents has been built. The thesis also describes the ongoing implementation of JianDan-eval, a platform for building, deploying and evaluating MT systems.

    Finnish Ice Hockey Organisations as Multilingual Work Environments

    This thesis studies the phenomenon of workplace multilingualism in the Finnish professional ice hockey community. The primary research material of the thesis consists of four interviews with members of the professional community: two staff members of team organisations, an international player, and a referee. The interviews focussed on gathering the respondents’ subjective experiences of multilingualism in their everyday work environment. They were asked to identify which languages were used in their work environment and what strategies and policies were in place to manage multilingualism. The study also explored how these individuals viewed possible difficulties and advantages which may arise from the multilingual nature of their work community. The interviews were analysed using a qualitative content analysis approach. The analysis revealed that the use of English as a lingua franca in parallel with Finnish was common in the organisations the interviewees represented. Self-translation and non-professional translations by members of the community were used to bridge gaps in participants’ language skills. The use of professional translators was not considered cost-effective or practical in the everyday ice hockey environment. While concrete multilingualism policies were not implemented, and management of multilingualism seemed to rely on implicit assumptions rather than explicit coordination, the interviewees were in general satisfied with the current state of language management in their community. Despite this, more efficient management of language issues in the future would be beneficial for ice hockey organisations not only from a practical viewpoint but possibly also in terms of facilitating better athletic achievements.