MT on and for the Web

Authors: Christian Boitet, Getalp, Hervé Blanchon, Mark Seligman, Valérie Bellynck
Publication date: 11/04/2020

Abstract: A Systran MT server became available on the minitel network in 1984, and on the Internet in 1994. Since then we have come to a better understanding of the nature of MT systems by separately analyzing their linguistic, computational, and operational architectures. Also, thanks to the CxAxQ metatheorem, the systems' inherent limits have been clarified, and design choices can now be made in an informed manner according to the translation situations. MT evaluation has also matured: tools based on reference translations are useful for measuring progress; those based on subjective judgments for estimating future usage quality; and task-related objective measures (such as post-editing distances) for measuring operational quality. Moreover, the same technological advances that have led to "Web 2.0" have brought several futuristic predictions to fruition. Free Web MT services have democratized assimilation MT beyond belief. Speech translation research has given rise to usable systems for restricted tasks running on PDAs or on mobile phones connected to servers. New man-machine interface techniques have made interactive disambiguation usable in large-coverage multimodal MT. Increases in computing power have made statistical methods workable, and have led to the possibility of building low-linguistic-quality but still useful MT systems by machine learning from aligned bilingual corpora (SMT, EBMT). In parallel, progress has been made in developing interlingua-based MT systems, using hybrid methods. Unfortunately, many misconceptions about MT have spread among the public, and even among MT researchers, because of ignorance of the past and present of MT R&D. A compensating factor is the willingness of end users to freely contribute to building essential parts of the linguistic knowledge needed to construct MT systems, whether corpus-related or lexical.
Finally, some developments we anticipated fifteen years ago have not yet materialized, such as online writing tools equipped with interactive disambiguation and, as a corollary, the possibility of transforming source documents into self-explaining documents (SEDs) and of producing corresponding SEDs fully automatically in several target languages. These visions should now be realized, thanks to the evolution of Web programming and multilingual NLP techniques, leading towards a true Semantic Web, "Web 3.0", which will support ubilingual (ubiquitous multilingual) computing.

CiteSeerX

Internal and in-context localization of commercial and open source software (Localisation interne et en contexte des logiciels commerciaux et libres)

Author: Fraisse Amel
Publication venue: HAL CCSD
Publication date: 10/06/2010

We propose a novel approach that allows in-context localization of most commercial and open source software. Currently, the translation of a software product's textual resources (technical documents, online help, user interface strings, etc.) is entrusted only to professional translators. This makes the localization process long, expensive, and sometimes of poor quality, because professional translators have no knowledge of the context in which the software is used. This workflow seems impossible to apply to most under-resourced languages, for reasons of cost and, quite often, the scarcity or even absence of professional translators. Our proposal aims to involve end users in the localization process in an efficient and dynamic way: while using an application (in context), users who know the source language of the software (often but not always English) can modify the user interface strings presented by the application in their current context. Users can thus translate buttons, menus, labels, tab pages, etc. in context, or improve translations proposed by machine translation (MT) or translation memory (TM) systems. To implement this new paradigm, we modify the code as little as possible, very locally, and in the same way for all software. Hence our localization method is internal. Implementing this approach required integrating a translation workflow built with SECTra_w. The result is a new tripartite localization process whose parties are the user, the software publisher, and the collaborative SECTra_w Web site. We have tested our approach on Notepad-plus-plus and Vuze, two open source applications.

We propose a novel method for in-context localization of the majority of commercial and free software, namely software programmed in Java and in C++/C#.
Currently, the translation of technical documents and of the interface elements of commercial software is entrusted solely to professionals, which lengthens the translation process, makes it costly, and sometimes results in poor quality, because professional translators do not have access to the context in which the textual elements are used. As soon as one moves beyond the small set of best-resourced languages and wants to localize software for "under-resourced languages", this process is no longer viable, for reasons of cost and above all because professional translators are scarce, expensive, or nonexistent. Our method has beta testers and end users participate efficiently and dynamically in the localization process: while they use the application, users who know the software's original language (often but not always English) can act on the textual interface elements that the application presents to them in their current context of use. They can thus translate buttons, menus, labels, tabs, etc. in context, or improve the translation proposed by machine translation (MT) systems or translation memories (TM). To put this new paradigm in place, we need to intervene very locally in the software's source code: it is therefore also a paradigm of internal localization. Putting such a localization approach in place required integrating a translation workflow manager, "SECTra_w". We thus have a new tripartite localization process whose three parties are the user, the software publisher, and the collaborative SECTra_w site. We carried out a complete experiment with the new localization process on two free, open source applications: Notepad-plus-plus and Vuze.

Hal - Université Grenoble Alpes