
    TMX markup: a challenge when adapting SMT to the localisation environment

    Translation memory (TM) plays an important role in localisation workflows and is used as an efficient and fundamental tool to carry out translation. In recent years, statistical machine translation (SMT) techniques have developed rapidly, and translation quality and speed have improved significantly as well. However, when applying SMT techniques to facilitate post-editing in the localisation industry, we need to adapt SMT to TM data that is formatted with special markup. In this paper, we explore some issues that arise when adapting SMT to Symantec-formatted TM data. Three different methods are proposed to handle the Translation Memory eXchange (TMX) markup, and a comparative study is carried out between them. Furthermore, we compare the TMX-based SMT systems with a customised SYSTRAN system through human evaluation and automatic evaluation metrics. Experimental results on the French and English language pair show that SMT can perform well using TMX as the input format, either during training or at runtime.
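The abstract above does not spell out its three markup-handling methods, so the following is only an illustrative sketch of one common idea: masking TMX inline elements (such as `bpt` and `ept`) with placeholder tokens before translation and restoring them afterwards. The function names `mask_inline_tags` and `unmask`, and the `__TAGn__` token format, are assumptions made for this example and are not taken from the paper.

```python
import xml.etree.ElementTree as ET

def mask_inline_tags(seg_xml):
    """Replace each inline child element of a TMX <seg> with a placeholder.

    Hypothetical helper: returns the masked plain text plus a mapping
    from placeholder token to the original inline markup.
    """
    seg = ET.fromstring(seg_xml)
    parts = [seg.text or ""]
    mapping = {}
    for i, child in enumerate(seg):
        token = f"__TAG{i}__"
        tail = child.tail or ""
        child.tail = None  # keep the tail text out of the serialised tag
        mapping[token] = ET.tostring(child, encoding="unicode")
        parts.append(f" {token} ")
        parts.append(tail)
    # Collapse the extra whitespace introduced around placeholders.
    return " ".join("".join(parts).split()), mapping

def unmask(text, mapping):
    """Put the original inline markup back in place of the placeholders."""
    for token, markup in mapping.items():
        text = text.replace(token, markup)
    return text

seg = '<seg>Click <bpt i="1">&lt;b&gt;</bpt>Save<ept i="1">&lt;/b&gt;</ept> to continue.</seg>'
masked, tags = mask_inline_tags(seg)
restored = unmask(masked, tags)
```

Here `masked` is the text an MT system would see, with the placeholders treated as ordinary tokens. Whitespace around restored tags may differ from the original segment, which a production pipeline would need to handle.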

    Hybrid rule-based - example-based MT: feeding apertium with sub-sentential translation units

    This paper describes a hybrid machine translation (MT) approach that integrates bilingual chunks (sub-sentential translation units) obtained from parallel corpora into an MT system built using the Apertium free/open-source rule-based machine translation platform, which uses a shallow-transfer translation approach. In integrating the bilingual chunks, special care has been taken not to break the application of the existing Apertium structural transfer rules, since this would increase the number of ungrammatical translations. The method consists of (i) the application of a dynamic-programming algorithm to compute the best translation coverage of the input sentence given the collection of bilingual chunks available; (ii) the translation of the input sentence as usual by Apertium; and (iii) the application of a language model to choose one of the possible translations for each of the bilingual chunks detected. Results are reported for English-to-Spanish translation, and vice versa, when marker-based bilingual chunks automatically obtained from parallel corpora are used.
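Step (i) above can be illustrated with a small dynamic-programming sketch. The abstract does not define "best coverage", so this example assumes it means covering as many input words as possible with non-overlapping chunks, preferring fewer (longer) chunks on ties. The function name `best_coverage` and the `max_len` parameter are hypothetical.

```python
def best_coverage(tokens, chunks, max_len=5):
    """Pick non-overlapping chunk spans that cover the most tokens.

    tokens: list of source words.
    chunks: set of tuples, the source sides of the known bilingual chunks.
    Returns a list of (start, end) spans, with end exclusive.
    """
    n = len(tokens)
    # best[i] = (covered_words, -chunks_used, spans) for the prefix tokens[:i]
    best = [(0, 0, [])] + [None] * n
    for i in range(1, n + 1):
        # Option 1: leave token i-1 uncovered.
        best[i] = best[i - 1]
        # Option 2: end a known chunk at position i.
        for j in range(max(0, i - max_len), i):
            if tuple(tokens[j:i]) in chunks:
                covered, neg_used, spans = best[j]
                cand = (covered + (i - j), neg_used - 1, spans + [(j, i)])
                # Compare on (coverage, fewer chunks) only.
                if cand[:2] > best[i][:2]:
                    best[i] = cand
    return best[n][2]

sentence = "the big house is very old".split()
known = {("the", "big"), ("big", "house"), ("the", "big", "house"), ("very", "old")}
spans = best_coverage(sentence, known)
```

In this toy input the three-word chunk wins over the two overlapping two-word chunks, and the uncovered word "is" would be left to the rule-based engine.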

    Quantifying the effect of machine translation in a high-quality human translation production process

    This paper studies the impact of machine translation (MT) on the translation workflow at the Directorate-General for Translation (DGT), focusing on two language pairs and two MT paradigms: English-into-French with statistical MT and English-into-Finnish with neural MT. We collected data from 20 professional translators at DGT while they carried out real translation tasks in normal working conditions. The participants enabled/disabled MT for half of the segments in each document. They filled in a survey at the end of the logging period. We measured the productivity gains (or losses) resulting from the use of MT and examined the relationship between technical effort and temporal effort. The results show that while the usage of MT leads to productivity gains on average, this is not the case for all translators. Moreover, the two technical effort indicators used in this study show weak correlations with post-editing time. The translators' perception of their speed gains was more or less in line with the actual results. Reduction of typing effort is the most frequently mentioned reason why participants preferred working with MT, but the psychological benefits of not having to start from scratch were also often mentioned.

    Skills and Profile of the New Role of the Translator as MT Post-editor

    This paper explores the skills and profile of the new role of the translator as MT post-editor in view of the rising interest in and use of MT in the translation industry. After a brief review of the relevant literature declaring post-editing (PE) as a profession in its own right, the paper goes on to identify the different tasks involved in PE processes, following the work of Krings (2001). Then, a series of competences are defined and grouped into three main categories: core competences, linguistic skills and instrumental competences. Finally, a description of the controlled translation scenario of MT PE is advanced, taking into account the overall scenario of any translation project, including client description, text domain, text description, use of glossaries, MT engine, MT output quality and purpose of the translated text.

    A detailed analysis of phrase-based and syntax-based machine translation: the search for systematic differences

    This paper describes a range of automatic and manual comparisons of phrase-based and syntax-based statistical machine translation methods applied to English-German and English-French translation of user-generated content. The syntax-based methods underperform the phrase-based models, and the relaxation of syntactic constraints to broaden translation rule coverage means that these models do not necessarily generate output which is more grammatical than the output produced by the phrase-based models. Although the systems generate different output and can potentially be fruitfully combined, the lack of systematic difference between these models makes the combination task more challenging.

    Improving the translation environment for professional translators

    When using computer-aided translation systems in a typical, professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view and from a purely technological side. This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human-computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project.

    Improving the post-editing experience using translation recommendation: a user study

    We report findings from a user study with professional post-editors using a translation recommendation framework (He et al., 2010) to integrate Statistical Machine Translation (SMT) output with Translation Memory (TM) systems. The framework recommends SMT outputs to a TM user when it predicts that SMT outputs are more suitable for post-editing than the hits provided by the TM. We analyze the effectiveness of the model as well as the reaction of potential users. Based on the performance statistics and the users' comments, we find that translation recommendation can reduce the workload of professional post-editors and improve the acceptance of MT in the localization industry.

    Machine translation evaluation resources and methods: a survey

    We present a survey of Machine Translation (MT) evaluation that covers both manual and automatic evaluation methods. The traditional human evaluation criteria mainly include intelligibility, fidelity, fluency, adequacy, comprehension, and informativeness. The advanced human assessments include task-oriented measures, post-editing, segment ranking, and extended criteria. We classify the automatic evaluation methods into two categories: lexical similarity and the application of linguistic features. The lexical similarity methods include edit distance, precision, recall, F-measure, and word order. The linguistic features can be divided into syntactic features and semantic features. The syntactic features include part-of-speech tags, phrase types and sentence structures, and the semantic features include named entities, synonyms, textual entailment, paraphrase, semantic roles, and language models. Deep learning models for evaluation have been proposed only recently. We then introduce methods for assessing MT evaluation itself, including different correlation scores, and the recent quality estimation (QE) tasks for MT. This paper differs from existing works \cite{GALEprogram2009, EuroMatrixProject2007} in several respects: it introduces some recent developments in MT evaluation measures, the different classifications from manual to automatic evaluation measures, the recent QE tasks of MT, and a concise organisation of the content.
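As a rough illustration of the lexical similarity measures the survey lists, the sketch below computes a word-level edit distance and unigram precision, recall and F-measure between a hypothesis and a single reference. It is a minimal example and not a reimplementation of any specific metric from the survey; the function names `edit_distance` and `unigram_prf` are chosen for this example, and the F-measure is assumed to be the balanced F1.

```python
from collections import Counter

def edit_distance(hyp, ref):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete h
                            curr[j - 1] + 1,      # insert r
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

def unigram_prf(hyp, ref):
    """Unigram precision, recall and balanced F-measure (clipped counts)."""
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    precision = overlap / len(hyp) if hyp else 0.0
    recall = overlap / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

hyp = "the cat sat on mat".split()
ref = "the cat sat on the mat".split()
distance = edit_distance(hyp, ref)
p, r, f = unigram_prf(hyp, ref)
```

For this pair the hypothesis is missing one word, so the distance is 1, precision is 1.0 and recall is 5/6.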