4 research outputs found

    Embracing the threat: machine translation as a solution for subtitling

    Recent decades have brought significant changes in the subtitling industry, both in terms of workflow and in the context of the market for audiovisual translation. Machine translation (MT), whilst in regular use in the traditional localisation industry, has not seen a significant uptake in the subtitling arena. The SUMAT project, an EU-funded project which ran from 2011 to 2014, had as its aim the building and evaluation of viable MT solutions for the subtitling industry in nine bidirectional language pairs. As part of the project, a year-long large-scale evaluation of the output of the resulting MT engines was carried out by trained subtitlers. This paper reports on the impetus behind the investigation of MT for subtitling and on previous work in this field, and discusses some of the results of this evaluation, in particular an attempt to measure the extent of productivity gain or loss for subtitlers using machine translation as opposed to working in the traditional way. The paper examines opportunities and limitations of MT as a viable option for work of this nature and makes recommendations for the training of subtitle post-editors.

    A tolmácsolt interakció testi és térbeli jellemzői a bíróságon (Bodily and spatial features of interpreted interaction in court)

    This is a translation, made by Tímea Cziráki, of an article previously published in German. Non-peer-reviewed.

    Robust machine translation for multi-domain tasks

    In this thesis, we investigate and extend the phrase-based approach to statistical machine translation. Thanks to improved concepts and algorithms, the quality of the generated translation hypotheses has improved significantly in recent years. Still, the translation quality leaves a lot to be desired when going beyond traditional translation tasks, such as newswire articles, and when addressing more ambitious translation problems. We extend the state of the art in phrase-based translation, which enables us to build a robust translation system for multi-domain input. Robustness is here regarded as the ability to produce high-quality translations for arbitrary input texts, e.g. automatic transcriptions of recognized speech or other unstructured, potentially noisy input. In this work, we focus on Arabic-English translation tasks. We study the search problem for phrase-based statistical machine translation in detail. For this, we examine the effect of the different models on the translation quality. Moreover, we make an explicit distinction between reordering (coverage) and lexical hypotheses in the pruning process and stress the importance of coverage pruning to adjust the balance between hypotheses representing different reorderings (coverage hypotheses) and hypotheses with different lexical representations. We present constraints to solve the reordering problem in machine translation. To tune our translation system for multi-domain input and to improve the robustness built into the decoder, we apply domain adaptation to the language models and rerank the candidate translations using appropriate rescoring models. We also present our work on adjusting the vocabularies of the speech recognizer and the machine translation system in a preprocessing step and on predicting missing punctuation marks for automatically transcribed speech (in the actual translation process).
Processing morphologically rich languages such as Arabic generally places high demands on preprocessing. We show that the choice of the appropriate preprocessing strategy depends on the translation domain and on the structure of the input data. Experimental results emphasize how the proper choice of the preprocessing approach helps to increase the translation quality. In addition, we address the task of improving the translation quality by means of syntactically motivated feature functions within a reranking concept. Then, we investigate different data-driven approaches to the task of transliterating proper names. Often, such names are out-of-vocabulary terms and the intention is to preserve the names by transliteration. Finally, we show how human translators can be assisted by machine translation systems. We compare search strategies for interactive machine translation. The presented machine translation system achieves state-of-the-art performance and has been successfully applied to the large-scale Arabic-English GALE translation evaluations. Furthermore, the system was ranked among the top submissions for the NIST Open Machine Translation Evaluation 2006 and for the series of IWSLT evaluation campaigns.
