12 research outputs found

    Results of the WMT16 Tuning Shared Task

    This paper presents the results of the WMT16 Tuning Shared Task. We provided the participants of this task with a complete machine translation system and asked them to tune its internal parameters (feature weights). The tuned systems were used to translate the test set and the outputs were manually ranked for translation quality. We received 4 submissions in the Czech-English and 8 in the English-Czech translation direction. In addition, we ran 2 baseline setups, tuning the parameters with standard optimizers for BLEU score. In contrast to previous years, the tuned systems in 2016 rely on large data

    Results of the WMT15 Tuning Shared Task

    This paper presents the results of the WMT15 Tuning Shared Task. We provided the participants of this task with a complete machine translation system and asked them to tune its internal parameters (feature weights). The tuned systems were used to translate the test set and the outputs were manually ranked for translation quality. We received 4 submissions in the English-Czech and 6 in the Czech-English translation direction. In addition, we ran 3 baseline setups, tuning the parameters with standard optimizers for BLEU score
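
    Both tuning tasks above ask participants to set the feature weights of a log-linear translation model so that its preferred outputs score well on the test set. Below is a minimal sketch of that idea, assuming hypothetical n-best lists with per-candidate feature vectors, a toy sentence-level BLEU, and random search as a crude stand-in for the standard BLEU optimizers (e.g. MERT-style tuning) used in the baselines:

```python
# Minimal sketch, not the actual WMT tuning setup: "tuning" here means choosing
# weights for a log-linear model that reranks an n-best list of candidate
# translations, keeping the weights that maximise a toy BLEU score.
import math
import random
from collections import Counter


def ngram_precision(hyp, ref, n):
    """Clipped n-gram precision of a tokenised hypothesis against one reference."""
    hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    overlap = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
    return overlap / max(sum(hyp_ngrams.values()), 1)


def sentence_bleu(hyp, ref, max_n=4):
    """Toy sentence-level BLEU: geometric mean of n-gram precisions times a brevity penalty."""
    precisions = [max(ngram_precision(hyp, ref, n), 1e-9) for n in range(1, max_n + 1)]
    brevity = min(1.0, math.exp(1 - len(ref) / max(len(hyp), 1)))
    return brevity * math.exp(sum(math.log(p) for p in precisions) / max_n)


def rerank(nbest, weights):
    """Pick the candidate whose weighted feature sum (model score) is highest."""
    return max(nbest, key=lambda cand: sum(w * f for w, f in zip(weights, cand["features"])))


def tune(nbest_lists, references, dims, trials=200):
    """Random search over feature weights -- a stand-in for MERT/MIRA-style optimizers."""
    best_weights, best_score = None, -1.0
    for _ in range(trials):
        weights = [random.uniform(-1.0, 1.0) for _ in range(dims)]
        hyps = [rerank(nbest, weights)["tokens"] for nbest in nbest_lists]
        score = sum(sentence_bleu(h, r) for h, r in zip(hyps, references)) / len(references)
        if score > best_score:
            best_weights, best_score = weights, score
    return best_weights, best_score


# Hypothetical data: each source sentence has an n-best list of candidates,
# each with a tokenised output and a feature vector (e.g. LM score, TM score, length).
nbest_lists = [[
    {"tokens": "the cat sat on the mat".split(), "features": [-2.1, -1.3, 6]},
    {"tokens": "the cat sit on mat".split(), "features": [-1.8, -1.1, 5]},
]]
references = ["the cat sat on the mat".split()]
print(tune(nbest_lists, references, dims=3))
```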

    Machine translation evaluation resources and methods: a survey

    We introduce the Machine Translation (MT) evaluation survey that covers both manual and automatic evaluation methods. The traditional human evaluation criteria mainly include intelligibility, fidelity, fluency, adequacy, comprehension, and informativeness. The advanced human assessments include task-oriented measures, post-editing, segment ranking, and extended criteria. We classify the automatic evaluation methods into two categories: lexical similarity and linguistic features. The lexical similarity methods cover edit distance, precision, recall, F-measure, and word order. The linguistic features can be divided into syntactic and semantic features. The syntactic features include part-of-speech tags, phrase types, and sentence structures; the semantic features include named entities, synonyms, textual entailment, paraphrase, semantic roles, and language models. Deep learning models for evaluation have been proposed only recently. We also introduce methods for evaluating the MT evaluation measures themselves, including different correlation scores, as well as the recent quality estimation (QE) tasks for MT. This paper differs from existing works [GALEprogram2009, EuroMatrixProject2007] in several respects: it covers recent developments in MT evaluation measures, classifies measures from manual to automatic, introduces the recent QE tasks for MT, and organises the content concisely.
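
    As an illustration of the lexical similarity methods the survey lists (precision, recall, F-measure), here is a minimal sketch of word-level precision, recall, and F-measure between a hypothesis and a single reference. It uses whitespace tokenisation and clipped unigram overlap only, so it omits the n-gram, word-order, and synonym handling of real metrics:

```python
# Minimal sketch of the "lexical similarity" family of MT metrics:
# word-level precision, recall, and F-measure against one reference.
from collections import Counter


def precision_recall_f1(hypothesis: str, reference: str):
    """Clipped unigram precision/recall/F1 between whitespace-tokenised strings."""
    hyp_counts = Counter(hypothesis.split())
    ref_counts = Counter(reference.split())
    overlap = sum(min(c, ref_counts[w]) for w, c in hyp_counts.items())
    precision = overlap / max(sum(hyp_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-9)
    return precision, recall, f1


# Example: a hypothesis that shares most of its words with the reference.
print(precision_recall_f1("the cat sat on a mat", "the cat sat on the mat"))
```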

    Quality expectations of machine translation

    Machine Translation (MT) is being deployed for a range of use-cases by millions of people on a daily basis. There should, therefore, be no doubt as to the utility of MT. However, not everyone is convinced that MT can be useful, especially as a productivity enhancer for human translators. In this chapter, I address this issue, describing how MT is currently deployed, how its output is evaluated and how this could be enhanced, especially as MT quality itself improves. Central to these issues is the acceptance that there is no longer a single ‘gold standard’ measure of quality, such that the situation in which MT is deployed needs to be borne in mind, especially with respect to the expected ‘shelf-life’ of the translation itself

    Adjunction in hierarchical phrase-based translation

    Machine translation for institutional academic texts: Output quality, terminology translation and post-editor trust

    The present work is a feasibility study on the application of Machine Translation (MT) to institutional academic texts, specifically course catalogues, for Italian-English and German-English. The first research question focuses on the feasibility of profitably applying MT to such texts. Since the benefits of good-quality MT might be counteracted by translators' preconceptions about the output, the second research question examines translator trainees' trust in an MT output as compared to a human translation (HT). Training and test sets are created for both language combinations in the institutional academic domain. The MT systems used are ModernMT and Google Translate. Overall evaluations of the output quality are carried out using automatic metrics. Results show that applying neural MT to institutional academic texts can be beneficial even when bilingual data are not available. When small amounts of sentence pairs become available, MT quality improves. Then, a gold-standard data set with manual annotations of terminology (MAGMATic) is created and used for an evaluation of the output focused on terminology translation. The gold standard was publicly released to stimulate research on terminology assessment. The assessment shows that domain adaptation improves the quality of term translation. To conclude, a method to measure trust in a post-editing task is proposed and results regarding translator trainees' trust in MT are outlined. All participants are asked to work on the same text. Half of them are told that it is an MT output to be post-edited, and the other half that it is an HT needing revision. Results show that there is no statistically significant difference between post-editing and HT revision in terms of the number of edits or temporal effort. This suggests that a new generation of translators, trained in MT and post-editing, is not influenced by preconceptions against MT.
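
    One simple way to quantify the "number of edits" mentioned above (a TER/HTER-style edit rate, not necessarily the exact measure used in the study) is the token-level edit distance between the raw MT output and its post-edited version, normalised by the post-edited length. A minimal sketch:

```python
# Minimal sketch of a TER/HTER-style post-editing effort measure: token-level
# Levenshtein distance between MT output and its post-edited version,
# divided by the number of post-edited tokens.
def token_edit_distance(a, b):
    """Levenshtein distance over token lists (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution / match
        prev = curr
    return prev[-1]


def edit_rate(mt_output: str, post_edited: str) -> float:
    """Edits per post-edited token, a rough proxy for post-editing effort."""
    mt_tokens, pe_tokens = mt_output.split(), post_edited.split()
    return token_edit_distance(mt_tokens, pe_tokens) / max(len(pe_tokens), 1)


# Example with a single substitution over eight tokens: edit rate 0.125.
print(edit_rate("the course give an overview of the methods",
                "the course gives an overview of the methods"))
```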