42,579 research outputs found

    CUNI System for WMT16 Automatic Post-Editing and Multimodal Translation Tasks

    Get PDF
    Neural sequence to sequence learning recently became a very promising paradigm in machine translation, achieving competitive results with statistical phrase-based systems. In this system description paper, we attempt to utilize several recently published methods used for neural sequential learning in order to build systems for WMT 2016 shared tasks of Automatic Post-Editing and Multimodal Machine Translation.Comment: Accepted to the First Conference of Machine Translation (WMT16

    A Comparative Quality Evaluation of PBSMT and NMT using Professional Translators

    Get PDF
    This paper reports on a comparative evaluation of phrase-based statistical machine translation (PBSMT) and neural machine translation (NMT) for four language pairs, using the PET interface to compare educational domain output from both systems using a variety of metrics, including automatic evaluation as well as human rankings of adequacy and fluency, error-type markup, and post-editing (technical and temporal) effort, performed by professional translators. Our results show a preference for NMT in side-by-side ranking for all language pairs, texts, and segment lengths. In addition, perceived fluency is improved and annotated errors are fewer in the NMT output. Results are mixed for perceived adequacy and for errors of omission, addition, and mistranslation. Despite far fewer segments requiring post-editing, document-level post-editing performance was not found to have significantly improved in NMT compared to PBSMT. This evaluation was conducted as part of the TraMOOC project, which aims to create a replicable semi-automated methodology for high-quality machine translation of educational data

    Cache-based Online Adaptation for Machine Translation Enhanced Computer Assisted Translation

    Get PDF
    The integration of machine translation in the human translation work flow rises intriguing and challenging research issues. One of them, addressed in this work, is how to dynamically adapt phrase-based statistical MT from user post-editing. By casting the problem in the online machine learning paradigm, we propose a cache-based adaptation technique method that dynamically stores target n-gram and phrase-pair features used by the translator. For the sake of adaptation, during decoding not only recency of the features stored in the cache is rewarded but also their occurrence in similar already translated sentences in the document. Our experimental results show the effectiveness of the devised method both on standard benchmarks and on documents post-edited by professional translators through the real use of the MateCat tool

    Generating titles for millions of browse pages on an e-Commerce site

    Get PDF
    We present two approaches to generate titles for browse pages in five different languages, namely English, German, French, Italian and Spanish. These browse pages are structured search pages in an e-commerce domain. We first present a rule-based approach to generate these browse page titles. In addition, we also present a hybrid approach which uses a phrase-based statistical machine translation engine on top of the rule-based system to assemble the best title. For the two languages English and German we have access to a large amount of already available rule-based generated and curated titles. For these languages we present an automatic post-editing approach which learns how to post-edit the rule-based titles into curated titles

    Interactive translation prediction versus conventional post-editing in practice: a study with the CasMaCat workbench

    Full text link
    [EN] We conducted a field trial in computer-assisted professional translation to compare interactive translation prediction (ITP) against conventional post-editing (PE) of machine translation (MT) output. In contrast to the conventional PE set-up, where an MT system first produces a static translation hypothesis that is then edited by a professional (hence "post-editing"), ITP constantly updates the translation hypothesis in real time in response to user edits. Our study involved nine professional translators and four reviewers working with the web-based CasMaCat workbench. Various new interactive features aiming to assist the post-editor/translator were also tested in this trial. Our results show that even with little training, ITP can be as productive as conventional PE in terms of the total time required to produce the final translation. Moreover, translation editors working with ITP require fewer key strokes to arrive at the final version of their translation.This work was supported by the European Union’s 7th Framework Programme (FP7/2007–2013) under grant agreement No 287576 (CasMaCat ).Sanchis Trilles, G.; Alabau, V.; Buck, C.; Carl, M.; Casacuberta Nolla, F.; Garcia Martinez, MM.; Germann, U.... (2014). Interactive translation prediction versus conventional post-editing in practice: a study with the CasMaCat workbench. Machine Translation. 28(3-4):217-235. https://doi.org/10.1007/s10590-014-9157-9S217235283-4Alabau V, Leiva LA, Ortiz-Martínez D, Casacuberta F (2012) User evaluation of interactive machine translation systems. In: Proceedings of the 16th Annual Conference of the European Association for Machine Translation, pp 20–23Alabau V, Buck C, Carl M, Casacuberta F, García-Martínez M, Germann U, González-Rubio J, Hill R, Koehn P, Leiva L, Mesa-Lao B, Ortiz-Martínez D, Saint-Amand H, Sanchis-Trilles G, Tsoukala C (2014) Casmacat: A computer-assisted translation workbench. In: Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, pp 25–28Alves F, Vale D (2009) Probing the unit of translation in time: aspects of the design and development of a web application for storing, annotating, and querying translation process data. Across Lang Cultures 10(2):251–273Bach N, Huang F, Al-Onaizan Y (2011) Goodness: A method for measuring machine translation confidence. In: Proceedings of the Annual Meeting of the Association for Computational Linguistics, pp 211–219Barrachina S, Bender O, Casacuberta F, Civera J, Cubel E, Khadivi S, Lagarda AL, Ney H, Tomás J, Vidal E, Vilar JM (2009) Statistical approaches to computer-assisted translation. Comput Linguist 35(1):3–28Brown PF, Della Pietra SA, Della Pietra VJ (1993) The mathematics of statistical machine translation: parameter estimation. Comput Linguist 19(2):263–311Callison-Burch C, Koehn P, Monz C, Post M, Soricut R, Specia L (2012) Findings of the 2012 workshop on statistical machine translation. In: Proceedings of the Seventh Workshop on Statistical Machine Translation, pp 10–51Carl M (2012a) The CRITT TPR-DB 1.0: A database for empirical human translation process research. In: Proceedings of the AMTA 2012 Workshop on Post-Editing Technology and Practice, pp 1–10Carl M (2012b) Translog-II: a program for recording user activity data for empirical reading and writing research. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation, pp 4108–4112Carl M (2014) Produkt- und Prozesseinheiten in der CRITT Translation Process Research Database. In: Ahrens B (ed) Translationswissenschaftliches Kolloquium III: Beiträge zur Übersetzungs- und Dolmetschwissenschaft (Köln/Germersheim). Peter Lang, Frankfurt am Main, pp 247–266Carl M, Kay M (2011) Gazing and typing activities during translation : a comparative study of translation units of professional and student translators. Meta 56(4):952–975Doherty S, O’Brien S, Carl M (2010) Eye tracking as an MT evaluation technique. Mach Transl 24(1):1–13Elming J, Carl M, Balling LW (2014) Investigating user behaviour in post-editing and translation using the Casmacat workbench. In: O’Brien S, Winther Balling L, Carl M, Simard M, Specia L (eds) Post-editing of machine translation: processes and applications. Cambridge Scholar Publishing, Newcastle upon Tyne, pp 147–169Federico M, Cattelan A, Trombetti M (2012) Measuring user productivity in machine translation enhanced computer assisted translation. In: Proceedings of the Tenth Biennial Conference of the Association for Machine Translation in the AmericasFlournoy R, Duran C (2009) Machine translation and document localization at adobe: From pilot to production. In: Proceedings of MT Summit XIIGreen S, Heer J, Manning CD (2013) The efficacy of human post-editing for language translation. In: Proceedings of SIGCHI Conference on Human Factors in Computing Systems, pp 439–448Guerberof A (2009) Productivity and quality in mt post-editing. In: Proceedings of MT Summit XII-Workshop: Beyond Translation Memories: New Tools for Translators MTGuerberof A (2012) Productivity and quality in the post-editing of outputs from translation memories and machine translation. Ph.D. ThesisJust MA, Carpenter PA (1980) A theory of reading: from eye fixations to comprehension. Psychol Rev 87(4):329Koehn P (2009a) A process study of computer-aided translation. Mach Transl 23(4):241–263Koehn P (2009b) A web-based interactive computer aided translation tool. In: Proceedings of ACL-IJCNLP 2009 Software Demonstrations, pp 17–20Krings HP (2001) Repairing texts: empirical investigations of machine translation post-editing processes, vol 5. Kent State University Press, KentLacruz I, Shreve GM, Angelone E (2012) Average pause ratio as an indicator of cognitive effort in post-editing: a case study. In: Proceedings of the AMTA 2012 Workshop on Post-Editing Technology and Practice, pp 21–30Langlais P, Foster G, Lapalme G (2000) Transtype: A computer-aided translation typing system. In: Proceedings of the 2000 NAACL-ANLP Workshop on Embedded Machine Translation Systems, pp 46–51Leiva LA, Alabau V, Vidal E (2013) Error-proof, high-performance, and context-aware gestures for interactive text edition. In: Proceedings of the 2013 annual conference extended abstracts on Human factors in computing systems, pp 1227–1232Montgomery D (2004) Introduction to statistical quality control. Wiley, HobokenO’Brien S (2009) Eye tracking in translation process research: methodological challenges and solutions, Copenhagen Studies in Language, vol 38. Samfundslitteratur, Copenhagen, pp 251–266Ortiz-Martínez D, Casacuberta F (2014) The new Thot toolkit for fully automatic and interactive statistical machine translation. In: Proceedings of the 14th Annual Meeting of the European Association for Computational Linguistics: System Demonstrations, pp 45–48Plitt M, Masselot F (2010) A productivity test of statistical machine translation post-editing in a typical localisation context. Prague Bulletin Math Linguist 93(1):7–16Sanchis-Trilles G, Ortiz-Martínez D, Civera J, Casacuberta F, Vidal E, Hoang H (2008) Improving interactive machine translation via mouse actions. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp 485–494Simard M, Foster G (2013) Pepr: Post-edit propagation using phrase-based statistical machine translation. In: Proceedings of MT Summit XIV, pp 191–198Skadiņš R, Puriņš M, Skadiņa I, Vasiļjevs A (2011) Evaluation of SMT in localization to under-resourced inflected language. In: Proceedings of the 15th International Conference of the European Association for Machine Translation, pp 35–4

    Improving the post-editing experience using translation recommendation: a user study

    Get PDF
    We report findings from a user study with professional post-editors using a translation recommendation framework (He et al., 2010) to integrate Statistical Machine Translation (SMT) output with Translation Memory (TM) systems. The framework recommends SMT outputs to a TM user when it predicts that SMT outputs are more suitable for post-editing than the hits provided by the TM. We analyze the effectiveness of the model as well as the reaction of potential users. Based on the performance statistics and the users’comments, we find that translation recommendation can reduce the workload of professional post-editors and improve the acceptance of MT in the localization industry

    TMX markup: a challenge when adapting SMT to the localisation environment

    Get PDF
    Translation memory (TM) plays an important role in localisation workflows and is used as an efficient and fundamental tool to carry out translation. In recent years, statistical machine translation (SMT) techniques have been rapidly developed, and the translation quality and speed have been significantly improved as well. However,when applying SMT technique to facilitate post-editing in the localisation industry, we need to adapt SMT to the TM data which is formatted with special mark-up. In this paper, we explore some issues when adapting SMT to Symantec formatted TM data. Three different methods are proposed to handle the Translation Memory eXchange (TMX) markup and a comparative study is carried out between them. Furthermore, we also compare the TMX-based SMT systems with a customised SYSTRAN system through human evaluation and automatic evaluation metrics. The experimental results conducted on the French and English language pair show that the SMT can perform well using TMX as input format either during training or at runtime

    Quantifying the effect of machine translation in a high-quality human translation production process

    Get PDF
    This paper studies the impact of machine translation (MT) on the translation workflow at the Directorate-General for Translation (DGT), focusing on two language pairs and two MT paradigms: English-into-French with statistical MT and English-into-Finnish with neural MT. We collected data from 20 professional translators at DGT while they carried out real translation tasks in normal working conditions. The participants enabled/disabled MT for half of the segments in each document. They filled in a survey at the end of the logging period. We measured the productivity gains (or losses) resulting from the use of MT and examined the relationship between technical effort and temporal effort. The results show that while the usage of MT leads to productivity gains on average, this is not the case for all translators. Moreover, the two technical effort indicators used in this study show weak correlations with post-editing time. The translators' perception of their speed gains was more or less in line with the actual results. Reduction of typing effort is the most frequently mentioned reason why participants preferred working with MT, but also the psychological benefits of not having to start from scratch were often mentioned

    Lexically Constrained Decoding for Sequence Generation Using Grid Beam Search

    Full text link
    We present Grid Beam Search (GBS), an algorithm which extends beam search to allow the inclusion of pre-specified lexical constraints. The algorithm can be used with any model that generates a sequence y^={y0…yT} \mathbf{\hat{y}} = \{y_{0}\ldots y_{T}\} , by maximizing p(y∣x)=∏tp(yt∣x;{y0…yt−1}) p(\mathbf{y} | \mathbf{x}) = \prod\limits_{t}p(y_{t} | \mathbf{x}; \{y_{0} \ldots y_{t-1}\}) . Lexical constraints take the form of phrases or words that must be present in the output sequence. This is a very general way to incorporate additional knowledge into a model's output without requiring any modification of the model parameters or training data. We demonstrate the feasibility and flexibility of Lexically Constrained Decoding by conducting experiments on Neural Interactive-Predictive Translation, as well as Domain Adaptation for Neural Machine Translation. Experiments show that GBS can provide large improvements in translation quality in interactive scenarios, and that, even without any user input, GBS can be used to achieve significant gains in performance in domain adaptation scenarios.Comment: Accepted as a long paper at ACL 201
    • …
    corecore