11 research outputs found

    Integrating Online and Active Learning in a Computer-Assisted Translation Workbench

    This paper describes a pilot study with a computer-assisted translation workbench that tests the integration of online learning (OL) and active learning (AL) features. We investigate the effect of these features on translation productivity, using interactive translation prediction (ITP) as a baseline. User activity data were collected from five beta testers using key-logging and eye-tracking. User feedback was also collected at the end of the experiments in the form of retrospective think-aloud protocols. We found that OL performs better than ITP, especially in terms of translation speed. In addition, AL provides better translation quality than ITP for the same levels of user effort. We plan to incorporate these features in the final version of the workbench.

    Online Neural Automatic Post-editing for Neural Machine Translation

    Machine learning from user corrections is key to the industrial deployment of machine translation (MT). We introduce the first online approach to automatic post-editing (APE), i.e. the task of automatically correcting MT errors. We present experimental results of APE on English-Italian MT by simulating human post-edits with human reference translations, and by applying online APE to MT outputs of increasing quality. By evaluating APE on generic vs. specialised and static vs. adaptive neural MT, we address the question: at what cost on the MT side will APE become useless?

    The New THOT Toolkit for Fully-Automatic and Interactive Statistical Machine Translation

    We present the new THOT toolkit for fully-automatic and interactive statistical machine translation (SMT). Initial public versions of THOT date back to 2005 and included only the estimation of phrase-based models. By contrast, the new version offers several features that had not been previously incorporated. The key innovations provided by the toolkit are computer-aided translation, including post-editing and interactive SMT, incremental learning, and robust generation of alignments at phrase level. In addition, the toolkit provides standard SMT features such as fully-automatic translation, scalable and parallel algorithms for model training, and a client-server implementation of the translation functionality. The toolkit can be compiled on Unix-like and Windows platforms and is released under the GNU Lesser General Public License (LGPL).
    Work supported by the European Union 7th Framework Program (FP7/2007-2013) under the CasMaCat project (grant agreement no 287576), by Spanish MICINN under grant TIN2012-31723, and by the Generalitat Valenciana under grant ALMPR (Prometeo/2009/014).
    Ortiz-Martínez, D.; Casacuberta Nolla, F. (2014). The New THOT Toolkit for Fully-Automatic and Interactive Statistical Machine Translation. The Association for Computational Linguistics. 45-48. http://hdl.handle.net/10251/202002454

    A Machine-Aided Approach to Generating Grammar Rules from Japanese Source Text for Use in Hybrid and Rule-based Machine Translation Systems

    Many automatic machine translation systems available today use a hybrid of pure statistical translation and rule-based grammatical translation. This is largely due to the shortcomings of each individual approach: a rule-based system requires linguistics experts to spend a large amount of time hand-coding grammar rules, while a statistical system requires large amounts of source text to generate accurate models. By automating a portion of the rule generation process, the creation of grammar rules could be made faster, more efficient and less costly. By doing statistical analysis on a bilingual corpus, common grammar rules can be inferred and exported to a hybrid system. The resulting rules then provide a base grammar for the system. This helps to reduce the time needed for experts to hand-code grammar rules and makes a hybrid system more effective.

    Continuous spaces in statistical machine Translation

    Classically, statistical machine translation relied on representations of words in a discrete space. Words and phrases were atomically represented as indices in a vector. In recent years, techniques for representing words and phrases in a continuous space have arisen. In this scenario, a word is represented in the continuous space as a real-valued, dense and low-dimensional vector. Statistical models can profit from this richer representation, since it naturally takes into account concepts such as semantic or syntactic relationships between words and phrases. This approach is encouraging, but it also entails new challenges. In this work, a language model that relies on continuous representations of words is developed. This model makes use of a bidirectional recurrent neural network, which is able to take into account both the past and the future context of words. Since the model is costly to train, the training dataset is reduced by using bilingual sentence selection techniques. Two selection methods are used and compared. The language model is then used to rerank translation hypotheses. Results show improvements in translation quality. Moreover, a new approach to machine translation has recently been proposed: the so-called neural machine translation. It consists of the sole use of a large neural network for carrying out the translation process. In this work, this novel model is compared to the existing phrase-based approaches of statistical machine translation. Finally, the neural translation models are combined with diverse machine translation systems in order to provide a consensus translation, which aims to improve on the translation given by each single system.
    Peris Abril, Á. (2015). Continuous spaces in statistical machine Translation. http://hdl.handle.net/10251/68448

    Interactive translation prediction versus conventional post-editing in practice: a study with the CasMaCat workbench

    We conducted a field trial in computer-assisted professional translation to compare interactive translation prediction (ITP) against conventional post-editing (PE) of machine translation (MT) output. In contrast to the conventional PE set-up, where an MT system first produces a static translation hypothesis that is then edited by a professional (hence "post-editing"), ITP constantly updates the translation hypothesis in real time in response to user edits. Our study involved nine professional translators and four reviewers working with the web-based CasMaCat workbench. Various new interactive features aiming to assist the post-editor/translator were also tested in this trial. Our results show that even with little training, ITP can be as productive as conventional PE in terms of the total time required to produce the final translation. Moreover, translation editors working with ITP require fewer keystrokes to arrive at the final version of their translation.
    This work was supported by the European Union’s 7th Framework Programme (FP7/2007–2013) under grant agreement No 287576 (CasMaCat).
    Sanchis Trilles, G.; Alabau, V.; Buck, C.; Carl, M.; Casacuberta Nolla, F.; Garcia Martinez, MM.; Germann, U.... (2014). Interactive translation prediction versus conventional post-editing in practice: a study with the CasMaCat workbench. Machine Translation. 28(3-4):217-235. https://doi.org/10.1007/s10590-014-9157-9

    Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it 2018

    On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted in its prestigious main lecture hall, “Cavallerizza Reale”. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after five years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.