13 research outputs found

    Guiding Neural Machine Translation Decoding with External Knowledge

    Differently from the phrase-based paradigm, neural machine translation (NMT) operates on word and sentence representations in a continuous space. This makes the decoding process not only more difficult to interpret, but also harder to influence with external knowledge. For the latter problem, effective solutions like the XML markup used by phrase-based models to inject fixed translation options as constraints at decoding time are not yet available. We propose a “guide” mechanism that enhances an existing NMT decoder with the ability to prioritize and adequately handle translation options presented in the form of XML annotations of source words. Positive results obtained in two different translation tasks indicate the effectiveness of our approach.
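    As a minimal sketch of the general idea (not the authors' actual decoder integration), the snippet below assumes a toy XML annotation format and a simple rescoring hook: annotated source spans are extracted as constraints, and hypotheses that honour them receive a score bonus. The tag name, the bonus-based scoring, and all helper names are illustrative assumptions.

    import re

    # Matches toy annotations like: <t option="Fahrer">driver</t>
    TAG = re.compile(r'<t\s+option="([^"]+)">(.*?)</t>')

    def parse_annotated_source(text):
        """Return (plain_source, constraints), where constraints maps each
        annotated source span to the translation option the decoder should emit."""
        constraints = {src: opt for opt, src in TAG.findall(text)}
        plain = TAG.sub(lambda m: m.group(2), text)
        return plain, constraints

    def rescore(hypothesis, model_score, constraints, bonus=1.0):
        """Add a bonus for every constraint the hypothesis covers, so candidates
        that honour the annotations are prioritized in the beam."""
        covered = sum(1 for tgt in constraints.values() if tgt in hypothesis)
        return model_score + bonus * covered

    src, cons = parse_annotated_source('the <t option="Fahrer">driver</t> stopped')
    print(src)   # the driver stopped
    print(cons)  # {'driver': 'Fahrer'}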

    Guiding neural machine translation decoding with external knowledge

    © 2017 The Authors. Published by the Association for Computational Linguistics. This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: http://dx.doi.org/10.18653/v1/W17-4716. Chatterjee, R., Negri, M., Turchi, M., Federico, M. et al. (2017) Guiding neural machine translation decoding with external knowledge. In: Proceedings of the Second Conference on Machine Translation, Volume 1: Research Papers, Bojar, O., Buck, C., Chatterjee, R., Federmann, C. et al. (eds.) Stroudsburg, PA: Association for Computational Linguistics, pp. 157-168. This work has been partially supported by the EC-funded H2020 projects QT21 (grant agreement no. 645452) and ModernMT (grant agreement no. 645487).

    eSCAPE: a Large-scale Synthetic Corpus for Automatic Post-Editing

    Training models for the automatic correction of machine-translated text usually relies on data consisting of (source, MT, human post-edit) triplets providing, for each source sentence, examples of translation errors with the corresponding corrections made by a human post-editor. Ideally, a large amount of data of this kind should allow the model to learn reliable correction patterns and effectively apply them at test stage on unseen (source, MT) pairs. In practice, however, their limited availability calls for solutions that also integrate other sources of knowledge into the training process. Along this direction, state-of-the-art results have been recently achieved by systems that, in addition to a limited amount of available training data, exploit artificial corpora that approximate elements of the "gold" training instances with automatic translations. Following this idea, we present eSCAPE, the largest freely-available Synthetic Corpus for Automatic Post-Editing released so far. eSCAPE consists of millions of entries in which the MT element of the training triplets has been obtained by translating the source side of publicly-available parallel corpora, and using the target side as an artificial human post-edit. Translations are obtained with both phrase-based and neural models. For each MT paradigm, eSCAPE contains 7.2 million triplets for English-German and 3.3 million for English-Italian, resulting in a total of 14.4 and 6.6 million instances respectively. The usefulness of eSCAPE is proved through experiments in a general-domain scenario, the most challenging one for automatic post-editing. For both language directions, the models trained on our artificial data always improve MT quality with statistically significant gains. The current version of eSCAPE can be freely downloaded from http://hltshare.fbk.eu/QT21/eSCAPE.html. Comment: Accepted at LREC 201
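    As a rough illustration of how such a corpus can be assembled, the sketch below builds (source, MT, artificial post-edit) triplets from a parallel corpus by machine-translating the source side and reusing the reference target as the post-edit. The file paths and the translate() stub are placeholders, not part of the eSCAPE release.

    def translate(source_sentence):
        """Placeholder for a real MT system (phrase-based or neural)."""
        raise NotImplementedError

    def build_triplets(src_path, tgt_path, out_path):
        with open(src_path, encoding="utf-8") as src_f, \
             open(tgt_path, encoding="utf-8") as tgt_f, \
             open(out_path, "w", encoding="utf-8") as out_f:
            for src, tgt in zip(src_f, tgt_f):
                src, tgt = src.strip(), tgt.strip()
                mt = translate(src)  # automatic translation of the source side
                # the human reference target stands in as an artificial post-edit
                out_f.write("\t".join((src, mt, tgt)) + "\n")

    # build_triplets("corpus.en", "corpus.de", "triplets.en-de.tsv")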

    Improving the Quality of a Domain-Specific Machine Translation System with Domain Adaptation (Témaspecifikus gépi fordítórendszer minőségének javítása domain adaptáció segítségével)

    The spread of deep learning methods has greatly changed how machine translations are perceived by humans. Compared to statistical machine translation (SMT) systems, neural-network-based architectures (NMT) generate far more readable translations, which professional translators can correct more easily and efficiently during post-editing. The difficulty with the new approach, however, is that training systems that deliver consistently good translation quality requires a large amount of training data, which is not available for most translation companies or language pairs. In my work, I enriched small, high-quality in-domain training sets, via data selection, with the most similar segments of a large out-of-domain corpus. With the resulting architecture, I was able to improve the quality of the translation system by a statistically significant margin in every case examined. In the course of my research, I also sought the selection method best suited to the task and examined the behaviour of the system with several different combinations of language and domain pairs.
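    The abstract does not name the data-selection criterion; as one common possibility, the sketch below ranks out-of-domain segments by TF-IDF cosine similarity to the in-domain data and keeps the top-ranked ones. This is a hypothetical stand-in, not the author's exact method.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def select_segments(in_domain, out_of_domain, top_k):
        """Return the out-of-domain segments most similar to the in-domain data."""
        vectorizer = TfidfVectorizer()
        ood_vecs = vectorizer.fit_transform(out_of_domain)
        # represent the whole in-domain corpus as a single pseudo-document
        in_vec = vectorizer.transform([" ".join(in_domain)])
        scores = cosine_similarity(ood_vecs, in_vec).ravel()
        ranked = sorted(zip(scores, out_of_domain), reverse=True)
        return [segment for _, segment in ranked[:top_k]]

    # augmented_data = in_domain + select_segments(in_domain, out_of_domain, 100000)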

    Knowledge Graphs Effectiveness in Neural Machine Translation Improvement

    Maintaining semantic relations between words during the translation process yields more accurate target-language output from Neural Machine Translation (NMT). Although difficult to achieve from training data alone, it is possible to leverage Knowledge Graphs (KGs) to retain source-language semantic relations in the corresponding target-language translation. The core idea is to use KG entity relations as embedding constraints to improve the mapping from source to target. This paper describes two embedding constraints, both of which employ Entity Linking (EL), i.e. assigning a unique identity to entities, to associate words in training sentences with those in the KG: (1) a monolingual embedding constraint that supports an enhanced semantic representation of the source words through access to relations between entities in a KG; and (2) a bilingual embedding constraint that forces entity relations in the source language to be carried over to the corresponding entities in the target-language translation. The method is evaluated for English-Spanish translation exploiting Freebase as a source of knowledge. Our experimental results show that exploiting KG information not only decreases the number of unknown words in the translation but also improves translation quality.
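    A minimal sketch of how such embedding constraints could enter training as auxiliary loss terms added to the usual NMT objective; the entity pairs are assumed to come from entity linking against the KG, and the variable names, L2 loss form, and weighting are illustrative assumptions rather than the paper's formulation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def monolingual_constraint(src_emb: nn.Embedding, related_pairs):
        """Pull together source embeddings of entity pairs that are related in
        the KG, enriching the semantic representation of the source words."""
        loss = src_emb.weight.new_zeros(())
        for i, j in related_pairs:
            loss = loss + F.mse_loss(src_emb(torch.tensor(i)),
                                     src_emb(torch.tensor(j)))
        return loss

    def bilingual_constraint(src_emb: nn.Embedding, tgt_emb: nn.Embedding, linked_pairs):
        """Encourage a source entity and its linked target-language entity to have
        similar embeddings, carrying KG relations into the translation."""
        loss = src_emb.weight.new_zeros(())
        for s, t in linked_pairs:
            loss = loss + F.mse_loss(src_emb(torch.tensor(s)),
                                     tgt_emb(torch.tensor(t)))
        return loss

    # total_loss = nmt_loss + alpha * monolingual_constraint(src_emb, kg_pairs) \
    #                       + beta * bilingual_constraint(src_emb, tgt_emb, linked_pairs)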
