SYSTRAN Purely Neural MT Engines for WMT2017
This paper describes SYSTRAN's systems submitted to the WMT 2017 shared news
translation task for English-German, in both translation directions. Our
systems are built using OpenNMT, an open-source neural machine translation
system, implementing sequence-to-sequence models with LSTM encoder/decoders and
attention. We experimented using monolingual data automatically
back-translated. Our resulting models are further hyper-specialised with an
adaptation technique that fine-tunes models to the evaluation test
sentences. Comment: Published in WMT 2017
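The back-translation step described above can be sketched as follows. This is a minimal illustration of the idea, not SYSTRAN's pipeline: the hypothetical `reverse_translate` stands in for a trained target-to-source OpenNMT model.

```python
def reverse_translate(sentence):
    # Placeholder for a real German->English NMT model used when
    # building synthetic English->German training data.
    return "synthetic source for: " + sentence

def back_translate(monolingual_target, parallel_pairs):
    """Augment real parallel data with (synthetic source, real target)
    pairs produced by translating monolingual target-side sentences
    back into the source language."""
    synthetic = [(reverse_translate(t), t) for t in monolingual_target]
    return parallel_pairs + synthetic

pairs = back_translate(
    ["Das ist ein Test."],
    [("This is a test.", "Das ist ein Test.")],
)
```

The key point is that the target side of each synthetic pair is genuine text, so the decoder still learns from fluent output even though the source side is machine-generated.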
Quantitative Approaches to Analysing Predictions in Neural Machine Translation (original title: Approches quantitatives de l'analyse des prédictions en traduction automatique neuronale (TAN))
As part of a larger project on optimal learning conditions in neural machine
translation, we investigate characteristic training phases of translation
engines. All our experiments are carried out using OpenNMT-Py: pre-processing
uses the Europarl corpus for training data and the INTERSECT corpus for
validation. Longitudinal analyses of training
phases suggest that the progression of translations is not always linear.
Following the results of textometric explorations, we identify the importance
of phenomena related to chronological progression in order to map the
different processes at work in neural machine translation (NMT). Comment: in
French. JADT 2020: 15èmes Journées Internationales d'Analyse statistique des
Données Textuelles, Université de Toulouse, Jun 2020, Toulouse, France
Low-Resource Unsupervised NMT: Diagnosing the Problem and Providing a Linguistically Motivated Solution
Unsupervised Machine Translation has been advancing our ability to translate without parallel data, but state-of-the-art methods assume an abundance of monolingual data. This paper investigates the scenario where monolingual data is limited as well, finding that current unsupervised methods suffer in performance under this stricter setting. We find that the performance loss originates from the poor quality of the pretrained monolingual embeddings, and we propose using linguistic information in the embedding training scheme. To support this, we look at two linguistic features that may help improve alignment quality: dependency information and sub-word information. Using dependency-based embeddings results in a complementary word representation which offers a boost in performance of around 1.5 BLEU points compared to standard WORD2VEC when monolingual data is limited to 1 million sentences per language. We also find that the inclusion of sub-word information is crucial to improving the quality of the embeddings.
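Dependency-based embeddings of the kind the abstract contrasts with WORD2VEC replace linear context windows with syntactic contexts. The toy sketch below extracts such (word, context) pairs from a hand-written parse; it is an assumed illustration of the general technique (in the style of Levy and Goldberg's dependency-based word embeddings), not the paper's exact implementation, and the parse is not produced by a real parser.

```python
def dependency_contexts(tokens, heads, labels):
    """Yield (word, context) training pairs from a dependency parse:
    each word is paired with its head, annotated with the arc label,
    and the head is paired with the dependent via an inverse label."""
    pairs = []
    for tok, head, label in zip(tokens, heads, labels):
        if head >= 0:  # skip the root, which has no head
            pairs.append((tok, f"{tokens[head]}/{label}"))
            pairs.append((tokens[head], f"{tok}/{label}-inv"))
    return pairs

# Hand-written parse of "scientist discovers star":
# both nouns attach to the verb at index 1 (root marked -1).
ctx = dependency_contexts(
    ["scientist", "discovers", "star"],
    [1, -1, 1],
    ["nsubj", "root", "dobj"],
)
```

These pairs would then feed a skip-gram-style objective in place of window-based contexts, which is what makes the resulting representation complementary to standard WORD2VEC.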