Generic and Specialized Word Embeddings for Multi-Domain Machine Translation

Crego, Josep-Maria; Pham, Minh Quang; Senellart, Jean; Yvon, François

Authors: Josep-Maria Crego
Minh Quang Pham
Jean Senellart
François Yvon
Publication date: 2 November 2019
Publisher: HAL CCSD

Abstract

Supervised machine translation works well when the train and test data are sampled from the same distribution. When this is not the case, adaptation techniques help ensure that the knowledge learned from out-of-domain texts generalises to in-domain sentences. We study here a related setting, multi-domain adaptation, where the number of domains is potentially large and adapting separately to each domain would waste training resources. Our proposal transposes to neural machine translation the feature expansion technique of (Daumé III, 2007): it isolates domain-agnostic from domain-specific lexical representations, while sharing most of the network across domains. Our experiments use two architectures and two language pairs: they show that our approach, while simple and computationally inexpensive, outperforms several strong baselines and delivers a multi-domain system that successfully translates texts from diverse sources.
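
The idea of separating domain-agnostic from domain-specific lexical representations can be illustrated with a small sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration: the domain labels, the region sizes, and the helper names `init_table` and `embed` are hypothetical, and the zero-masking of inactive domain regions is one plausible reading of feature expansion applied to word embeddings, not necessarily the authors' exact method.

```python
import random

DOMAINS = ["medical", "legal", "news"]  # hypothetical domain labels
GENERIC_DIM = 4   # size of the shared, domain-agnostic region (assumed)
DOMAIN_DIM = 2    # size of each domain-specific region (assumed)


def init_table(vocab, seed=0):
    """Give every word one generic vector plus one small vector per domain."""
    rng = random.Random(seed)
    table = {}
    for word in vocab:
        table[word] = {
            "generic": [rng.uniform(-1, 1) for _ in range(GENERIC_DIM)],
            "specific": {
                d: [rng.uniform(-1, 1) for _ in range(DOMAIN_DIM)]
                for d in DOMAINS
            },
        }
    return table


def embed(table, word, domain):
    """Concatenate the generic region with all domain regions,
    zeroing every region except the one for the active domain."""
    if domain not in DOMAINS:
        raise ValueError(f"unknown domain: {domain!r}")
    entry = table[word]
    vector = list(entry["generic"])  # shared across all domains
    for d in DOMAINS:
        if d == domain:
            vector.extend(entry["specific"][d])
        else:
            vector.extend([0.0] * DOMAIN_DIM)  # inactive domain region
    return vector


table = init_table(["cell", "court"])
med = embed(table, "cell", "medical")
law = embed(table, "cell", "legal")
```

In this sketch the first `GENERIC_DIM` components of `med` and `law` are identical, since they come from the shared region, while the remaining components differ according to the active domain. The rest of a translation network would consume these vectors unchanged, which is what allows most parameters to be shared across domains.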

Available Versions

Archive Ouverte en Sciences de l'Information et de la Communication

oai:HAL:hal-02343215v1

Last time updated on 09/11/2019