Lego-MT: Towards Detachable Models in Massively Multilingual Machine Translation

Kong, Lingpeng; Li, Lei; Lu, Yinquan; Qiao, Yu; Xu, Jingjing; Yuan, Fei; Zhu, WenHao


Authors: Lingpeng Kong, Lei Li, Yinquan Lu, Yu Qiao, Jingjing Xu, Fei Yuan, WenHao Zhu
Publication date: 28 May 2023

Abstract

Multilingual neural machine translation (MNMT) aims to build a unified model for many language directions. Existing monolithic models for MNMT encounter two challenges: parameter interference among languages and inefficient inference for large models. In this paper, we revisit the classic multi-way structures and develop a detachable model by assigning each language (or group of languages) to an individual branch that supports plug-and-play training and inference. To address the need to learn representations for all languages in a unified space, we propose a novel, efficient training recipe, upon which we build an effective detachable model, Lego-MT. For a fair comparison, we collect data from OPUS and build a translation benchmark covering 433 languages and 1.3B parallel data. Experiments show that Lego-MT with 1.2B parameters brings an average gain of 3.2 spBLEU. It even outperforms M2M-100 with 12B parameters. The proposed training recipe brings a 28.2× speedup over the conventional multi-way training method. Code: https://github.com/CONE-MT/Lego-MT

Comment: ACL 2023 Findings
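
The "detachable" idea in the abstract can be illustrated with a toy sketch. This is not the Lego-MT implementation: the names Branch, DetachableModel, and make_toy_model are hypothetical, the branches are placeholder functions rather than neural networks, and the parameter counts are made up. It only shows the plug-and-play principle, namely that translating one direction needs just the source-language branch and the target-language branch, not the whole model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Branch:
    """Hypothetical stand-in for one language-specific module."""
    name: str
    num_params: int  # made-up size, for illustration only
    fn: Callable[[List[str]], List[str]]


@dataclass
class DetachableModel:
    """Toy multi-way model: one encoder and one decoder branch per language."""
    encoders: Dict[str, Branch] = field(default_factory=dict)
    decoders: Dict[str, Branch] = field(default_factory=dict)

    def attach_encoder(self, lang: str, branch: Branch) -> None:
        self.encoders[lang] = branch

    def attach_decoder(self, lang: str, branch: Branch) -> None:
        self.decoders[lang] = branch

    def route(self, src: str, tgt: str) -> List[Branch]:
        # Only these two branches are needed for the src -> tgt direction.
        return [self.encoders[src], self.decoders[tgt]]

    def translate(self, tokens: List[str], src: str, tgt: str) -> List[str]:
        enc, dec = self.route(src, tgt)
        return dec.fn(enc.fn(tokens))

    def loaded_params(self, src: str, tgt: str) -> int:
        return sum(b.num_params for b in self.route(src, tgt))

    def total_params(self) -> int:
        branches = list(self.encoders.values()) + list(self.decoders.values())
        return sum(b.num_params for b in branches)


def make_toy_model(langs: List[str]) -> DetachableModel:
    model = DetachableModel()
    for lang in langs:
        # Placeholder "encoder": lowercasing stands in for a shared space.
        model.attach_encoder(
            lang, Branch(f"enc-{lang}", 100, lambda toks: [t.lower() for t in toks])
        )
        # Placeholder "decoder": tags each token with the target language.
        model.attach_decoder(
            lang,
            Branch(f"dec-{lang}", 100, lambda toks, lang=lang: [f"{t}@{lang}" for t in toks]),
        )
    return model


model = make_toy_model(["en", "fr", "de"])
out = model.translate(["Hello", "World"], "en", "fr")
print(out)
print(model.loaded_params("en", "fr"), "of", model.total_params(), "parameters used")
```

In this toy setup, adding a language means attaching two new branches without touching the existing ones, and inference cost depends on the branches on the route rather than on the total number of supported languages.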

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2212.10551

Last time updated on 12/01/2023