SeamlessM4T-Massively Multilingual & Multimodal Machine Translation

Akula, Bapi; Andrews, Pierre; Balioglu, Can; Barrault, Loïc; Celebi, Onur; Chen, Peng-Jen; Chung, Yu-An; Seamless Communication; Costa-jussà, Marta R.; Dale, David; Dong, Ning; Duquenne, Paul-Ambroise; Elbayad, Maha; Ellis, Brian; Elsahar, Hady; Gao, Cynthia; Gong, Hongyu; Gonzalez, Gabriel Mejia; Guzmán, Francisco; Haaheim, Justin; Hachem, Naji El; Hansanti, Prangthip; Heffernan, Kevin; Hoffman, John; Howes, Russ; Huang, Bernie; Hwang, Min-Jae; Inaguma, Hirofumi; Jain, Somya; Kalbassi, Elahe; Kallet, Amanda; Kao, Justine; Klaiber, Christopher; Kulikov, Ilia; Lam, Janice; Lee, Ann; Li, Daniel; Li, Pengwei; Licht, Daniel; Ma, Xutai; Maillard, Jean; Mavlyutov, Ruslan; Meglioli, Mariano Cora; Mourachko, Alexandre; Peloquin, Benjamin; Pino, Juan; Popuri, Sravya; Rakotoarison, Alice; Ramadan, Mohamed; Ramakrishnan, Abinesh; Ropers, Christophe; Sadagopan, Kaushik Ram; Saleem, Safiyyah; Schwenk, Holger; Sun, Anna; Tomasello, Paden; Tran, Kevin; Tran, Tuan; Tufanov, Igor; Vogeti, Vish; Wang, Changhan; Wang, Jeff; Wang, Skyler; Wenzek, Guillaume; Wood, Carleigh; Yang, Yilin; Ye, Ethan; Yu, Bokai

Publication date: 23 August 2023

Abstract

What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages? While recent breakthroughs in text-based models have pushed machine translation coverage beyond 200 languages, unified speech-to-speech translation models have yet to achieve similar strides. More specifically, conventional speech-to-speech translation relies on cascaded systems that perform translation progressively, putting high-performing unified systems out of reach. To address these gaps, we introduce SeamlessM4T, a single model that supports speech-to-speech translation, speech-to-text translation, text-to-speech translation, text-to-text translation, and automatic speech recognition for up to 100 languages. To build this, we used 1 million hours of open speech audio data to learn self-supervised speech representations with w2v-BERT 2.0. Subsequently, we created a multimodal corpus of automatically aligned speech translations. Using this corpus, filtered and combined with human-labeled and pseudo-labeled data, we developed the first multilingual system capable of translating from and into English for both speech and text. On FLEURS, SeamlessM4T sets a new standard for translations into multiple target languages, achieving an improvement of 20% BLEU over the previous SOTA in direct speech-to-text translation. Compared to strong cascaded models, SeamlessM4T improves the quality of into-English translation by 1.3 BLEU points in speech-to-text and by 2.6 ASR-BLEU points in speech-to-speech. Tested for robustness, our system performs better against background noises and speaker variations in speech-to-text tasks compared to the current SOTA model. Critically, we evaluated SeamlessM4T on gender bias and added toxicity to assess translation safety. Finally, all contributions in this work are open-sourced and accessible at https://github.com/facebookresearch/seamless_communicatio

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2308.11596

Last time updated on 24/08/2023