7 research outputs found

MnTTS: An Open-Source Mongolian Text-to-Speech Synthesis Dataset and Accompanied Baseline

Author: Bao Feilong; Gao Guanglai; Hu Yifan; Liu Rui; Yin Pengkai
Publication date: 22/09/2022

This paper introduces a high-quality open-source text-to-speech (TTS) synthesis dataset for Mongolian, a low-resource language spoken by over 10 million people worldwide. The dataset, named MnTTS, consists of about 8 hours of transcribed audio recordings spoken by a 22-year-old professional female Mongolian announcer. It is the first publicly available dataset developed to promote Mongolian TTS applications in both academia and industry. In this paper, we share our experience by describing the dataset development procedures and the challenges we faced. To demonstrate the reliability of our dataset, we built a powerful non-autoregressive baseline system based on the FastSpeech2 model and the HiFi-GAN vocoder, and evaluated it using the subjective mean opinion score (MOS) and real time factor (RTF) metrics. Evaluation results show that the baseline system trained on our dataset achieves a MOS above 4 and an RTF of about $3.30\times10^{-1}$, which makes it applicable for practical use. The dataset, training recipe, and pretrained TTS models are freely available at https://github.com/walker-hyf/MnTTS.

Comment: Accepted at the 2022 International Conference on Asian Language Processing (IALP2022)

arXiv.org e-Print Archive

SeamlessM4T-Massively Multilingual & Multimodal Machine Translation

Author: Akula Bapi; Andrews Pierre; Balioglu Can; Barrault Loïc; Celebi Onur; Chen Peng-Jen; Chung Yu-An; Communication Seamless; Costa-jussà Marta R.; Dale David; Dong Ning; Duquenne Paul-Ambroise; Elbayad Maha; Ellis Brian; Elsahar Hady; Gao Cynthia; Gong Hongyu; Gonzalez Gabriel Mejia; Guzmán Francisco; Haaheim Justin; Hachem Naji El; Hansanti Prangthip; Heffernan Kevin; Hoffman John; Howes Russ; Huang Bernie; Hwang Min-Jae; Inaguma Hirofumi; Jain Somya; Kalbassi Elahe; Kallet Amanda; Kao Justine; Klaiber Christopher; Kulikov Ilia; Lam Janice; Lee Ann; Li Daniel; Li Pengwei; Licht Daniel; Ma Xutai; Maillard Jean; Mavlyutov Ruslan; Meglioli Mariano Cora; Mourachko Alexandre; Peloquin Benjamin; Pino Juan; Popuri Sravya; Rakotoarison Alice; Ramadan Mohamed; Ramakrishnan Abinesh; Ropers Christophe; Sadagopan Kaushik Ram; Saleem Safiyyah; Schwenk Holger; Sun Anna; Tomasello Paden; Tran Kevin; Tran Tuan; Tufanov Igor; Vogeti Vish; Wang Changhan; Wang Jeff; Wang Skyler; Wenzek Guillaume; Wood Carleigh; Yang Yilin; Ye Ethan; Yu Bokai
Publication date: 23/08/2023

What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages? While recent breakthroughs in text-based models have pushed machine translation coverage beyond 200 languages, unified speech-to-speech translation models have yet to achieve similar strides. More specifically, conventional speech-to-speech translation systems rely on cascaded systems that perform translation progressively, putting high-performing unified systems out of reach. To address these gaps, we introduce SeamlessM4T, a single model that supports speech-to-speech translation, speech-to-text translation, text-to-speech translation, text-to-text translation, and automatic speech recognition for up to 100 languages. To build this, we used 1 million hours of open speech audio data to learn self-supervised speech representations with w2v-BERT 2.0. Subsequently, we created a multimodal corpus of automatically aligned speech translations. Filtered and combined with human-labeled and pseudo-labeled data, we developed the first multilingual system capable of translating from and into English for both speech and text. On FLEURS, SeamlessM4T sets a new standard for translations into multiple target languages, achieving an improvement of 20% BLEU over the previous SOTA in direct speech-to-text translation. Compared to strong cascaded models, SeamlessM4T improves the quality of into-English translation by 1.3 BLEU points in speech-to-text and by 2.6 ASR-BLEU points in speech-to-speech. Tested for robustness, our system performs better against background noises and speaker variations in speech-to-text tasks compared to the current SOTA model. Critically, we evaluated SeamlessM4T on gender bias and added toxicity to assess translation safety. Finally, all contributions in this work are open-sourced and accessible at https://github.com/facebookresearch/seamless_communicatio

arXiv.org e-Print Archive

Chinese elements: a bridge of the integration between Chinese-English translation and linguaculture transnational mobility

Author: Yang Jinbao
Publication venue: The American Scholars Press, Inc.
Publication date: 01/01/2016

Given the popularity of Chinese elements in the innovation of the translation part of the Chinese CET, we realized that Chinese elements have become a bridge between linguaculture transnational mobility and Chinese-English translation. Chinese students' translation skills should therefore be critically improved, for example in their understanding of Chinese culture, and especially of its meaning. Five important secrets of skillful translation are introduced to improve students' translation skills.

Ghent University Academic Bibliography