14 research outputs found

GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators

Author: Chen Chen
Chen Zhehuai
Chng Eng Siong
Hu Yuchen
Li Ruizhe
Yang Chao-Han Huck
Zhang Dong
Publication venue: ArXiv
Publication date: 10/02/2024

17 pages. This work is open sourced at: https://github.com/YUCHEN005/GenTranslat

Aberdeen University Research

Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech

Author: Bapna Ankur
Chen Zhehuai
Morioka Nobuyuki
Ramabhadran Bhuvana
Rosenberg Andrew
Saeki Takaaki
Wang Gary
Zen Heiga
Zhang Yu
Publication date: 27/10/2022

This paper proposes Virtuoso, a massively multilingual speech-text joint semi-supervised learning framework for text-to-speech synthesis (TTS) models. Existing multilingual TTS typically supports tens of languages, which is a small fraction of the thousands of languages in the world. One difficulty in scaling multilingual TTS to hundreds of languages is collecting high-quality speech-text paired data in low-resource languages. This study extends Maestro, a speech-text joint pretraining framework for automatic speech recognition (ASR), to speech generation tasks. To train a TTS model from various types of speech and text data, different training schemes are designed to handle supervised (paired TTS and ASR data) and unsupervised (untranscribed speech and unspoken text) datasets. Experimental evaluation shows that 1) multilingual TTS models trained on Virtuoso can achieve significantly better naturalness and intelligibility than baseline ones in seen languages, and 2) they can synthesize reasonably intelligible and natural-sounding speech for unseen languages where no high-quality paired TTS data is available.

Comment: Submitted to ICASSP 202

arXiv.org e-Print Archive