
LMs with a Voice: Spoken Language Modeling beyond Speech Tokens

Authors: Asawaroengchai, Chulayutsh; Levkovitch, Alon; Mariooryad, Soroosh; Nachmani, Eliya; Ramanovich, Michelle Tadmor; Salazar, Julian; Skerry-Ryan, RJ
Publication date: 24/05/2023

We present SPECTRON, a novel approach to adapting pre-trained language models (LMs) to perform speech continuation. By leveraging pre-trained speech encoders, our model generates both text and speech outputs, and the entire system is trained end-to-end, operating directly on spectrograms. Training the entire model in the spectrogram domain simplifies our speech continuation system compared with existing cascade methods that use discrete speech representations. We further show that our method surpasses existing spoken language models in both semantic content and speaker preservation, while also benefiting from the knowledge transferred from pre-existing models. Audio samples can be found on our website: https://michelleramanovich.github.io/spectron/spectro

arXiv.org e-Print Archive