17 research outputs found
LMs with a Voice: Spoken Language Modeling beyond Speech Tokens
We present SPECTRON, a novel approach to adapting pre-trained language models
(LMs) to perform speech continuation. By leveraging pre-trained speech
encoders, our model generates both text and speech outputs with the entire
system being trained end-to-end operating directly on spectrograms. Training
the entire model in the spectrogram domain simplifies our speech continuation
system versus existing cascade methods which use discrete speech
representations. We further show our method surpasses existing spoken language
models both in semantic content and speaker preservation while also benefiting
from the knowledge transferred from pre-existing models. Audio samples can be
found in our website https://michelleramanovich.github.io/spectron/spectro