Bootstrapping Non-Parallel Voice Conversion From Speaker-Adaptive Text-to-Speech
Voice conversion (VC) and text-to-speech (TTS) are two tasks that share a
similar objective: generating speech with a target voice. However, they are
usually developed independently under vastly different frameworks. In this
paper, we propose a methodology to bootstrap a VC system from a pretrained
speaker-adaptive TTS model and unify the techniques as well as the
interpretations of these two tasks. Moreover, by offloading the heavy data
demand to the training stage of the TTS model, our VC system can be built using
a small amount of target speaker speech data. It also opens up the possibility
of using speech in a foreign unseen language to build the system. Our
subjective evaluations show that the proposed framework is able to not only
achieve competitive performance in the standard intra-language scenario but
also adapt and convert using speech utterances in an unseen language.
Comment: Accepted for IEEE ASRU 201