Non-Parallel Voice Conversion System Using An Auto-Regressive Model

Abstract

Many existing voice conversion (VC) systems are attractive owing to their high performance in terms of voice quality and speaker similarity. Nevertheless, without parallel training data, some generated waveform trajectories are not yet smooth, leading to degraded sound quality and mispronunciation issues in the converted speech. To address these shortcomings, this paper proposes a non-parallel VC system based on an auto-regressive model, Phonetic PosteriorGrams (PPGs), and an LPCNet vocoder to generate high-quality converted speech. The proposed auto-regressive structure enables our system to produce each step's outputs from the previous step's acoustic features. Further, because PPGs are speaker-independent, their use allows any unknown source speaker to be converted into a specific target speaker. We evaluate the effectiveness of our system by performing any-to-one conversions between native English speakers. Objective and subjective measures show that our method outperforms the best non-parallel VC method of the Voice Conversion Challenge 2018 in terms of naturalness and speaker similarity.
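
The auto-regressive idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's model: the single tanh layer, the function name `autoregressive_decode`, and the weight names `w_ppg`, `w_prev`, and `bias` are all assumptions made for illustration. It only shows the dependency pattern, in which the acoustic frame at each step is computed from the current PPG frame and the acoustic frame produced at the previous step.

```python
import numpy as np


def autoregressive_decode(ppgs, w_ppg, w_prev, bias):
    """Predict acoustic frames one step at a time (illustrative only).

    ppgs:   (T, D_ppg) phonetic posteriors, one row per frame
    w_ppg:  (D_ppg, D_ac) hypothetical weights for the current PPG frame
    w_prev: (D_ac, D_ac) hypothetical weights for the previous output frame
    bias:   (D_ac,) hypothetical bias term
    """
    num_frames = ppgs.shape[0]
    ac_dim = bias.shape[0]
    outputs = np.zeros((num_frames, ac_dim))
    prev = np.zeros(ac_dim)  # all-zero initial frame before the first step
    for t in range(num_frames):
        # The frame at step t depends on the PPG at step t and on the
        # acoustic frame generated at step t - 1.
        prev = np.tanh(ppgs[t] @ w_ppg + prev @ w_prev + bias)
        outputs[t] = prev
    return outputs


# Usage with random stand-in data (not real PPGs or trained weights).
rng = np.random.default_rng(0)
ppgs = rng.random((5, 4))
w_ppg = rng.standard_normal((4, 3))
w_prev = rng.standard_normal((3, 3))
bias = np.zeros(3)
frames = autoregressive_decode(ppgs, w_ppg, w_prev, bias)
```

In the system the abstract describes, the predicted acoustic features would then be passed to the LPCNet vocoder to synthesize the waveform; that stage is omitted here.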
