Adversarially Trained Autoencoders for Parallel-Data-Free Voice Conversion
We present a method for converting voices among a set of speakers. Our
method trains multiple autoencoder paths, with a single speaker-independent
encoder and multiple speaker-dependent decoders. The autoencoders are trained
with an additional adversarial loss, provided by an auxiliary classifier, that
guides the output of the encoder to be speaker independent. Training is
unsupervised in the sense that it requires neither the same utterances from
all speakers nor time alignment over phonemes. Because it uses a single
encoder, our method generalizes to converting the voices of out-of-training
speakers to speakers in the training dataset. We present subjective tests
corroborating the performance of our method.
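The architecture described above (one shared encoder, per-speaker decoders, and an auxiliary speaker classifier whose adversarial loss pushes the latent code toward speaker independence) can be sketched as follows. This is an illustrative toy with linear maps and hypothetical dimensions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM_IN, DIM_Z, N_SPEAKERS = 16, 8, 3  # hypothetical feature/latent sizes

# Single speaker-independent encoder (a plain linear map, for illustration)
W_enc = rng.standard_normal((DIM_Z, DIM_IN)) * 0.1
# One speaker-dependent decoder per training speaker
W_dec = [rng.standard_normal((DIM_IN, DIM_Z)) * 0.1 for _ in range(N_SPEAKERS)]
# Auxiliary speaker classifier operating on the latent code
W_cls = rng.standard_normal((N_SPEAKERS, DIM_Z)) * 0.1

def encode(x):
    return W_enc @ x

def decode(z, speaker):
    return W_dec[speaker] @ z

def classify(z):
    logits = W_cls @ z
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()

def losses(x, speaker):
    z = encode(x)
    recon = np.mean((x - decode(z, speaker)) ** 2)  # reconstruction loss
    p = classify(z)
    # Adversarial term for the encoder: drive the classifier's posterior
    # toward uniform, i.e. minimize KL(p || uniform), so the latent code
    # carries no speaker identity.
    adv = np.sum(p * np.log(p * N_SPEAKERS + 1e-9))
    return recon, adv

x = rng.standard_normal(DIM_IN)
recon, adv = losses(x, speaker=1)

# Because the encoder is shared, a frame from an unseen speaker can be
# encoded and decoded into any training speaker's voice:
x_new = rng.standard_normal(DIM_IN)
converted = decode(encode(x_new), speaker=0)
```

In a real system the linear maps would be neural networks, the classifier would be trained to *predict* the speaker while the encoder is trained against it, and the two losses would be balanced by a weighting coefficient.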
Nonparallel Emotional Speech Conversion
We propose a nonparallel data-driven emotional speech conversion method. It
enables the transfer of emotion-related characteristics of a speech signal
while preserving the speaker's identity and linguistic content. Most existing
approaches require parallel data and time alignment, which are not available
in most real applications. We achieve nonparallel training based on an
unsupervised style transfer technique, which learns a translation model between
two distributions instead of a deterministic one-to-one mapping between paired
examples. The conversion model consists of an encoder and a decoder for each
emotion domain. We assume that the speech signal can be decomposed into an
emotion-invariant content code and an emotion-related style code in latent
space. Emotion conversion is performed by extracting and recombining the
content code of the source speech and the style code of the target emotion. We
tested our method on nonparallel corpora with four emotions. Both subjective
and objective evaluations show the effectiveness of our approach.
Comment: Published in INTERSPEECH 2019, 5 pages, 6 figures. Simulation
available at http://www.jian-gao.org/emoga
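The decomposition this abstract describes (an emotion-invariant content code plus an emotion-related style code, with conversion done by recombining the source content and the target-emotion style) can be sketched as below. All dimensions and the linear encoders/decoders are hypothetical stand-ins for the paper's networks:

```python
import numpy as np

rng = np.random.default_rng(1)
DIM_X, DIM_C, DIM_S = 12, 6, 4  # hypothetical feature/content/style sizes

def make_domain():
    """One encoder/decoder set per emotion domain (linear, for illustration)."""
    return {
        "enc_c": rng.standard_normal((DIM_C, DIM_X)) * 0.1,  # content encoder
        "enc_s": rng.standard_normal((DIM_S, DIM_X)) * 0.1,  # style encoder
        "dec":   rng.standard_normal((DIM_X, DIM_C + DIM_S)) * 0.1,
    }

domains = {"neutral": make_domain(), "happy": make_domain()}

def convert(x_src, src, tgt, x_tgt_example):
    """Extract the source's emotion-invariant content code, the target
    emotion's style code, and decode their recombination."""
    c = domains[src]["enc_c"] @ x_src           # content of the source speech
    s = domains[tgt]["enc_s"] @ x_tgt_example   # style of the target emotion
    return domains[tgt]["dec"] @ np.concatenate([c, s])

x_neutral = rng.standard_normal(DIM_X)  # a neutral-speech feature frame
x_happy = rng.standard_normal(DIM_X)    # an example frame of the target emotion
y = convert(x_neutral, "neutral", "happy", x_happy)
```

Because the translation is learned between the two *distributions* rather than between paired utterances, no parallel data or time alignment is needed: any example of the target emotion supplies the style code.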