CinC-GAN for Effective F0 prediction for Whisper-to-Normal Speech Conversion
Generative Adversarial Network (GAN)-based methods have recently shown
remarkable performance in voice conversion and WHiSPer-to-normal SPeeCH
(WHSP2SPCH) conversion. One of the key challenges in WHSP2SPCH conversion is
the prediction of the fundamental frequency (F0). The state-of-the-art
Cycle-Consistent Generative Adversarial Network (CycleGAN) method was recently
proposed for WHSP2SPCH conversion. The CycleGAN-based method uses two
different models, one for Mel Cepstral Coefficients (MCC) mapping and another
for F0 prediction, where F0 prediction depends heavily on the pre-trained MCC
mapping model. This leads to additional non-linear noise in the predicted F0. To suppress
this noise, we propose Cycle-in-Cycle GAN (i.e., CinC-GAN). It is specifically
designed to make F0 prediction more effective without losing the
accuracy of MCC mapping. We evaluated the proposed method in a non-parallel
setting and analyzed it on speaker-specific and gender-specific tasks. The
objective and subjective tests show that CinC-GAN significantly outperforms
CycleGAN. In addition, we analyze CycleGAN and CinC-GAN for unseen speakers,
and the results show the clear superiority of CinC-GAN.
Comment: Accepted in 28th European Signal Processing Conference (EUSIPCO),
202
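
The abstract does not spell out the loss functions, so the following is only a minimal sketch of the generic cycle-consistency idea that CycleGAN-style methods rely on in a non-parallel setting: mapping a feature vector to the other domain and back should reconstruct the input. It is not the authors' implementation and does not model the CinC-GAN architecture. The two "generators" are hypothetical toy functions standing in for neural networks, and the feature values are made up for illustration.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equal-length feature vectors."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_consistency_loss(x, y, g_xy, g_yx):
    """Generic L1 cycle-consistency loss for mappings between domains X and Y."""
    # Forward cycle: X -> Y -> X should reconstruct x.
    forward = l1_distance(g_yx(g_xy(x)), x)
    # Backward cycle: Y -> X -> Y should reconstruct y.
    backward = l1_distance(g_xy(g_yx(y)), y)
    return forward + backward


# Hypothetical toy generators (assumption: real generators are trained networks).
def whisper_to_speech(features):
    return [2.0 * v + 1.0 for v in features]


def speech_to_whisper(features):
    return [(v - 1.0) / 2.0 for v in features]


# Made-up, unpaired feature frames from each domain (non-parallel data).
whisper_frame = [0.5, -1.0, 0.25]
speech_frame = [2.0, -1.0, 1.5]

loss = cycle_consistency_loss(
    whisper_frame, speech_frame, whisper_to_speech, speech_to_whisper
)
print(loss)
```

Because the two toy mappings are exact inverses of each other, the loss here is zero; a pair of mappings that does not round-trip gives a positive loss, which is the signal a cycle-consistent model is trained to reduce.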