Singing Voice Synthesis with Vibrato Modeling and Latent Energy Representation

Liu, Zhi; Song, Wei; Song, Yingjie; Yu, Yang; Zeng, Dan; Zhang, Wei; Zhang, Zhengchen

Publication date: 2 November 2022

Abstract

This paper proposes an expressive singing voice synthesis system that introduces explicit vibrato modeling and a latent energy representation. Vibrato is an inherent characteristic of human singing, so it is essential to the naturalness of synthesized singing. We therefore introduce a deep learning-based vibrato model that controls the vibrato's likeliness, rate, depth and phase, where vibrato likeliness is the probability that vibrato is present; modeling it helps improve the naturalness of the synthesized voice. Because existing singing corpora contain no annotated vibrato-likeliness labels, we adopt a novel labeling method that annotates vibrato likeliness automatically. In addition, the power spectrogram of the audio carries rich information that can improve the expressiveness of singing, so we propose an autoencoder-based latent energy bottleneck feature for expressive singing voice synthesis. Experimental results on the open dataset NUS48E show that both the vibrato modeling and the latent energy representation significantly improve the expressiveness of the synthesized singing voice. Audio samples are available on the demo website.
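
The abstract names four vibrato controls (likeliness, rate, depth and phase) but does not give the paper's formulation. As a minimal sketch, assuming the common textbook model in which vibrato is a sinusoidal pitch modulation measured in cents, those controls could be applied to the F0 contour of a single note as shown below. The function name `apply_vibrato`, the 0.5 likeliness threshold, the frame rate and the use of cents for depth are illustrative assumptions, not the authors' method.

```python
import math
from typing import List, Sequence


def apply_vibrato(
    f0_hz: Sequence[float],
    frame_rate: float,
    likeliness: Sequence[float],
    rate_hz: float,
    depth_cents: float,
    phase_rad: float,
    threshold: float = 0.5,
) -> List[float]:
    """Superimpose sinusoidal vibrato on the F0 contour of one note.

    Hypothetical parameterization (not the paper's model):
      - likeliness: per-frame probability that vibrato is present
      - rate_hz: vibrato rate in Hz
      - depth_cents: peak pitch deviation in cents
      - phase_rad: starting phase of the sinusoid in radians
    Frames whose likeliness is at or below `threshold` keep their original F0.
    """
    if len(f0_hz) != len(likeliness):
        raise ValueError("f0_hz and likeliness must have the same length")
    out: List[float] = []
    for n, (f0, p) in enumerate(zip(f0_hz, likeliness)):
        if p > threshold:
            t = n / frame_rate  # time since the note onset, in seconds
            cents = depth_cents * math.sin(2.0 * math.pi * rate_hz * t + phase_rad)
            # 1200 cents per octave, so the frequency ratio is 2 ** (cents / 1200)
            out.append(f0 * 2.0 ** (cents / 1200.0))
        else:
            out.append(f0)
    return out


# Example: a 1-second A4 note at 200 frames/s (5 ms hop) with a
# 5.5 Hz, 50-cent vibrato predicted throughout the note.
contour = apply_vibrato([440.0] * 200, 200.0, [0.9] * 200,
                        rate_hz=5.5, depth_cents=50.0, phase_rad=0.0)
```

Gating on a hard threshold is only one possible choice; scaling the depth by the likeliness instead would let vibrato fade in and out smoothly.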

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2211.00996

Last time updated on 08/12/2022