A Deep Generative Model of Speech Complex Spectrograms
This paper proposes an approach to the joint modeling of the short-time
Fourier transform magnitude and phase spectrograms with a deep generative
model. We assume that the magnitude follows a Gaussian distribution and the
phase follows a von Mises distribution. To improve the consistency of the phase
values in the time-frequency domain, we also apply the von Mises distribution
to the phase derivatives, i.e., the group delay and the instantaneous
frequency. Based on these assumptions, we explore and compare several
combinations of loss functions for training our models. Built upon the
variational autoencoder framework, our model consists of three convolutional
neural networks acting as an encoder, a magnitude decoder, and a phase decoder.
In addition to the latent variables, we propose to also condition the phase
estimation on the estimated magnitude. Evaluated on a time-domain speech
reconstruction task, our models generated speech with high perceptual
quality and high intelligibility.
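The abstract above does not give the loss formulas, so the following is only a minimal sketch of how von Mises losses on the phase and its derivatives might look. It assumes a fixed concentration parameter (so the negative log-likelihood reduces to a negative cosine up to constants) and approximates the instantaneous frequency and group delay by finite differences along time and frequency. The function names `von_mises_loss` and `phase_losses` are hypothetical and not taken from the paper.

```python
import numpy as np

def von_mises_loss(angle_true, angle_pred):
    """Negative von Mises log-likelihood up to additive and scale constants.

    Assumes a fixed concentration, so only the -cos(difference) term remains.
    The loss is invariant to 2*pi wraps of either argument.
    """
    return float(np.mean(-np.cos(angle_true - angle_pred)))

def phase_losses(phase_true, phase_pred):
    """Losses on the phase and its finite-difference derivatives.

    Both inputs are arrays of shape (freq_bins, frames), in radians.
    """
    # Instantaneous frequency: phase difference along the time axis.
    if_true = np.diff(phase_true, axis=1)
    if_pred = np.diff(phase_pred, axis=1)
    # Group delay: negative phase difference along the frequency axis.
    gd_true = -np.diff(phase_true, axis=0)
    gd_pred = -np.diff(phase_pred, axis=0)
    return {
        "phase": von_mises_loss(phase_true, phase_pred),
        "instantaneous_frequency": von_mises_loss(if_true, if_pred),
        "group_delay": von_mises_loss(gd_true, gd_pred),
    }
```

In this sketch a constant phase offset changes the plain phase loss but leaves both derivative losses at their minimum, which illustrates why the derivative terms constrain the consistency of the phase pattern and not its absolute value.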
A Fast Griffin-Lim Algorithm
In this paper, we present a new algorithm to estimate a signal from its short-time Fourier transform modulus (STFTM). The algorithm is computationally simple and is obtained by accelerating the well-known Griffin-Lim algorithm (GLA). Before deriving it, we give a new interpretation of the GLA and formulate the phase recovery problem as an optimization problem. We then present experimental results in which the new algorithm is tested on various signals. It not only converges significantly faster but also recovers the signals with a smaller error than the traditional GLA.
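The abstract does not state the update rule, so the sketch below is an assumption based on the commonly cited form of the accelerated iteration: alternate a magnitude projection and a consistency projection as in the GLA, then add a momentum term weighted by `alpha`. Setting `alpha=0` gives the plain GLA. The helper names, the hand-written STFT and inverse STFT, and the example value `alpha=0.99` are illustrative choices, not taken from the paper.

```python
import numpy as np

def stft(x, win, hop):
    """Windowed frames followed by a real FFT; shape (frames, bins)."""
    n = len(win)
    frames = [x[i:i + n] * win for i in range(0, len(x) - n + 1, hop)]
    return np.fft.rfft(np.array(frames), axis=1)

def istft(X, win, hop, length):
    """Least-squares inverse: windowed overlap-add, normalised by sum of win**2."""
    n = len(win)
    frames = np.fft.irfft(X, n=n, axis=1) * win
    x = np.zeros(length)
    norm = np.zeros(length)
    for k, frame in enumerate(frames):
        x[k * hop:k * hop + n] += frame
        norm[k * hop:k * hop + n] += win ** 2
    return x / np.maximum(norm, 1e-12)

def project_magnitude(c, mag):
    """Keep the phase of c, impose the target magnitude."""
    return mag * np.exp(1j * np.angle(c))

def project_consistent(c, win, hop, length):
    """Map c to the nearest spectrogram that is the STFT of some signal."""
    return stft(istft(c, win, hop, length), win, hop)

def griffin_lim(mag, win, hop, length, n_iter=100, alpha=0.0, seed=0):
    """Phase recovery from an STFT magnitude.

    alpha=0 is the plain GLA; alpha>0 adds the momentum (acceleration) term.
    """
    rng = np.random.default_rng(seed)
    init = mag * np.exp(2j * np.pi * rng.random(mag.shape))
    c_prev = project_consistent(project_magnitude(init, mag), win, hop, length)
    t = c_prev
    for _ in range(n_iter):
        c = project_consistent(project_magnitude(t, mag), win, hop, length)
        t = c + alpha * (c - c_prev)  # momentum step (assumed form)
        c_prev = c
    return istft(c_prev, win, hop, length)

def spectral_error(x, mag, win, hop):
    """Relative distance between the STFT magnitude of x and the target."""
    return np.linalg.norm(np.abs(stft(x, win, hop)) - mag) / np.linalg.norm(mag)
```

To compare the two variants, one could compute `mag = np.abs(stft(x, win, hop))` for a test signal `x`, run `griffin_lim` once with `alpha=0.0` and once with, say, `alpha=0.99`, and compare the resulting `spectral_error` values at equal iteration counts.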