End-to-end Recurrent Denoising Autoencoder Embeddings for Speaker Identification
Speech 'in-the-wild' is a handicap for speaker recognition systems due to the
variability induced by real-life conditions, such as environmental noise and
emotions in the speaker. Taking advantage of representation learning, in this
paper we aim to design a recurrent denoising autoencoder that extracts robust
speaker embeddings from noisy spectrograms to perform speaker identification.
The proposed end-to-end architecture uses a feedback loop to encode information
regarding the speaker into low-dimensional representations extracted by a
spectrogram denoising autoencoder. We employ data augmentation techniques by
additively corrupting clean speech with real-life environmental noise and make
use of a database with real stressed speech. We show that jointly
optimizing the denoiser and the speaker identification module
outperforms both independent optimization of the two modules and hand-crafted
features under stress and noise distortions.
Comment: 8 pages + 2 of references + 5 of images. Submitted on Monday 20th of
July to Elsevier Signal Processing Short Communication
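
The abstract mentions augmenting the data by additively corrupting clean speech with environmental noise, but it does not state the mixing procedure or the noise levels used. As an illustration only, the following minimal sketch shows one common way to do this: scale a noise recording so that the mixture reaches a chosen signal-to-noise ratio (SNR). The function name `mix_at_snr`, the 5 dB target, and the synthetic signals are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Return clean + scaled noise so the mixture has the requested SNR (dB).

    Hypothetical helper for illustration; not the authors' implementation.
    """
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Repeat the noise if it is shorter than the speech, then crop to length.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]

    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0.0:
        raise ValueError("noise signal is silent; cannot scale it to a target SNR")

    # SNR_dB = 10 * log10(clean_power / (scale**2 * noise_power)), solved for scale.
    scale = np.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise


# Synthetic stand-ins for a clean utterance and a noise recording (assumed data).
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2.0 * np.pi * 220.0 * t)
noise = rng.standard_normal(4000)

noisy = mix_at_snr(clean, noise, snr_db=5.0)
measured_snr = 10.0 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
print(f"measured SNR: {measured_snr:.2f} dB")
```

In a pipeline like the one the abstract describes, the noisy waveform would then be converted to a spectrogram and paired with its clean counterpart as the denoising target. That step is omitted here because the abstract gives no details about it.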