A Constrained Matching Pursuit Approach to Audio Declipping
© 2011 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
A Fully Time-domain Neural Model for Subband-based Speech Synthesizer
This paper introduces a deep neural network model for a subband-based speech
synthesizer. The model exploits the narrow bandwidth of the subband signals
to reduce the complexity of the time-domain speech generator. We employ
multi-level wavelet analysis/synthesis to decompose the signal into subbands
and reconstruct it in the time domain. Inspired by WaveNet, a convolutional
neural network (CNN) model predicts the subband speech signals entirely in the
time domain. Because of the narrow bandwidth of the subbands, a simple network
architecture is enough to learn their simple patterns accurately. In the
ground-truth experiments with teacher forcing, the subband synthesizer
significantly outperforms the fullband model on both subjective and objective
measures. In addition, by conditioning the model on the phoneme sequence using
a pronunciation dictionary, we obtain a fully time-domain neural model
for a subband-based text-to-speech (TTS) synthesizer, which is nearly end-to-end.
The speech generated by the subband TTS is comparable in quality to that of the
fullband one, with a lighter network architecture for each subband.
Comment: 5 pages, 3 figures
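
The multi-level wavelet analysis/synthesis step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which wavelet or how many levels the authors use, so the Haar wavelet, the three levels, and the function names below (`haar_analysis`, `haar_synthesis`, `decompose`, `reconstruct`) are all assumptions chosen for illustration, not the paper's implementation.

```python
import math

# Assumption: Haar filters, chosen only because they are the simplest
# perfect-reconstruction wavelet pair; the paper may use a different wavelet.
_S = 1.0 / math.sqrt(2.0)


def haar_analysis(signal):
    """Split an even-length signal into a low band (approx) and a high band (detail).

    Each output band has half as many samples as the input.
    """
    approx = [(signal[i] + signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    return approx, detail


def haar_synthesis(approx, detail):
    """Invert haar_analysis: interleave the two half-rate bands back into one signal."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * _S)
        out.append((a - d) * _S)
    return out


def decompose(signal, levels):
    """Multi-level analysis. Returns [approx_L, detail_L, ..., detail_1]."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_analysis(approx)
        details.append(detail)
    # Coarsest bands first, finest detail band last.
    return [approx] + details[::-1]


def reconstruct(subbands):
    """Multi-level synthesis: rebuild the time-domain signal from its subbands."""
    approx = subbands[0]
    for detail in subbands[1:]:
        approx = haar_synthesis(approx, detail)
    return approx


if __name__ == "__main__":
    x = [math.sin(0.3 * n) for n in range(16)]
    bands = decompose(x, levels=3)
    print([len(b) for b in bands])
    x_hat = reconstruct(bands)
    print(max(abs(a - b) for a, b in zip(x, x_hat)))
```

In this sketch every subband is a time-domain sequence at a reduced sample rate, which is one way to read the abstract's claim that a simpler generator per subband can suffice. In a synthesizer along the lines described, a separate model would predict each band and `reconstruct` would merge the predictions.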