2,362 research outputs found
Whisper-to-speech conversion using restricted Boltzmann machine arrays
Whispers are a natural vocal communication mechanism, in which vocal cords do not vibrate normally. Lack of glottal-induced pitch leads to low energy, and an inherent noise-like spectral distribution reduces intelligibility. Much research has been devoted to processing of whispers, including conversion of whispers to speech. Unfortunately, among several approaches, the best reconstructed speech to date still contains obviously artificial muffles and suffers from an unnatural prosody. To address these issues, the novel use of multiple restricted Boltzmann machines (RBMs) is reported as a statistical conversion model between whisper and speech spectral envelopes. Moreover, the accuracy of estimated pitch is improved using machine learning techniques for pitch estimation within only voiced (V) regions. Both objective and subjective evaluations show that this new method improves the quality of whisper-reconstructed speech compared with the state-of-the-art approaches
Collapsed speech segment detection and suppression for WaveNet vocoder
In this paper, we propose a technique to alleviate the quality degradation
caused by collapsed speech segments sometimes generated by the WaveNet vocoder.
The effectiveness of the WaveNet vocoder for generating natural speech from
acoustic features has been proved in recent works. However, it sometimes
generates very noisy speech with collapsed speech segments when only a limited
amount of training data is available or significant acoustic mismatches exist
between the training and testing data. Such a limitation on the corpus and
limited ability of the model can easily occur in some speech generation
applications, such as voice conversion and speech enhancement. To address this
problem, we propose a technique to automatically detect collapsed speech
segments. Moreover, to refine the detected segments, we also propose a waveform
generation technique for WaveNet using a linear predictive coding constraint.
Verification and subjective tests are conducted to investigate the
effectiveness of the proposed techniques. The verification results indicate
that the detection technique can detect most collapsed segments. The subjective
evaluations of voice conversion demonstrate that the generation technique
significantly improves the speech quality while maintaining the same speaker
similarity.Comment: 5 pages, 6 figures. Proc. Interspeech, 201
- …