103 research outputs found

    Speech Enhancement Using Speech Synthesis Techniques

    Traditional speech enhancement systems reduce noise by modifying the noisy signal to make it more like a clean signal, which suffers from two problems: under-suppression of noise and over-suppression of speech. These problems distort the enhanced speech and hurt its quality. We propose to utilize speech synthesis techniques for a higher-quality speech enhancement system. Synthesizing clean speech based on the noisy signal can produce outputs that are both noise-free and high quality. We first show that we can replace the noisy speech with a clean resynthesis built from a previously recorded clean speech dictionary from the same speaker (concatenative resynthesis). Next, we show that using a speech synthesizer (vocoder) we can create a clean resynthesis of the noisy speech for more than one speaker. We term this parametric resynthesis (PR). PR can generate better prosody from noisy speech than a TTS system that uses textual information only. Additionally, we can use the high-quality speech generation capability of neural vocoders for better-quality speech enhancement. When trained on data from enough speakers, these vocoders can generate speech from unseen speakers, both male and female, with quality similar to that of speakers seen in training. Finally, we show that using neural vocoders we can achieve better objective signal and overall quality than state-of-the-art speech enhancement systems, and better subjective quality than an oracle mask-based system.
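    The parametric-resynthesis pipeline this abstract describes (predict clean vocoder parameters from noisy input, then synthesize) can be sketched as follows. This is an illustrative stand-in, not the authors' code: the linear "predictor", the toy sinusoidal "vocoder", and all shapes are assumptions.

```python
import numpy as np

def predict_clean_params(noisy_spec, weights):
    # Hypothetical prediction model: a single linear layer standing in for
    # the network that maps noisy spectral features to clean vocoder
    # parameters (non-negative magnitudes via ReLU).
    return np.maximum(noisy_spec @ weights, 0.0)

def vocoder_synthesize(params, frame_len=64):
    # Toy "vocoder": sums sinusoids weighted by the predicted parameters,
    # frame by frame. A neural vocoder would replace this step.
    n_frames, n_bins = params.shape
    t = np.arange(frame_len)
    out = np.zeros(n_frames * frame_len)
    for i, frame in enumerate(params):
        for k, amp in enumerate(frame):
            out[i * frame_len:(i + 1) * frame_len] += (
                amp * np.sin(2 * np.pi * (k + 1) * t / frame_len))
    return out

rng = np.random.default_rng(0)
noisy = np.abs(rng.normal(size=(10, 8)))   # 10 frames of noisy features
weights = rng.normal(size=(8, 8)) * 0.1    # untrained stand-in weights
clean_params = predict_clean_params(noisy, weights)
audio = vocoder_synthesize(clean_params)
print(audio.shape)  # (640,)
```

    The key design point is that the output is generated from predicted parameters rather than by masking the noisy signal, so no residual noise passes through.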

    Resynthesis of Spatial Room Impulse Response Tails with Anisotropic Multi-Slope Decays

    Spatial room impulse responses (SRIRs) capture room acoustics with directional information. SRIRs measured in coupled rooms and spaces with non-uniform absorption distribution may exhibit anisotropic reverberation decays and multiple decay slopes. However, noisy measurements with low signal-to-noise ratios pose issues in analysis and reproduction in practice. This paper presents a method for resynthesis of the late decay of anisotropic SRIRs, effectively removing noise from SRIR measurements. The method accounts for both multi-slope decays and directional reverberation. A spherical filter bank extracts directionally constrained signals from Ambisonic input, which are then analyzed and parameterized in terms of multiple exponential decays and a noise floor. The noisy late reverberation is then resynthesized from the estimated parameters using modal synthesis, and the restored SRIR is reconstructed as Ambisonic signals. The method is evaluated both numerically and perceptually, showing that SRIRs can be denoised with minimal error as long as parts of the decay slope are above the noise level, with signal-to-noise ratios as low as 40 dB in the presented experiment. The method can be used to increase the perceived spatial audio quality of noise-impaired SRIRs.
    Peer reviewed
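    The parameterization above (multiple exponential decay slopes plus a noise floor) can be sketched as a simple energy-decay model. A minimal sketch with assumed values, not the paper's estimator:

```python
import numpy as np

def edc_model(t, amplitudes, decay_times, noise_level):
    # Energy decay curve as a sum of exponential slopes plus a constant
    # noise floor -- the per-direction parameterization the abstract
    # describes. Each slope drops 60 dB after its decay time T
    # (ln(10**6) ~= 13.8155).
    edc = np.full_like(t, noise_level, dtype=float)
    for a, T in zip(amplitudes, decay_times):
        edc += a * np.exp(-13.8155 * t / T)
    return edc

t = np.linspace(0.0, 2.0, 1000)
# Two-slope decay, e.g. coupled rooms: fast early decay + slow late decay.
edc = edc_model(t, amplitudes=[1.0, 0.05], decay_times=[0.4, 1.5],
                noise_level=1e-6)
edc_db = 10 * np.log10(edc)
```

    Denoising then amounts to resynthesizing the tail from the estimated slopes while discarding the noise-floor term.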

    Fast Algorithms for Inventory Based Speech Enhancement


    Generative De-Quantization for Neural Speech Codec via Latent Diffusion

    In low-bitrate speech coding, end-to-end speech coding networks aim to learn compact yet expressive features and a powerful decoder in a single network. Coupling these two tasks increases complexity and yields inferior speech quality. In this paper, we propose to separate the representation learning and information reconstruction tasks. We leverage an end-to-end codec to learn low-dimensional discrete tokens and employ a latent diffusion model to de-quantize the coded features into a high-dimensional continuous space, relieving the decoder of the burden of de-quantizing and upsampling. To mitigate over-smooth generation, we introduce midway-infilling with less noise reduction and stronger conditioning. In ablation studies, we investigate the hyperparameters for midway-infilling and for latent diffusion spaces of different dimensions. Subjective listening tests show that our model outperforms the state of the art at two low bitrates, 1.5 and 3 kbps. Code and samples of this work are available on our webpage.
    Comment: Submitted to ICASSP 202
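    One reading of the "midway-infilling" idea above is to start the reverse diffusion from a partially-noised copy of the conditioning latent rather than from pure noise, so fewer denoising steps remain and the conditioning stays stronger. The following is a toy sketch under that assumption; the denoiser, step count, and noise scale are all illustrative stand-ins, not the paper's implementation:

```python
import numpy as np

def midway_infilling(cond, start_step, denoise_fn, noise_scale, seed=0):
    # Start the reverse process midway: add a limited amount of noise to
    # the conditioning latent instead of sampling pure noise, then run
    # only the remaining reverse steps.
    rng = np.random.default_rng(seed)
    z_t = cond + noise_scale * rng.normal(size=cond.shape)
    for step in range(start_step, 0, -1):
        z_t = denoise_fn(z_t, cond, step)
    return z_t

def toy_denoise(z_t, cond, step):
    # Stand-in for the trained score model: pulls the latent toward the
    # de-quantized conditioning at each step.
    return z_t + 0.3 * (cond - z_t)

cond = np.ones(16)  # toy stand-in for a de-quantized token embedding
z_hat = midway_infilling(cond, start_step=25, denoise_fn=toy_denoise,
                         noise_scale=0.5)
```

    Trading the starting point against the noise scale is exactly the hyperparameter choice the abstract's ablations explore.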

    Analysis, Visualization, and Transformation of Audio Signals Using Dictionary-based Methods
