Speech Enhancement Using Speech Synthesis Techniques
Traditional speech enhancement systems reduce noise by modifying the noisy signal to make it more like a clean signal, an approach that suffers from two problems: under-suppression of noise and over-suppression of speech. Both introduce distortions that hurt the quality of the enhanced signal. We propose instead to use speech synthesis techniques to build a higher-quality speech enhancement system: synthesizing clean speech conditioned on the noisy signal can produce outputs that are both noise-free and high quality. We first show that we can replace the noisy speech with a clean resynthesis drawn from a previously recorded clean-speech dictionary from the same speaker (concatenative resynthesis). Next, we show that with a speech synthesizer (vocoder) we can create a clean resynthesis of the noisy speech for more than one speaker. We term this parametric resynthesis (PR). PR generates better prosody from noisy speech than a TTS system that uses textual information only. Additionally, we can exploit the high-quality speech generation of neural vocoders for better enhancement quality. When trained on data from enough speakers, these vocoders can generate speech from unseen speakers, both male and female, with quality similar to that of speakers seen in training. Finally, we show that with neural vocoders we can achieve better objective signal and overall quality than state-of-the-art speech enhancement systems, and better subjective quality than an oracle mask-based system.
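The core idea of parametric resynthesis is that a model predicts clean acoustic features (e.g. F0 and amplitude tracks) from the noisy input, and a vocoder then renders audio from those features alone, so no noise survives. The minimal sketch below uses a toy sinusoidal vocoder as a stand-in for the neural vocoders in the abstract; the function name and parameters are illustrative, not the paper's implementation.

```python
import numpy as np

def sinusoidal_vocoder(f0_frames, amp_frames, frame_len=160, sr=16000):
    """Toy vocoder: render per-frame F0 (Hz) and amplitude tracks as a sinusoid.

    In PR, a network would predict these clean features from noisy speech;
    here we just synthesize from given tracks to show the resynthesis step.
    """
    phase = 0.0
    out = []
    for f0, amp in zip(f0_frames, amp_frames):
        t = np.arange(frame_len)
        ph = phase + 2.0 * np.pi * f0 / sr * (t + 1)
        out.append(amp * np.sin(ph))
        phase = ph[-1]  # keep phase continuous across frame boundaries
    return np.concatenate(out)

# 50 frames of a steady 220 Hz tone at amplitude 0.8 (0.5 s at 16 kHz).
y = sinusoidal_vocoder([220.0] * 50, [0.8] * 50)
```

Because the output is synthesized purely from the predicted parameters, its quality is bounded by the vocoder rather than by the noise level of the input.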
Resynthesis of Spatial Room Impulse Response Tails with Anisotropic Multi-Slope Decays
Spatial room impulse responses (SRIRs) capture room acoustics with directional information. SRIRs measured in coupled rooms and in spaces with non-uniform absorption distribution may exhibit anisotropic reverberation decays and multiple decay slopes. However, noisy measurements with low signal-to-noise ratios pose problems for analysis and reproduction in practice. This paper presents a method for resynthesizing the late decay of anisotropic SRIRs, effectively removing noise from SRIR measurements. The method accounts for both multi-slope decays and directional reverberation. A spherical filter bank extracts directionally constrained signals from the Ambisonic input, which are then analyzed and parameterized in terms of multiple exponential decays and a noise floor. The noisy late reverberation is resynthesized from the estimated parameters using modal synthesis, and the restored SRIR is reconstructed as Ambisonic signals. The method is evaluated both numerically and perceptually; the evaluation shows that SRIRs can be denoised with minimal error as long as parts of the decay slope are above the noise level, with signal-to-noise ratios as low as 40 dB in the presented experiment. The method can be used to increase the perceived spatial audio quality of noise-impaired SRIRs.
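The parameterization described above models each directional energy decay as a sum of exponential slopes plus a constant noise floor. A minimal sketch of that decay model follows; the parameter names and the 60 dB-per-reverberation-time convention are standard assumptions, not the paper's exact formulation.

```python
import numpy as np

def edc_model(t, amps, decay_times, noise_floor):
    """Energy decay model: sum of exponential slopes plus a noise floor.

    Each slope with reverberation time T (seconds) decays by 60 dB over T,
    i.e. by a factor of 10**-6, giving the rate ln(1e6) / T.
    Illustrative parameterization; the paper's exact form may differ.
    """
    edc = np.full_like(t, noise_floor, dtype=float)
    for a, T in zip(amps, decay_times):
        edc += a * np.exp(-t * np.log(1e6) / T)
    return edc

# Two slopes (fast 0.3 s, slow 1.2 s) over a -60 dB noise floor.
t = np.linspace(0.0, 2.0, 1000)
curve = edc_model(t, amps=[1.0, 0.1], decay_times=[0.3, 1.2], noise_floor=1e-6)
```

Fitting these parameters per direction, then resynthesizing the tail without the noise-floor term, is what removes the measurement noise while preserving the multi-slope, anisotropic decay.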
Generative De-Quantization for Neural Speech Codec via Latent Diffusion
In low-bitrate speech coding, end-to-end networks aim to learn compact yet expressive features and a powerful decoder within a single model. Such a demanding design increases complexity and degrades speech quality. In this paper, we propose to separate the representation-learning and information-reconstruction tasks. We leverage an end-to-end codec to learn low-dimensional discrete tokens and employ a latent diffusion model to de-quantize the coded features into a high-dimensional continuous space, relieving the decoder of the burden of de-quantizing and upsampling. To mitigate over-smooth generation, we introduce midway-infilling, which applies less noise reduction and stronger conditioning. In ablation studies, we investigate the hyperparameters of midway-infilling and of latent diffusion spaces of different dimensions. Subjective listening tests show that our model outperforms the state of the art at two low bitrates, 1.5 and 3 kbps. Code and samples are available on our webpage.
Comment: Submitted to ICASSP 202
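Midway-infilling, as described, starts the reverse diffusion at an intermediate step rather than from pure noise: the coarse decoded features are noised to that step's level, and denoising proceeds from there, which reduces the total noise removed and keeps the output tied to the conditioning. The DDPM-style sketch below is a generic illustration with a placeholder denoiser; all names and the schedule are assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def midway_infill(coarse, denoise_step, betas, t_start):
    """Start reverse diffusion at intermediate step t_start, not pure noise.

    The coarse estimate is noised to step t_start's level (DDPM forward
    marginal), then denoised step by step. `denoise_step` stands in for the
    conditioned denoising network; this is a sketch, not the paper's method.
    """
    alpha_bar = np.cumprod(1.0 - betas)
    # Forward-noise the coarse conditioning signal to level t_start.
    x = (np.sqrt(alpha_bar[t_start]) * coarse
         + np.sqrt(1.0 - alpha_bar[t_start]) * rng.standard_normal(coarse.shape))
    for t in range(t_start, -1, -1):
        x = denoise_step(x, t)  # one reverse step (a network call in practice)
    return x

betas = np.linspace(1e-4, 0.02, 50)
coarse = np.zeros(8)
# Placeholder denoiser that simply shrinks the signal each step.
x = midway_infill(coarse, lambda x, t: 0.9 * x, betas, t_start=25)
```

Starting midway trades off: fewer denoising steps means less smoothing of fine detail, while the stronger conditioning keeps the result faithful to the coded tokens.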
Analysis, Visualization, and Transformation of Audio Signals Using Dictionary-based Methods
- …