34,195 research outputs found
Deep speech inpainting of time-frequency masks
Transient loud intrusions, often occurring in noisy environments, can
completely overpower speech signal and lead to an inevitable loss of
information. While existing algorithms for noise suppression can yield
impressive results, their efficacy remains limited for very low signal-to-noise
ratios or when parts of the signal are missing. To address these limitations,
here we propose an end-to-end framework for speech inpainting, the
context-based retrieval of missing or severely distorted parts of
time-frequency representation of speech. The framework is based on a
convolutional U-Net trained via deep feature losses, obtained using speechVGG,
a deep speech feature extractor pre-trained on an auxiliary word classification
task. Our evaluation results demonstrate that the proposed framework can
recover large portions of missing or distorted time-frequency representation of
speech, up to 400 ms and 3.2 kHz in bandwidth. In particular, our approach
provided a substantial increase in STOI & PESQ objective metrics of the
initially corrupted speech samples. Notably, using deep feature losses to train
the framework led to the best results, as compared to conventional approaches.Comment: Accepted to InterSpeech202
EMD-based filtering (EMDF) of low-frequency noise for speech enhancement
An Empirical Mode Decomposition based filtering (EMDF) approach is presented as a post-processing stage for speech enhancement. This method is particularly effective in low frequency noise environments. Unlike previous EMD based denoising methods, this approach does not make the assumption that the contaminating noise signal is fractional Gaussian Noise. An adaptive method is developed to select the IMF index for separating the noise components from the speech based on the second-order IMF statistics. The low frequency noise components are then separated by a partial reconstruction from the IMFs. It is shown that the proposed EMDF technique is able to suppress residual noise from speech signals that were enhanced by the conventional optimallymodified log-spectral amplitude approach which uses a minimum statistics based noise estimate. A comparative performance study is included that demonstrates the effectiveness of the EMDF system in various noise environments, such as car interior noise, military vehicle noise and babble noise. In particular, improvements up to 10 dB are obtained in car noise environments. Listening tests were performed that confirm the results
Blind MultiChannel Identification and Equalization for Dereverberation and Noise Reduction based on Convolutive Transfer Function
This paper addresses the problems of blind channel identification and
multichannel equalization for speech dereverberation and noise reduction. The
time-domain cross-relation method is not suitable for blind room impulse
response identification, due to the near-common zeros of the long impulse
responses. We extend the cross-relation method to the short-time Fourier
transform (STFT) domain, in which the time-domain impulse responses are
approximately represented by the convolutive transfer functions (CTFs) with
much less coefficients. The CTFs suffer from the common zeros caused by the
oversampled STFT. We propose to identify CTFs based on the STFT with the
oversampled signals and the critical sampled CTFs, which is a good compromise
between the frequency aliasing of the signals and the common zeros problem of
CTFs. In addition, a normalization of the CTFs is proposed to remove the gain
ambiguity across sub-bands. In the STFT domain, the identified CTFs is used for
multichannel equalization, in which the sparsity of speech signals is
exploited. We propose to perform inverse filtering by minimizing the
-norm of the source signal with the relaxed -norm fitting error
between the micophone signals and the convolution of the estimated source
signal and the CTFs used as a constraint. This method is advantageous in that
the noise can be reduced by relaxing the -norm to a tolerance
corresponding to the noise power, and the tolerance can be automatically set.
The experiments confirm the efficiency of the proposed method even under
conditions with high reverberation levels and intense noise.Comment: 13 pages, 5 figures, 5 table
- …