Online Monaural Speech Enhancement Using Delayed Subband LSTM
This paper proposes a delayed subband LSTM network for online monaural
(single-channel) speech enhancement. The proposed method is developed in the
short time Fourier transform (STFT) domain. Online processing requires
frame-by-frame signal reception and processing. A key feature of the
proposed method is that the same LSTM is shared across all frequencies, which
drastically reduces the number of network parameters, the amount of training
data and the computational burden. Training is performed in a subband manner:
the input consists of one frequency, together with a few context frequencies.
The network learns a speech-to-noise discriminative function relying on the
signal stationarity and on the local spectral pattern, based on which it
predicts a clean-speech mask at each frequency. To exploit future information,
i.e., look-ahead, we propose an output-delayed subband architecture, which
allows the unidirectional forward network to process a few future frames in
addition to the current frame. We used the proposed method to participate
in the DNS real-time speech enhancement challenge. Experiments on the DNS
dataset show that the proposed method achieves better scores on the evaluation
metrics than the DNS baseline method, which learns the full-band spectra using a
gated recurrent unit network.
Comment: Paper submitted to Interspeech 202
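The subband input layout and the output delay described in this abstract can be sketched as data preparation in plain Python. This is a minimal, hypothetical sketch, not the authors' code: the function names, the zero-padding of out-of-range context frequencies, and the use of `None` for frames without a target are all assumptions. It shows one input sequence per frequency (the frequency itself plus `n_context` neighbours on each side), the stacking of all frequencies into one batch for a single shared LSTM, and delayed targets that give the network `delay` frames of look-ahead.

```python
from typing import List, Optional

Spectrogram = List[List[float]]  # magnitude values indexed as [frame][frequency]


def subband_inputs(mag: Spectrogram, f: int, n_context: int) -> List[List[float]]:
    """Input sequence for frequency f: at every frame, the value at f plus
    n_context neighbouring frequencies on each side. Neighbours outside the
    spectrum are zero-padded (an assumption, not taken from the paper)."""
    n_freq = len(mag[0])
    seq = []
    for frame in mag:
        seq.append([frame[k] if 0 <= k < n_freq else 0.0
                    for k in range(f - n_context, f + n_context + 1)])
    return seq


def shared_model_batch(mag: Spectrogram, n_context: int) -> List[List[List[float]]]:
    """One subband sequence per frequency. Because a single LSTM is shared
    across frequencies, these sequences can be stacked into one batch."""
    return [subband_inputs(mag, f, n_context) for f in range(len(mag[0]))]


def delayed_targets(mask_f: List[float], delay: int) -> List[Optional[float]]:
    """Output-delayed training targets for one frequency: the output emitted at
    frame t is trained on the mask of frame t - delay, so the network has
    already seen `delay` future frames when it predicts that mask. The first
    `delay` outputs have no target (None)."""
    n = len(mask_f)
    d = min(delay, n)
    return [None] * d + mask_f[: n - d]


if __name__ == "__main__":
    mag = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # 2 frames x 3 frequencies
    print(subband_inputs(mag, 0, 1))
    print(delayed_targets([0.1, 0.2, 0.3, 0.4], 2))
```

The price of the output delay is latency: the mask for frame t - delay only becomes available once frame t has arrived, so the algorithmic delay of the online system grows by `delay` frames.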
Speech Enhancement Using Wavelet Coefficients Masking with Local Binary Patterns
In this paper, we present a wavelet-coefficient masking approach
based on Local Binary Patterns (WLBP) that enhances the
temporal spectra of the wavelet coefficients for speech enhancement.
This technique exploits the wavelet denoising scheme, which splits
the degraded speech into pyramidal subband components and extracts
frequency information without losing temporal information. Speech
enhancement in each high-frequency subband is performed with binary
labels obtained by local binary pattern masking, which encodes the ratio
between the original value of each coefficient and the values of the
neighbour coefficients. This approach enhances the high-frequency
spectra of the wavelet transform instead of eliminating them through
a threshold. A comparative analysis with conventional speech
enhancement algorithms shows that the proposed technique achieves
significant improvements in terms of PESQ, an internationally
standardized objective measure for estimating subjective speech
quality. Informal listening tests also show that, in an acoustic
context, the proposed method improves speech quality and avoids
the annoying musical noise present in other speech enhancement
techniques. Experimental results obtained with a DNN-based speech
recognizer in noisy environments confirm the superiority of the
proposed scheme for robust speech recognition.
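The idea of labelling each wavelet coefficient by comparing it with its neighbours can be illustrated with a one-dimensional local binary pattern. This is a hypothetical sketch: the abstract does not give the exact encoding or decision rule, so the comparison `|c_i| >= |c_j|`, the zero padding at the edges, and the rule "keep a coefficient only if at least `min_bits` neighbour bits are set" are assumptions chosen for illustration. Unlike a global magnitude threshold, each keep/discard decision here depends only on a coefficient's local neighbourhood within its subband.

```python
from typing import List, Optional


def lbp_code(coeffs: List[float], i: int, radius: int) -> int:
    """1-D local binary pattern of coefficient i: bit b is 1 when |c_i| is at
    least the magnitude of the b-th neighbour (offsets -radius..radius, 0
    skipped). Neighbours outside the sequence count as 0 (assumption)."""
    offsets = [o for o in range(-radius, radius + 1) if o != 0]
    code = 0
    for bit, o in enumerate(offsets):
        j = i + o
        neighbour = abs(coeffs[j]) if 0 <= j < len(coeffs) else 0.0
        if abs(coeffs[i]) >= neighbour:
            code |= 1 << bit
    return code


def lbp_mask(coeffs: List[float], radius: int = 1,
             min_bits: Optional[int] = None) -> List[int]:
    """Binary label per coefficient from its LBP code. Hypothetical rule: keep
    the coefficient when at least min_bits neighbour comparisons are set
    (default: all of them, i.e. local maxima in magnitude)."""
    if min_bits is None:
        min_bits = 2 * radius
    return [1 if bin(lbp_code(coeffs, i, radius)).count("1") >= min_bits else 0
            for i in range(len(coeffs))]


def apply_mask(coeffs: List[float], mask: List[int]) -> List[float]:
    """Multiply each coefficient of one subband by its binary label."""
    return [c * m for c, m in zip(coeffs, mask)]


if __name__ == "__main__":
    detail = [0.1, 0.9, 0.2, -0.8, 0.05]  # toy high-frequency subband
    mask = lbp_mask(detail)
    print(mask, apply_mask(detail, mask))
```

In a full pipeline, the masked detail coefficients would be passed back through the inverse wavelet transform; that step is omitted here.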
A Subband-Based SVM Front-End for Robust ASR
This work proposes a novel support vector machine (SVM) based robust
automatic speech recognition (ASR) front-end that operates on an ensemble of
the subband components of high-dimensional acoustic waveforms. The key issues
of selecting the appropriate SVM kernels for classification in frequency
subbands and the combination of individual subband classifiers using ensemble
methods are addressed. The proposed front-end is compared with state-of-the-art
ASR front-ends in terms of robustness to additive noise and linear filtering.
Experiments performed on the TIMIT phoneme classification task demonstrate the
benefits of the proposed subband-based SVM front-end: it outperforms the
standard cepstral front-end in the presence of noise and linear filtering for
signal-to-noise ratios (SNR) below 12 dB. Combining the proposed
front-end with a conventional front-end such as MFCC yields further
improvements over the individual front-ends across the full range of noise
levels.
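The combination of individual subband classifiers can be illustrated with two common ensemble rules applied to per-class SVM decision values. This is a sketch under assumptions: the abstract does not state which ensemble method is used, so the weighted sum of scores and the majority vote below are generic examples, and the input format (one dictionary of class scores per subband) is hypothetical. The SVMs themselves are not trained here; only their outputs are combined.

```python
from collections import Counter
from typing import Dict, List, Optional


def combine_by_score(subband_scores: List[Dict[str, float]],
                     weights: Optional[List[float]] = None) -> str:
    """Weighted sum of each class's decision value over all subband
    classifiers; returns the class with the largest total."""
    if weights is None:
        weights = [1.0] * len(subband_scores)
    totals: Dict[str, float] = {}
    for w, scores in zip(weights, subband_scores):
        for label, s in scores.items():
            totals[label] = totals.get(label, 0.0) + w * s
    return max(totals, key=totals.get)


def combine_by_vote(subband_scores: List[Dict[str, float]]) -> str:
    """Each subband classifier votes for its top-scoring class; the class
    with the most votes wins."""
    votes = Counter(max(scores, key=scores.get) for scores in subband_scores)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical decision values from three subband classifiers.
    scores = [{"aa": 2.0, "iy": -1.0},
              {"aa": -0.1, "iy": 0.2},
              {"aa": -0.1, "iy": 0.2}]
    print(combine_by_score(scores), combine_by_vote(scores))
```

The two rules can disagree: in the example, one confident subband dominates the score sum, while the vote follows the two weakly confident subbands. A conventional front-end such as MFCC could, in the same scheme, be treated as one more member of the ensemble with its own weight.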
Feature Extracting in the Presence of Environmental Noise, using Subband Adaptive Filtering
In this work, a new feature extraction method for noisy environments is proposed. The approach is based on subband decomposition of speech signals followed by adaptive filtering in the noisiest subbands of speech. The speech decomposition is obtained using a low-complexity octave filter bank, while adaptive filtering is performed using the normalized least mean squares (NLMS) algorithm. The performance of the new feature was evaluated for isolated-word speech recognition in the presence of car noise. The proposed method showed higher recognition accuracy than conventional methods in noisy environments.
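The normalized least mean squares (NLMS) update named in this abstract can be sketched in a few lines. This is the textbook NLMS filter, not the authors' implementation; how the reference and desired signals are formed inside each noisy subband is not stated in the abstract, so the block only shows the adaptation rule itself: output `y = w·x`, error `e = d - y`, and update `w += mu * e * x / (eps + x·x)`. The step size `mu` and regulariser `eps` are illustrative defaults.

```python
from typing import List, Tuple


def nlms(x: List[float], d: List[float], order: int,
         mu: float = 0.5, eps: float = 1e-8) -> Tuple[List[float], List[float]]:
    """Adapt an FIR filter of length `order` so that its output on x tracks d.
    Returns the final weights and the error signal e[n] = d[n] - y[n]."""
    w = [0.0] * order
    buf = [0.0] * order  # buf[k] holds x[n - k]; samples before x[0] are 0
    errors = []
    for n in range(len(x)):
        buf = [x[n]] + buf[:-1]                          # shift in newest sample
        y = sum(wk * xk for wk, xk in zip(w, buf))       # filter output
        e = d[n] - y                                     # estimation error
        norm = eps + sum(xk * xk for xk in buf)          # input energy
        w = [wk + mu * e * xk / norm for wk, xk in zip(w, buf)]
        errors.append(e)
    return w, errors


if __name__ == "__main__":
    # Identify a known 2-tap filter h = [0.5, -0.3] from a deterministic input.
    x = [((41 * n) % 101) / 50.0 - 1.0 for n in range(3000)]
    d = [0.5 * x[n] - 0.3 * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]
    w, _ = nlms(x, d, order=2)
    print([round(wk, 4) for wk in w])
```

Normalizing by the input energy makes the effective step size independent of the signal level, which is convenient when the filter runs in subbands whose energies differ widely.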