Rank-1 Constrained Multichannel Wiener Filter for Speech Recognition in Noisy Environments
Multichannel linear filters, such as the Multichannel Wiener Filter (MWF) and
the Generalized Eigenvalue (GEV) beamformer, are popular signal processing
techniques that can improve speech recognition performance. In this paper, we
present an experimental study on these linear filters in a specific speech
recognition task, namely the CHiME-4 challenge, which features real recordings
in multiple noisy environments. Specifically, the rank-1 MWF is employed for
noise reduction, and a new constant residual noise power constraint is derived
that enhances the recognition performance. To fulfill the underlying rank-1
assumption, the speech covariance matrix is reconstructed based on eigenvectors
or generalized eigenvectors. The rank-1 constrained MWF is then evaluated against
alternative multichannel linear filters under the same framework, which
involves a Bidirectional Long Short-Term Memory (BLSTM) network for mask
estimation. The proposed filter outperforms the alternatives, leading to a 40%
relative Word Error Rate (WER) reduction compared with the baseline Weighted
Delay and Sum (WDAS) beamformer on the real test set, and a 15% relative WER
reduction compared with the GEV-BAN method. The results also suggest that the
speech recognition accuracy correlates more with the Mel-frequency cepstral
coefficients (MFCC) feature variance than with the noise reduction or the
speech distortion level. Comment: for Computer Speech and Language
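A minimal NumPy sketch of the rank-1 idea for a single frequency bin, assuming the speech and noise covariance matrices `phi_s` and `phi_n` have already been estimated (for instance from BLSTM masks) and that `phi_n` is positive definite. It reconstructs the speech covariance from the principal generalized eigenvector and plugs it into the standard rank-1 speech-distortion-weighted MWF with trade-off parameter `mu`. The function names are hypothetical, and the paper's constant residual noise power constraint is not reproduced here.

```python
import numpy as np


def principal_generalized_eigenpair(phi_s, phi_n):
    """Largest solution of phi_s v = lam * phi_n v, normalized so v^H phi_n v = 1."""
    chol = np.linalg.cholesky(phi_n)                  # phi_n = L L^H, L lower triangular
    chol_inv = np.linalg.inv(chol)
    whitened = chol_inv @ phi_s @ chol_inv.conj().T   # noise-whitened speech covariance (Hermitian)
    lams, vecs = np.linalg.eigh(whitened)             # eigenvalues in ascending order
    v = chol_inv.conj().T @ vecs[:, -1]               # map back: v = L^{-H} u
    return lams[-1], v


def rank1_speech_covariance(phi_s, phi_n):
    """GEV-based rank-1 reconstruction lam * q q^H, with q = phi_n v."""
    lam, v = principal_generalized_eigenpair(phi_s, phi_n)
    q = phi_n @ v
    return lam * np.outer(q, q.conj())


def rank1_mwf(phi_s, phi_n, ref=0, mu=1.0):
    """Rank-1 speech-distortion-weighted MWF for one frequency bin.

    w = phi_n^{-1} phi_s1 e_ref / (mu + tr(phi_n^{-1} phi_s1)); mu = 1 gives the plain MWF.
    """
    phi_s1 = rank1_speech_covariance(phi_s, phi_n)
    ratio = np.linalg.solve(phi_n, phi_s1)            # phi_n^{-1} phi_s1
    return ratio[:, ref] / (mu + np.trace(ratio).real)
```

Because `phi_s1` has rank one, the closed form agrees with `np.linalg.solve(phi_s1 + mu * phi_n, phi_s1[:, ref])` by the matrix inversion lemma, while only requiring solves against `phi_n`.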
Blind MultiChannel Identification and Equalization for Dereverberation and Noise Reduction based on Convolutive Transfer Function
This paper addresses the problems of blind channel identification and
multichannel equalization for speech dereverberation and noise reduction. The
time-domain cross-relation method is not suitable for blind room impulse
response identification, due to the near-common zeros of the long impulse
responses. We extend the cross-relation method to the short-time Fourier
transform (STFT) domain, in which the time-domain impulse responses are
approximately represented by the convolutive transfer functions (CTFs) with
far fewer coefficients. The CTFs suffer from the common zeros caused by the
oversampled STFT. We propose to identify CTFs based on the STFT with the
oversampled signals and the critically sampled CTFs, which is a good compromise
between the frequency aliasing of the signals and the common zeros problem of
CTFs. In addition, a normalization of the CTFs is proposed to remove the gain
ambiguity across sub-bands. In the STFT domain, the identified CTFs are used for
multichannel equalization, in which the sparsity of speech signals is
exploited. We propose to perform inverse filtering by minimizing the
$\ell_1$-norm of the source signal, with the relaxed $\ell_2$-norm fitting error
between the microphone signals and the convolution of the estimated source
signal and the CTFs used as a constraint. This method is advantageous in that
the noise can be reduced by relaxing the $\ell_2$-norm to a tolerance
corresponding to the noise power, and the tolerance can be automatically set.
The experiments confirm the effectiveness of the proposed method even under
conditions with high reverberation levels and intense noise. Comment: 13 pages, 5 figures, 5 tables
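The cross-relation principle that the paper carries over to the STFT domain can be illustrated in the time domain with a noiseless two-channel sketch. It assumes short FIR channels of known length `filt_len` with no common zeros, which is exactly the condition that long room impulse responses tend to violate. Since x1 = s * h1 and x2 = s * h2, the identity x1 * h2 = x2 * h1 yields a homogeneous linear system whose null vector is [h1; h2] up to a scale factor. The helper names are hypothetical.

```python
import numpy as np


def convolution_matrix(x, filt_len):
    """Matrix C such that C @ h == np.convolve(x, h) for any h of length filt_len."""
    n = len(x)
    C = np.zeros((n + filt_len - 1, filt_len))
    for k in range(filt_len):
        C[k:k + n, k] = x            # column k holds x delayed by k samples
    return C


def cross_relation_identify(x1, x2, filt_len):
    """Estimate (h1, h2) up to a common scale from x1 = s * h1 and x2 = s * h2.

    Uses x1 * h2 - x2 * h1 = 0, i.e. [-C(x2), C(x1)] @ [h1; h2] = 0, and returns
    the right singular vector associated with the smallest singular value.
    """
    A = np.hstack([-convolution_matrix(x2, filt_len), convolution_matrix(x1, filt_len)])
    _, _, vt = np.linalg.svd(A)
    g = vt[-1]
    return g[:filt_len], g[filt_len:]
```

With noise or near-common zeros, the smallest singular value is no longer well separated from the next one, which is the failure mode the abstract attributes to long time-domain impulse responses.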
Sub-Band Knowledge Distillation Framework for Speech Enhancement
In single-channel speech enhancement, methods based on full-band spectral
features have been widely studied. However, only a few methods pay attention to
non-full-band spectral features. In this paper, we explore a knowledge
distillation framework based on sub-band spectral mapping for single-channel
speech enhancement. Specifically, we divide the full frequency band into
multiple sub-bands and pre-train an elite-level sub-band enhancement model
(teacher model) for each sub-band. These teacher models are dedicated to
processing their own sub-bands. Next, under the teacher models' guidance, we
train a general sub-band enhancement model (student model) that works for all
sub-bands. Without increasing the number of model parameters or the
computational complexity, this further improves the student model's
performance. To evaluate our proposed method, we conducted a large number of
experiments on an open-source data set. The final experimental results show
that the guidance from the elite-level teacher models dramatically improves the
student model's performance, which exceeds that of the full-band model while
using fewer parameters. Comment: Published in Interspeech 202
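The teacher/student arrangement can be sketched as follows, assuming magnitude spectrograms of shape (frequency bins, frames), contiguous sub-bands of near-equal width, and a mean-squared-error loss that mixes the clean target with the teacher's output through a hypothetical weight `alpha`. This shows one common form of distillation loss, not necessarily the exact objective used in the paper; all names are illustrative.

```python
import numpy as np


def split_subbands(spec, num_bands):
    """Split a (freq_bins, frames) spectrogram into contiguous sub-bands along frequency."""
    return np.array_split(spec, num_bands, axis=0)


def mse(a, b):
    return float(np.mean((a - b) ** 2))


def distillation_loss(student_out, teacher_out, clean_target, alpha=0.5):
    """Weighted sum of a supervised term and a teacher-matching term (alpha is hypothetical)."""
    return alpha * mse(student_out, clean_target) + (1.0 - alpha) * mse(student_out, teacher_out)


def total_student_loss(student_bands, teacher_bands, clean_bands, alpha=0.5):
    """Average per-band losses: one shared student, a dedicated teacher per sub-band."""
    losses = [distillation_loss(s, t, c, alpha)
              for s, t, c in zip(student_bands, teacher_bands, clean_bands)]
    return sum(losses) / len(losses)
```

Because a single student processes every sub-band, its parameter count does not grow with the number of bands; only the teachers are band-specific, and they are needed only during training.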
Sub-Nyquist Sampling: Bridging Theory and Practice
Sampling theory encompasses all aspects related to the conversion of
continuous-time signals to discrete streams of numbers. The famous
Shannon-Nyquist theorem has become a landmark in the development of digital
signal processing. In modern applications, an increasing number of functions
are being pushed to sophisticated software algorithms, leaving only
delicate, finely tuned tasks to the circuit level.
In this paper, we review sampling strategies which target reduction of the
ADC rate below Nyquist. Our survey covers classic works from the early 1950s
through recent publications from the past several years.
The prime focus is bridging theory and practice, that is, pinpointing the
potential of sub-Nyquist strategies to emerge from the math to the hardware. In
that spirit, we integrate contemporary theoretical viewpoints, which study
signal modeling in a union of subspaces, together with a taste of practical
aspects, namely how the avant-garde modalities boil down to concrete signal
processing systems. We hope that this presentation style will attract the
interest of both researchers and engineers, helping to carry the
sub-Nyquist premise into practical applications and encouraging further
research into this exciting new frontier. Comment: 48 pages, 18 figures, to appear in IEEE Signal Processing Magazine
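As a minimal illustration of why rates below Nyquist require prior knowledge about the signal, the sketch below samples a 70 Hz tone at 100 Hz: its samples coincide with those of a 30 Hz tone, so without a signal model (such as the union-of-subspaces models mentioned above) the two cannot be told apart. `alias_frequency` is a hypothetical helper.

```python
import numpy as np


def alias_frequency(f, fs):
    """Apparent frequency, in [0, fs/2], of a real tone at f Hz sampled at fs Hz."""
    f_mod = f % fs
    return min(f_mod, fs - f_mod)


fs = 100.0                                   # below the 140 Hz Nyquist rate of a 70 Hz tone
n = np.arange(32)
tone_70 = np.cos(2 * np.pi * 70.0 * n / fs)
tone_30 = np.cos(2 * np.pi * 30.0 * n / fs)
# The two tones produce identical samples, so the samples alone cannot separate them.
print(alias_frequency(70.0, fs), np.allclose(tone_70, tone_30))  # prints: 30.0 True
```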
Multichannel Sampling of Pulse Streams at the Rate of Innovation
We consider minimal-rate sampling schemes for infinite streams of delayed and
weighted versions of a known pulse shape. The minimal sampling rate for these
parametric signals is referred to as the rate of innovation and is equal to the
number of degrees of freedom per unit time. Although sampling of infinite pulse
streams was treated in previous works, either the rate of innovation was not
achieved, or the pulse shape was limited to Diracs. In this paper we propose a
multichannel architecture for sampling pulse streams with arbitrary shape,
operating at the rate of innovation. Our approach is based on modulating the
input signal with a set of properly chosen waveforms, followed by a bank of
integrators. This architecture is motivated by recent work on sub-Nyquist
sampling of multiband signals. We show that the pulse stream can be recovered
from the proposed minimal-rate samples using standard tools taken from spectral
estimation in a stable way even at high rates of innovation. In addition, we
address practical implementation issues, such as reduction of hardware
complexity and immunity to failure in the sampling channels. The resulting
scheme is flexible and exhibits better noise robustness than previous
approaches.
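A simplified sketch of the spectral-estimation step, assuming a T-periodic stream of K Diracs rather than an arbitrary pulse shape, and assuming that channel m mixes the input with exp(-j2πmt/T) before integrating over one period, so that each channel outputs one Fourier coefficient. With K unknown delays and K amplitudes per period, the rate of innovation is 2K/T; this sketch uses 2K + 1 coefficients (m = -K, ..., K), recovers the delays with the annihilating-filter (Prony) method, and then fits the amplitudes by least squares. For a general known pulse, one would first divide each coefficient by the pulse's Fourier transform at that frequency, which must be nonzero there. Function names are hypothetical.

```python
import numpy as np


def fourier_measurements(amps, delays, period, K):
    """Y[m] = (1/T) * sum_k a_k exp(-j 2 pi m t_k / T) for m = -K..K (Dirac stream)."""
    m = np.arange(-K, K + 1)
    phases = np.exp(-2j * np.pi * np.outer(m, delays) / period)
    return phases @ amps / period


def annihilating_filter_recovery(y, K, period):
    """Recover K delays and amplitudes from the 2K+1 coefficients y (y[i] = Y[i - K])."""
    # Row for m = 0..K is [Y[m], Y[m-1], ..., Y[m-K]]; the filter A[0..K] annihilates it.
    rows = np.array([[y[m - l + K] for l in range(K + 1)] for m in range(K + 1)])
    _, _, vt = np.linalg.svd(rows)
    filt = vt[-1].conj()                     # null vector = annihilating filter coefficients
    roots = np.roots(filt)                   # roots u_k = exp(-j 2 pi t_k / T)
    delays = np.mod(-np.angle(roots) * period / (2 * np.pi), period)
    # With the delays known, the amplitudes follow from a Vandermonde least-squares fit.
    m = np.arange(-K, K + 1)
    vander = np.exp(-2j * np.pi * np.outer(m, delays) / period)
    amps = np.linalg.lstsq(vander, y, rcond=None)[0] * period
    order = np.argsort(delays)
    return delays[order], amps[order].real
```

The annihilating-filter step needs distinct delays and nonzero amplitudes so that the (K+1) x (K+1) system has a one-dimensional null space; with noisy samples, the SVD-based null vector is a least-squares surrogate rather than an exact solution.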