13,338 research outputs found

    Rank-1 Constrained Multichannel Wiener Filter for Speech Recognition in Noisy Environments

    Get PDF
    Multichannel linear filters, such as the Multichannel Wiener Filter (MWF) and the Generalized Eigenvalue (GEV) beamformer are popular signal processing techniques which can improve speech recognition performance. In this paper, we present an experimental study on these linear filters in a specific speech recognition task, namely the CHiME-4 challenge, which features real recordings in multiple noisy environments. Specifically, the rank-1 MWF is employed for noise reduction and a new constant residual noise power constraint is derived which enhances the recognition performance. To fulfill the underlying rank-1 assumption, the speech covariance matrix is reconstructed based on eigenvectors or generalized eigenvectors. Then the rank-1 constrained MWF is evaluated with alternative multichannel linear filters under the same framework, which involves a Bidirectional Long Short-Term Memory (BLSTM) network for mask estimation. The proposed filter outperforms alternative ones, leading to a 40% relative Word Error Rate (WER) reduction compared with the baseline Weighted Delay and Sum (WDAS) beamformer on the real test set, and a 15% relative WER reduction compared with the GEV-BAN method. The results also suggest that the speech recognition accuracy correlates more with the Mel-frequency cepstral coefficients (MFCC) feature variance than with the noise reduction or the speech distortion level.Comment: for Computer Speech and Languag

    Blind MultiChannel Identification and Equalization for Dereverberation and Noise Reduction based on Convolutive Transfer Function

    Get PDF
    This paper addresses the problems of blind channel identification and multichannel equalization for speech dereverberation and noise reduction. The time-domain cross-relation method is not suitable for blind room impulse response identification, due to the near-common zeros of the long impulse responses. We extend the cross-relation method to the short-time Fourier transform (STFT) domain, in which the time-domain impulse responses are approximately represented by the convolutive transfer functions (CTFs) with much less coefficients. The CTFs suffer from the common zeros caused by the oversampled STFT. We propose to identify CTFs based on the STFT with the oversampled signals and the critical sampled CTFs, which is a good compromise between the frequency aliasing of the signals and the common zeros problem of CTFs. In addition, a normalization of the CTFs is proposed to remove the gain ambiguity across sub-bands. In the STFT domain, the identified CTFs is used for multichannel equalization, in which the sparsity of speech signals is exploited. We propose to perform inverse filtering by minimizing the â„“1\ell_1-norm of the source signal with the relaxed â„“2\ell_2-norm fitting error between the micophone signals and the convolution of the estimated source signal and the CTFs used as a constraint. This method is advantageous in that the noise can be reduced by relaxing the â„“2\ell_2-norm to a tolerance corresponding to the noise power, and the tolerance can be automatically set. The experiments confirm the efficiency of the proposed method even under conditions with high reverberation levels and intense noise.Comment: 13 pages, 5 figures, 5 table

    Sub-Band Knowledge Distillation Framework for Speech Enhancement

    Full text link
    In single-channel speech enhancement, methods based on full-band spectral features have been widely studied. However, only a few methods pay attention to non-full-band spectral features. In this paper, we explore a knowledge distillation framework based on sub-band spectral mapping for single-channel speech enhancement. Specifically, we divide the full frequency band into multiple sub-bands and pre-train an elite-level sub-band enhancement model (teacher model) for each sub-band. These teacher models are dedicated to processing their own sub-bands. Next, under the teacher models' guidance, we train a general sub-band enhancement model (student model) that works for all sub-bands. Without increasing the number of model parameters and computational complexity, the student model's performance is further improved. To evaluate our proposed method, we conducted a large number of experiments on an open-source data set. The final experimental results show that the guidance from the elite-level teacher models dramatically improves the student model's performance, which exceeds the full-band model by employing fewer parameters.Comment: Published in Interspeech 202

    Sub-Nyquist Sampling: Bridging Theory and Practice

    Full text link
    Sampling theory encompasses all aspects related to the conversion of continuous-time signals to discrete streams of numbers. The famous Shannon-Nyquist theorem has become a landmark in the development of digital signal processing. In modern applications, an increasingly number of functions is being pushed forward to sophisticated software algorithms, leaving only those delicate finely-tuned tasks for the circuit level. In this paper, we review sampling strategies which target reduction of the ADC rate below Nyquist. Our survey covers classic works from the early 50's of the previous century through recent publications from the past several years. The prime focus is bridging theory and practice, that is to pinpoint the potential of sub-Nyquist strategies to emerge from the math to the hardware. In that spirit, we integrate contemporary theoretical viewpoints, which study signal modeling in a union of subspaces, together with a taste of practical aspects, namely how the avant-garde modalities boil down to concrete signal processing systems. Our hope is that this presentation style will attract the interest of both researchers and engineers in the hope of promoting the sub-Nyquist premise into practical applications, and encouraging further research into this exciting new frontier.Comment: 48 pages, 18 figures, to appear in IEEE Signal Processing Magazin

    Multichannel Sampling of Pulse Streams at the Rate of Innovation

    Full text link
    We consider minimal-rate sampling schemes for infinite streams of delayed and weighted versions of a known pulse shape. The minimal sampling rate for these parametric signals is referred to as the rate of innovation and is equal to the number of degrees of freedom per unit time. Although sampling of infinite pulse streams was treated in previous works, either the rate of innovation was not achieved, or the pulse shape was limited to Diracs. In this paper we propose a multichannel architecture for sampling pulse streams with arbitrary shape, operating at the rate of innovation. Our approach is based on modulating the input signal with a set of properly chosen waveforms, followed by a bank of integrators. This architecture is motivated by recent work on sub-Nyquist sampling of multiband signals. We show that the pulse stream can be recovered from the proposed minimal-rate samples using standard tools taken from spectral estimation in a stable way even at high rates of innovation. In addition, we address practical implementation issues, such as reduction of hardware complexity and immunity to failure in the sampling channels. The resulting scheme is flexible and exhibits better noise robustness than previous approaches
    • …
    corecore