
    DNN-Based Multi-Frame MVDR Filtering for Single-Microphone Speech Enhancement

    Multi-frame approaches for single-microphone speech enhancement, e.g., the multi-frame minimum-variance-distortionless-response (MVDR) filter, are able to exploit speech correlations across neighboring time frames. In contrast to single-frame approaches such as the Wiener gain, multi-frame approaches have been shown to achieve substantial noise reduction with hardly any speech distortion, provided that accurate estimates of the correlation matrices and especially of the speech interframe correlation (IFC) vector are available. Typical estimation procedures for the correlation matrices and the speech IFC vector require an estimate of the speech presence probability (SPP) in each time-frequency bin. In this paper, we propose to use a bi-directional long short-term memory deep neural network (DNN) to estimate a speech mask and a noise mask for each time-frequency bin, from which two different SPP estimates are derived. Aiming at a robust performance, the DNN is trained on various noise types and signal-to-noise ratios. Experimental results show that the multi-frame MVDR filter in combination with the proposed data-driven SPP estimator yields a higher speech quality than a state-of-the-art model-based estimator.
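
    For context, the multi-frame MVDR filter discussed above is commonly written, for each time-frequency bin (k, l), in the following textbook form (the notation is assumed here and is not quoted from the abstract):

        \mathbf{w}_{\mathrm{MF\text{-}MVDR}}(k,\ell) = \frac{\mathbf{R}_n^{-1}(k,\ell)\,\boldsymbol{\gamma}_x(k,\ell)}{\boldsymbol{\gamma}_x^{\mathsf{H}}(k,\ell)\,\mathbf{R}_n^{-1}(k,\ell)\,\boldsymbol{\gamma}_x(k,\ell)},
        \qquad
        \hat{S}(k,\ell) = \mathbf{w}_{\mathrm{MF\text{-}MVDR}}^{\mathsf{H}}(k,\ell)\,\mathbf{y}(k,\ell),

    where \mathbf{y}(k,\ell) stacks N consecutive noisy STFT coefficients, \mathbf{R}_n(k,\ell) is the noise correlation matrix, and \boldsymbol{\gamma}_x(k,\ell) is the normalized speech IFC vector; the SPP estimate enters through the estimation of \mathbf{R}_n and \boldsymbol{\gamma}_x.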

    Perspectives

    Source separation and speech enhancement research has made dramatic progress in the last 30 years. It is now a mainstream topic in speech and audio processing, with hundreds of papers published every year. Separation and enhancement performance have greatly improved, and successful commercial applications are increasingly being deployed. This chapter provides an overview of research and development perspectives in the field. We do not attempt to cover all perspectives currently under discussion in the community. Instead, we focus on five directions in which we believe major progress is still possible: getting the most out of deep learning, exploiting phase relationships across time-frequency bins, improving the estimation accuracy of multichannel parameters, addressing scenarios involving multiple microphone arrays or other sensors, and accelerating industry transfer. These five directions are covered in Sections 19.1, 19.2, 19.3, 19.4, and 19.5, respectively.

    Combined single-microphone Wiener and MVDR filtering based on speech interframe correlations and speech presence probability

    For single-microphone noise reduction, a minimum variance distortionless response (MVDR) filter has recently been proposed that exploits speech correlations across consecutive time frames. This filter is able to keep speech distortion low, but achieves less noise reduction than conventional approaches. Furthermore, when only the noisy speech is available, estimation errors in the speech interframe correlations lead to audible artifacts in the background noise, especially in time-frequency regions where speech is not dominant. Therefore, in this paper we propose to apply the MVDR filter where speech is dominant and the single-channel Wiener filter otherwise, using a weighting based on the speech presence probability (SPP). In addition, we modify the decision-directed approach to estimate the a priori SNR in a more robust way for short analysis frames. Experimental results show that the proposed scheme achieves a better speech quality than the MVDR filter and the single-channel Wiener filter alone.
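
    To make the combination concrete, the following NumPy sketch computes, for a single time-frequency bin, an SPP-weighted soft combination of a multi-frame MVDR output and a Wiener-filtered output, together with a standard decision-directed a priori SNR estimate. All variable names, the simple soft weighting, and the placeholder inputs are illustrative assumptions and not the authors' implementation.

    import numpy as np

    def mf_mvdr_weights(R_n, gamma_x):
        # Multi-frame MVDR: w = R_n^{-1} gamma_x / (gamma_x^H R_n^{-1} gamma_x)
        r_inv_g = np.linalg.solve(R_n, gamma_x)
        return r_inv_g / (np.vdot(gamma_x, r_inv_g) + 1e-12)

    def decision_directed_snr(prev_speech_est, noise_psd, noisy_power, alpha=0.98):
        # Decision-directed a priori SNR estimate (standard Ephraim-Malah form)
        a_post = noisy_power / (noise_psd + 1e-12)
        return (alpha * np.abs(prev_speech_est) ** 2 / (noise_psd + 1e-12)
                + (1.0 - alpha) * np.maximum(a_post - 1.0, 0.0))

    def combined_filter(y_frames, R_n, gamma_x, wiener_gain, spp):
        # SPP-weighted soft combination: MVDR where speech is dominant,
        # single-channel Wiener gain otherwise.
        mvdr_out = np.vdot(mf_mvdr_weights(R_n, gamma_x), y_frames)
        wiener_out = wiener_gain * y_frames[0]   # current noisy STFT coefficient
        return spp * mvdr_out + (1.0 - spp) * wiener_out

    # Toy usage for one time-frequency bin with N = 4 consecutive frames
    rng = np.random.default_rng(0)
    N = 4
    y = rng.standard_normal(N) + 1j * rng.standard_normal(N)
    R_n = np.eye(N, dtype=complex)        # placeholder noise correlation matrix
    gamma_x = np.ones(N, dtype=complex)   # placeholder speech IFC vector
    xi = decision_directed_snr(prev_speech_est=0.5 + 0.2j, noise_psd=1.0,
                               noisy_power=np.abs(y[0]) ** 2)
    wiener_gain = xi / (1.0 + xi)
    print(combined_filter(y, R_n, gamma_x, wiener_gain, spp=0.7))

    The soft weighting by the SPP is one plausible reading of "apply the MVDR filter where speech is dominant and the single-channel Wiener filter otherwise"; the paper's exact combination rule and its modified decision-directed recursion may differ.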