7 research outputs found

    Towards More Efficient DNN-Based Speech Enhancement Using Quantized Correlation Mask

    Get PDF
    Many studies on deep learning-based speech enhancement (SE) utilizing the computational auditory scene analysis method typically employs the ideal binary mask or the ideal ratio mask to reconstruct the enhanced speech signal. However, many SE applications in real scenarios demand a desirable balance between denoising capability and computational cost. In this study, first, an improvement over the ideal ratio mask to attain more superior SE performance is proposed through introducing an efficient adaptive correlation-based factor for adjusting the ratio mask. The proposed method exploits the correlation coefficients among the noisy speech, noise and clean speech to effectively re-distribute the power ratio of the speech and noise during the ratio mask construction phase. Second, to make the supervised SE system more computationally-efficient, quantization techniques are considered to reduce the number of bits needed to represent floating numbers, leading to a more compact SE model. The proposed quantized correlation mask is utilized in conjunction with a 4-layer deep neural network (DNN-QCM) comprising dropout regulation, pre-training and noise-aware training to derive a robust and high-order mapping in enhancement, and to improve generalization capability in unseen conditions. Results show that the quantized correlation mask outperforms the conventional ratio mask representation and the other SE algorithms used for comparison. When compared to a DNN with ideal ratio mask as its learning targets, the DNN-QCM provided an improvement of approximately 6.5% in the short-time objective intelligibility score and 11.0% in the perceptual evaluation of speech quality score. The introduction of the quantization method can reduce the neural network weights to a 5-bit representation from a 32-bit, while effectively suppressing stationary and non-stationary noise. Timing analyses also show that with the techniques incorporated in the proposed DNN-QCM system to increase its compac..

    Deep neural network techniques for monaural speech enhancement: state of the art analysis

    Full text link
    Deep neural networks (DNN) techniques have become pervasive in domains such as natural language processing and computer vision. They have achieved great success in these domains in task such as machine translation and image generation. Due to their success, these data driven techniques have been applied in audio domain. More specifically, DNN models have been applied in speech enhancement domain to achieve denosing, dereverberation and multi-speaker separation in monaural speech enhancement. In this paper, we review some dominant DNN techniques being employed to achieve speech separation. The review looks at the whole pipeline of speech enhancement from feature extraction, how DNN based tools are modelling both global and local features of speech and model training (supervised and unsupervised). We also review the use of speech-enhancement pre-trained models to boost speech enhancement process. The review is geared towards covering the dominant trends with regards to DNN application in speech enhancement in speech obtained via a single speaker.Comment: conferenc
    corecore