13 research outputs found

    DCCRN-KWS: an audio bias based model for noise robust small-footprint keyword spotting

    Full text link
    Real-world complex acoustic environments, especially those with a low signal-to-noise ratio (SNR), pose tremendous challenges to a keyword spotting (KWS) system. Inspired by recent advances in neural speech enhancement and context bias in speech recognition, we propose a robust audio-context-bias-based DCCRN-KWS model to address this challenge. We formulate the whole architecture as a multi-task learning framework for both denoising and keyword spotting, in which the DCCRN encoder is connected to the KWS model. Aided by the denoising task, we further introduce an audio context bias module that leverages real keyword samples to bias the network toward better discriminating keywords in noisy conditions. Feature merge and complex context linear modules are also introduced to strengthen such discrimination and to effectively leverage contextual information, respectively. Experiments on an internal challenging dataset and the HIMIYA public dataset show that our DCCRN-KWS system is superior in performance, while an ablation study demonstrates the soundness of the overall design. Comment: Accepted by INTERSPEECH202
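    The multi-task framework above trains denoising and keyword spotting jointly. As a minimal sketch of that idea (the weighting scheme and names are assumptions, not the paper's exact formulation):

```python
def multitask_loss(denoise_loss, kws_loss, alpha=0.5):
    """Toy multi-task objective: a weighted sum of a denoising loss
    (on the enhancement branch) and a keyword-spotting loss.
    alpha trades off the two tasks; 0.5 weights them equally."""
    return alpha * denoise_loss + (1.0 - alpha) * kws_loss

# Equal weighting of a denoising loss of 0.8 and a KWS loss of 0.2:
loss = multitask_loss(0.8, 0.2)  # 0.5
```

    In practice the shared encoder receives gradients from both terms, which is what lets the denoising task help the KWS branch in noisy conditions.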

    Uformer: A Unet based dilated complex & real dual-path conformer network for simultaneous speech enhancement and dereverberation

    Full text link
    The complex spectrum and magnitude are considered two major features for speech enhancement and dereverberation. Traditional approaches treat these two features separately, ignoring their underlying relationship. In this paper, we propose Uformer, a Unet-based dilated complex & real dual-path conformer network operating in both the complex and magnitude domains for simultaneous speech enhancement and dereverberation. We exploit time attention (TA) and dilated convolution (DC) to leverage local and global contextual information, and frequency attention (FA) to model dimensional information. These three sub-modules, contained in the proposed dilated complex & real dual-path conformer module, effectively improve speech enhancement and dereverberation performance. Furthermore, hybrid encoder and decoder are adopted to simultaneously model the complex spectrum and magnitude and to promote information interaction between the two domains. Encoder-decoder attention is also applied to enhance the interaction between encoder and decoder. Our model outperforms all SOTA time-domain and complex-domain models both objectively and subjectively. Specifically, Uformer reaches 3.6032 DNSMOS on the blind test set of the Interspeech 2021 DNS Challenge, outperforming all top-performing models. We also carry out ablation experiments to tease apart the contribution of each proposed sub-module. Comment: Accepted by ICASSP 202
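    The dual-domain idea rests on the relationship the abstract mentions: the magnitude is derived from the complex spectrum, which also carries phase. A minimal illustration of that relationship on a single STFT bin (the function name is for illustration only):

```python
import cmath

def mag_and_phase(bin_value):
    """Decompose one complex STFT bin. A magnitude-domain model sees
    only abs(bin_value); a complex-domain model retains phase too."""
    return abs(bin_value), cmath.phase(bin_value)

# Reconstructing the bin requires both pieces of information:
m, p = mag_and_phase(3 + 4j)          # m == 5.0
recon = m * cmath.exp(1j * p)         # recovers 3+4j up to rounding
```

    Modeling both domains jointly, as Uformer does, lets the network exploit this dependency instead of discarding phase.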

    MBTFNet: Multi-Band Temporal-Frequency Neural Network For Singing Voice Enhancement

    Full text link
    A typical neural speech enhancement (SE) approach mainly handles mixtures of speech and noise, which is not optimal for singing voice enhancement scenarios. Music source separation (MSS) models treat vocals and the various accompaniment components equally, which may reduce performance compared to a model that considers only vocal enhancement. In this paper, we propose a novel multi-band temporal-frequency neural network (MBTFNet) for singing voice enhancement, which removes background music, noise, and even backing vocals from singing recordings. MBTFNet combines inter- and intra-band modeling for better processing of full-band signals. Dual-path modeling is introduced to expand the receptive field of the model. We propose an implicit personalized enhancement (IPE) stage based on signal-to-noise ratio (SNR) estimation, which further improves the performance of MBTFNet. Experiments show that our proposed model significantly outperforms several state-of-the-art SE and MSS models.
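    The inter-/intra-band split above can be pictured as partitioning the full-band frequency axis into sub-bands. A toy sketch, assuming equal-width bands (the real model's band layout is not specified in the abstract):

```python
def split_bands(spectrum, num_bands):
    """Partition full-band frequency bins into equal sub-bands.
    Intra-band modeling would run inside each sub-band;
    inter-band modeling would then run across the returned bands."""
    width = len(spectrum) // num_bands
    return [spectrum[i * width:(i + 1) * width] for i in range(num_bands)]

bands = split_bands(list(range(256)), 4)  # four 64-bin sub-bands
```

    Working per band keeps each sub-model's input small while the inter-band path restores the global, full-band view.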

    Two-stage Neural Network for ICASSP 2023 Speech Signal Improvement Challenge

    Full text link
    For the ICASSP 2023 Speech Signal Improvement Challenge, we developed a dual-stage neural model that improves speech signal quality degraded by different distortions in a stage-wise, divide-and-conquer fashion. Specifically, in the first stage, the speech improvement network focuses on recovering the missing components of the spectrum, while in the second stage, our model aims to further suppress the noise, reverberation, and artifacts introduced by the first-stage model. Achieving 0.446 in the final score and 0.517 in the P.835 score, our system ranks 4th in the non-real-time track. Comment: Accepted by ICASSP 202
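    The stage-wise structure can be sketched as a simple pipeline. The two stage functions below are toy stand-ins (the abstract gives no implementation detail); only the composition order reflects the described design:

```python
def stage1_restore(frame):
    """Toy stand-in for stage 1: fill 'missing' (zero) spectral
    components with a small floor value."""
    return [x if x != 0 else 0.01 for x in frame]

def stage2_suppress(frame):
    """Toy stand-in for stage 2: attenuate everything slightly,
    mimicking suppression of residual noise and stage-1 artifacts."""
    return [0.9 * x for x in frame]

def improve(frames):
    """Divide-and-conquer: every frame passes through stage 1,
    then stage 2."""
    return [stage2_suppress(stage1_restore(f)) for f in frames]
```

    The key design point is that stage 2 is trained on stage-1 outputs, so it can target exactly the artifacts the first stage introduces.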

    Preparation and Properties of sc-PLA/PMMA Transparent Nanofiber Air Filter

    No full text
    Particulate matter (PM) pollution is a serious concern for the environment and public health. To protect indoor air quality, nanofiber filters have been used to coat window screens owing to their high PM removal efficiency, transparency, and low air resistance. However, these materials have poor mechanical properties. In this study, electrostatic induction-assisted solution blowing was used to fabricate polylactide stereocomplex (sc-PLA), which served as a reinforcement that adds physical cross-linking points, significantly restricting poly(methyl methacrylate) (PMMA) molecular chain motion and improving the mechanical properties of sc-PLA/PMMA nanofibers. Moreover, the introduction of sc-PLA led to the formation of a thick/thin composite nanofiber structure, which is also beneficial for the mechanical properties. As a result, sc-PLA/PMMA air filters with ~83% transparency, 99.5% PM2.5 removal, and a 140% increase in mechanical properties were achieved when 5 wt % sc-PLA was added to PMMA. Hence, the addition of sc-PLA to transparent filters can effectively improve their performance.