DCCRN-KWS: an audio bias based model for noise robust small-footprint keyword spotting
Real-world complex acoustic environments especially the ones with a low
signal-to-noise ratio (SNR) will bring tremendous challenges to a keyword
spotting (KWS) system. Inspired by recent advances in neural speech
enhancement and context bias in speech recognition, we propose a robust audio
context bias based DCCRN-KWS model to address this challenge. We form the whole
architecture as a multi-task learning framework for both denoising and keyword
spotting, where the DCCRN encoder is connected with the KWS model. Aided by
the denoising task, we further introduce an audio context bias module to
leverage the real keyword samples and bias the network to better discriminate
keywords in noisy conditions. Feature merge and complex context linear modules
are also introduced to strengthen such discrimination and to effectively leverage
contextual information respectively. Experiments on the internal challenging
dataset and the HIMIYA public dataset show that our DCCRN-KWS system is
superior in performance, while ablation study demonstrates the good design of
the whole model.
Comment: Accepted by INTERSPEECH202
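The multi-task framework described above combines a denoising objective with a keyword-spotting objective. A minimal sketch of such a joint loss, assuming a simple MSE denoising term, a softmax cross-entropy KWS term, and a hypothetical trade-off weight `alpha` (not a value from the paper):

```python
import numpy as np

def multitask_loss(enhanced, clean, kws_logits, keyword_id, alpha=0.5):
    """Weighted sum of a denoising loss and a keyword-spotting loss.

    `alpha` is an illustrative trade-off weight, not taken from the paper.
    """
    # Denoising branch: mean squared error between enhanced and clean spectra.
    denoise_loss = np.mean((enhanced - clean) ** 2)
    # KWS branch: softmax cross-entropy over keyword classes.
    z = kws_logits - np.max(kws_logits)        # shift for numerical stability
    log_probs = z - np.log(np.sum(np.exp(z)))  # log-softmax
    kws_loss = -log_probs[keyword_id]
    return alpha * denoise_loss + (1.0 - alpha) * kws_loss
```

With a perfectly enhanced spectrum and confident correct keyword logits, both terms approach zero, so the combined loss does as well.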
Uformer: A Unet based dilated complex & real dual-path conformer network for simultaneous speech enhancement and dereverberation
Complex spectrum and magnitude are considered as two major features of speech
enhancement and dereverberation. Traditional approaches always treat these two
features separately, ignoring their underlying relationship. In this paper, we
propose Uformer, a Unet based dilated complex & real dual-path conformer
network in both complex and magnitude domain for simultaneous speech
enhancement and dereverberation. We exploit time attention (TA) and dilated
convolution (DC) to leverage local and global contextual information and
frequency attention (FA) to model dimensional information. These three
sub-modules contained in the proposed dilated complex & real dual-path
conformer module effectively improve the speech enhancement and dereverberation
performance. Furthermore, hybrid encoder and decoder are adopted to
simultaneously model the complex spectrum and magnitude and promote the
information interaction between two domains. Encoder decoder attention is also
applied to enhance the interaction between the encoder and decoder. In our
experiments, Uformer outperforms all SOTA time-domain and complex-domain
models objectively and subjectively. Specifically, it reaches 3.6032 DNSMOS on
the blind test set of Interspeech 2021 DNS Challenge, which outperforms all
top-performed models. We also carry out ablation experiments to assess the
contribution of each proposed sub-module.
Comment: Accepted by ICASSP 202
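The hybrid encoder described above models the complex spectrum and the magnitude jointly. One illustrative way to view such a hybrid input, assuming the two domains are simply stacked as channels before a learned encoder (a simplification of what the network actually does):

```python
import numpy as np

def hybrid_input(stft):
    """Stack real, imaginary and magnitude planes of a complex spectrogram
    into channels, so a single encoder can see both domains at once.

    An illustrative sketch only; the actual Uformer hybrid encoder/decoder
    is a learned network with cross-domain interaction.
    """
    real, imag = stft.real, stft.imag
    mag = np.abs(stft)                  # magnitude domain
    return np.stack([real, imag, mag])  # shape (3, freq, time)
```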
MBTFNet: Multi-Band Temporal-Frequency Neural Network For Singing Voice Enhancement
A typical neural speech enhancement (SE) approach mainly handles speech and
noise mixtures, which is not optimal for singing voice enhancement scenarios.
Music source separation (MSS) models treat vocals and various accompaniment
components equally, which may reduce performance compared to a model that
considers only vocal enhancement. In this paper, we propose a novel multi-band
temporal-frequency neural network (MBTFNet) for singing voice enhancement,
which particularly removes background music, noise and even backing vocals from
singing recordings. MBTFNet combines inter and intra-band modeling for better
processing of full-band signals. Dual-path modeling is introduced to expand
the receptive field of the model. We propose an implicit personalized
enhancement (IPE) stage based on signal-to-noise ratio (SNR) estimation, which
further improves the performance of MBTFNet. Experiments show that our proposed
model significantly outperforms several state-of-the-art SE and MSS models.
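The inter- and intra-band modeling above starts from splitting the full-band spectrogram into sub-bands. A minimal sketch of such an even band split, assuming the frequency axis divides evenly (MBTFNet's actual band layout may differ):

```python
import numpy as np

def split_bands(spec, n_bands):
    """Split a (freq, time) spectrogram into equal sub-bands, enabling
    intra-band modeling (within each band) and inter-band modeling
    (across bands). A simplified sketch of the multi-band front end.
    """
    n_freq, n_time = spec.shape
    assert n_freq % n_bands == 0, "frequency bins must divide evenly"
    return spec.reshape(n_bands, n_freq // n_bands, n_time)
```

Each sub-band can then be processed by its own temporal model before an inter-band module exchanges information across bands.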
Two-stage Neural Network for ICASSP 2023 Speech Signal Improvement Challenge
In ICASSP 2023 speech signal improvement challenge, we developed a dual-stage
neural model which improves speech signal quality induced by different
distortions in a stage-wise divide-and-conquer fashion. Specifically, in the
first stage, the speech improvement network focuses on recovering the missing
components of the spectrum, while in the second stage, our model aims to
further suppress noise, reverberation, and artifacts introduced by the
first-stage model. Achieving 0.446 in the final score and 0.517 in the P.835
score, our system ranks 4th in the non-real-time track.
Comment: Accepted by ICASSP 202
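The stage-wise divide-and-conquer design above is simply a cascade of two trained networks. A minimal sketch, with `restore_fn` and `suppress_fn` standing in for the two stage models (hypothetical names, not from the paper):

```python
import numpy as np

def two_stage_enhance(noisy_spec, restore_fn, suppress_fn):
    """Stage-wise divide and conquer: stage one recovers missing spectral
    components, stage two suppresses residual noise, reverberation, and
    artifacts introduced by stage one.
    """
    restored = restore_fn(noisy_spec)  # stage 1: fill in lost components
    return suppress_fn(restored)       # stage 2: clean up the result
```

The benefit of the split is that each stage optimizes a narrower objective than a single end-to-end model would.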
Preparation and Properties of sc-PLA/PMMA Transparent Nanofiber Air Filter
Particulate matter (PM) pollution is a serious concern for the environment and public health. To protect indoor air quality, nanofiber filters have been used to coat window screens owing to their high PM removal efficiency, transparency, and low air resistance. However, these materials have poor mechanical properties. In this study, electrostatic induction-assisted solution blowing was used to fabricate polylactide stereocomplex (sc-PLA), which served as a reinforcement that enhances physical cross-linking points, significantly restricting poly(methyl methacrylate) (PMMA) molecular chain motion and improving the mechanical properties of sc-PLA/PMMA nanofibers. Moreover, the introduction of sc-PLA led to the formation of a thick/thin composite nanofiber structure, which is beneficial for the mechanical properties. Thus, sc-PLA/PMMA air filters with ~83% transparency, 99.5% PM2.5 removal, and a 140% increase in mechanical properties were achieved when 5 wt % sc-PLA was added to PMMA. Hence, the addition of sc-PLA to transparent filters can effectively improve their performance.