
    DCCRN-KWS: an audio bias based model for noise robust small-footprint keyword spotting

    Real-world complex acoustic environments, especially those with a low signal-to-noise ratio (SNR), pose tremendous challenges to a keyword spotting (KWS) system. Inspired by recent advances in neural speech enhancement and context bias in speech recognition, we propose a robust audio context bias based DCCRN-KWS model to address this challenge. We form the whole architecture as a multi-task learning framework for both denoising and keyword spotting, where the DCCRN encoder is connected with the KWS model. Aided by the denoising task, we further introduce an audio context bias module that leverages real keyword samples and biases the network to better discriminate keywords in noisy conditions. Feature merge and complex context linear modules are also introduced to strengthen such discrimination and to effectively leverage contextual information, respectively. Experiments on a challenging internal dataset and the HIMIYA public dataset show that our DCCRN-KWS system achieves superior performance, while an ablation study validates the design of the whole model. Comment: Accepted by INTERSPEECH202
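    The multi-task objective described above (a denoising branch and a KWS branch trained jointly) can be illustrated with a minimal sketch. The abstract does not state which losses are used or how they are weighted, so this assumes a scale-invariant SNR (SI-SNR) loss for the denoising output, a softmax cross-entropy loss for the keyword classifier, and a hypothetical weight `denoise_weight`. All function names are illustrative and are not the authors' code.

```python
import math
from typing import List, Sequence


def _zero_mean(x: Sequence[float]) -> List[float]:
    """Remove the DC offset so SI-SNR ignores constant shifts."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def si_snr(estimate: Sequence[float], target: Sequence[float], eps: float = 1e-8) -> float:
    """Scale-invariant SNR (dB) of an enhanced signal against the clean target."""
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    est = _zero_mean(estimate)
    tgt = _zero_mean(target)
    dot = sum(e * t for e, t in zip(est, tgt))
    tgt_energy = sum(t * t for t in tgt) + eps
    # Project the estimate onto the target: the part explained by clean speech.
    s_target = [dot / tgt_energy * t for t in tgt]
    # Everything left over is treated as residual noise/distortion.
    e_noise = [e - s for e, s in zip(est, s_target)]
    num = sum(v * v for v in s_target)
    den = sum(v * v for v in e_noise) + eps
    return 10.0 * math.log10(num / den + eps)


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Numerically stable softmax cross-entropy for one utterance."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def multitask_loss(
    enhanced: Sequence[float],
    clean: Sequence[float],
    kws_logits: Sequence[float],
    kws_label: int,
    denoise_weight: float = 1.0,  # hypothetical weighting, not from the paper
) -> float:
    """Assumed joint objective: weighted negative SI-SNR plus KWS cross-entropy."""
    return denoise_weight * -si_snr(enhanced, clean) + cross_entropy(kws_logits, kws_label)
```

    In a real system both terms would be computed on batched tensors inside a deep learning framework so gradients flow from the KWS loss back through the shared DCCRN encoder; the plain-Python version only makes the arithmetic of the two objectives explicit.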

    Key Frame Mechanism For Efficient Conformer Based End-to-end Speech Recognition

    Recently, the Conformer, used as a backbone network for end-to-end automatic speech recognition, has achieved state-of-the-art performance. The Conformer block leverages a self-attention mechanism to capture global information, along with a convolutional neural network to capture local information, resulting in improved performance. However, Conformer-based models suffer from an issue with the self-attention mechanism: its computational complexity grows quadratically with the length of the input sequence. Inspired by previous work on Connectionist Temporal Classification (CTC) guided blank skipping during decoding, we introduce intermediate CTC outputs as guidance into the downsampling procedure of the Conformer encoder. We define frames with non-blank output as key frames. Specifically, we introduce the key frame-based self-attention (KFSA) mechanism, a novel method that uses key frames to reduce the computation of self-attention. The proposed architecture comprises two encoders. After the first encoder, we introduce an intermediate CTC loss function to compute the frame labels, enabling us to extract the key frames and blank frames for KFSA. Furthermore, we introduce the key frame-based downsampling (KFDS) mechanism, which operates directly on high-dimensional acoustic features and drops the frames corresponding to blank labels, producing new acoustic feature sequences as input to the second encoder. The proposed method achieves comparable or higher performance than the vanilla Conformer and other similar work such as Efficient Conformer. Meanwhile, it can discard more than 60% of useless frames during model training and inference, which significantly accelerates inference. The code for this work is available at https://github.com/scufan1990/Key-Frame-Mechanism-For-Efficient-Conformer. Comment: This manuscript has been accepted by IEEE Signal Processing Letters for publicatio
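    The key frame-based downsampling step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (that lives in the linked repository): it assumes greedy (argmax) decoding of the intermediate CTC posteriors, a blank index of 0, and that exactly the non-blank frames are kept. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

BLANK_ID = 0  # assumption: the CTC blank token has index 0


def greedy_frame_labels(posteriors: Sequence[Sequence[float]]) -> List[int]:
    """Argmax label of each frame's intermediate CTC posterior."""
    return [max(range(len(frame)), key=frame.__getitem__) for frame in posteriors]


def key_frame_downsample(
    features: Sequence[Sequence[float]],
    posteriors: Sequence[Sequence[float]],
    blank_id: int = BLANK_ID,
) -> Tuple[List[Sequence[float]], List[int]]:
    """Keep key frames (non-blank argmax) and drop blank frames.

    Returns the shortened feature sequence, which would feed the second
    encoder, and the indices of the kept frames in the original sequence.
    """
    if len(features) != len(posteriors):
        raise ValueError("features and posteriors must have the same number of frames")
    labels = greedy_frame_labels(posteriors)
    key_idx = [t for t, label in enumerate(labels) if label != blank_id]
    return [features[t] for t in key_idx], key_idx


def dropped_fraction(num_frames: int, num_kept: int) -> float:
    """Fraction of frames discarded by the downsampling step."""
    return 1.0 - num_kept / num_frames if num_frames else 0.0


if __name__ == "__main__":
    # Toy input: 5 frames of 1-dim features, 3-symbol vocabulary (0 = blank).
    feats = [[0.1], [0.2], [0.3], [0.4], [0.5]]
    post = [
        [0.9, 0.05, 0.05],  # blank
        [0.1, 0.8, 0.1],    # key frame (label 1)
        [0.7, 0.2, 0.1],    # blank
        [0.6, 0.3, 0.1],    # blank
        [0.2, 0.1, 0.7],    # key frame (label 2)
    ]
    kept, idx = key_frame_downsample(feats, post)
    print(kept, idx, dropped_fraction(len(feats), len(kept)))
```

    Because self-attention cost scales quadratically with sequence length, shrinking the sequence passed to the second encoder reduces its attention cost roughly with the square of the kept fraction. The same key-frame indices could also be used to build an attention mask for a KFSA-style layer, though that variant is not shown here.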

    Investigating Generative Adversarial Networks based Speech Dereverberation for Robust Speech Recognition

    We investigate the use of generative adversarial networks (GANs) in speech dereverberation for robust speech recognition. GANs have recently been studied for speech enhancement to remove additive noise, but work examining their ability in speech dereverberation is still lacking, and the advantages of using GANs have not been fully established. In this paper, we provide an in-depth investigation of the use of a GAN-based dereverberation front-end in ASR. First, we study the effectiveness of different dereverberation networks (the generator in the GAN) and find that an LSTM leads to a significant improvement compared with a feed-forward DNN and a CNN on our dataset. Second, further adding residual connections in deep LSTMs can boost the performance as well. Finally, we find that, for the success of the GAN, it is important to update the generator and the discriminator using the same mini-batch data during training. Moreover, using the reverberant spectrogram as a condition to the discriminator, as suggested in previous studies, may degrade the performance. In summary, our GAN-based dereverberation front-end achieves a 14%-19% relative CER reduction compared with the baseline DNN dereverberation network when tested on a strong multi-condition training acoustic model. Comment: Interspeech 201

    Exacerbated climate risks induced by precipitation extremes in the Yangtze River basin under warming scenarios

    The Yangtze River basin is a typical region of the world that has a well-developed economy but is also greatly affected by multiple climate extremes. An improved understanding of future climate trends and associated exposures in this region is urgently needed to address socioeconomic risks. This research aims to quantify historical and projected future population exposure to precipitation extremes in the Yangtze basin using meteorological records and downscaled climate models. The study found that the hazard zone for precipitation extremes during the baseline period was primarily located in the mid-lower Yangtze basin, particularly around the Poyang Lake watershed. Climate projections for 2050 indicate a further increase in the occurrence of precipitation extremes in this hazard zone, while a decrease in extreme events is detectable in the upper Yangtze basin under higher radiative forcing. Future socioeconomic scenarios suggest a tendency for population growth and migration towards the lower Yangtze basin, resulting in aggravated climate risks in megacities. Multi-model simulations indicate that population exposure to precipitation extremes in the lower Yangtze basin will increase by 9–22% around 2050, with both climate and population factors contributing positively. Shanghai, Changsha, Hangzhou, Ganzhou, and Huanggang are identified as the hotspot cities facing the highest foreseeable risks of precipitation extremes in the Yangtze basin.

    A Novel Model of Atherosclerosis in Rabbits Using Injury to Arterial Walls Induced by Ferric Chloride as Evaluated by Optical Coherence Tomography as well as Intravascular Ultrasound and Histology

    The aim of this study was to develop a new model of atherosclerosis by FeCl3-induced injury to the right common carotid arteries (CCAs) of rabbits. Lesions were induced in the right CCAs of male New Zealand White rabbits (n = 15) by a combination of a cholesterol-rich diet and FeCl3-induced injury to the arterial walls. The right and left CCAs were evaluated by histology and by in vivo intravascular ultrasound (IVUS) and optical coherence tomography (OCT) examinations at 24 hours (n = 3), 8 weeks (n = 6), and 12 weeks (n = 6) after injury. The right CCA of each rabbit showed extensive white-yellow plaques. At 8 and 12 weeks after injury, IVUS, OCT, and histological findings demonstrated that the right CCAs had evident eccentric plaques. Six plaques (50%) with evident positive remodeling were observed. Marked progression was clearly observed in the same plaque when it underwent repeat OCT and IVUS at 12 weeks after injury. We demonstrated, for the first time, a novel model of atherosclerosis induced by FeCl3. The model is simple, fast, inexpensive, and reproducible, and has a high success rate. Eccentric plaques and plaque remodeling were common in this model. We successfully carried out IVUS and OCT examinations twice in the same lesion over a relatively long period of time.

    Genome-wide identification and expression analysis of the MYB transcription factor in moso bamboo (Phyllostachys edulis)

    The MYB family, one of the largest transcription factor (TF) families in the plant kingdom, plays vital roles in cell formation, morphogenesis and signal transduction, as well as in responses to biotic and abiotic stresses. However, the function of bamboo MYB TFs remains unclear. To gain insight into the status of these proteins, a total of 85 PeMYBs, further divided into 11 subgroups, were identified in moso bamboo (Phyllostachys edulis) using a genome-wide search strategy. Gene structure analysis showed that the PeMYBs differed significantly, with exon numbers varying from 4 to 13. Phylogenetic analysis indicated that the PeMYBs clustered into 27 clades, of which the function of 18 clades has been predicted. In addition, based on RNA-seq data, almost all of the PeMYBs were differentially expressed in leaves, panicles, rhizomes and shoots. Furthermore, qRT-PCR analysis showed that 12 PeMYBs related to the biosynthesis and deposition of the secondary cell wall (SCW) were constitutively expressed, and their transcript abundance changed significantly with increasing height of the bamboo shoots, during which the degree of lignification continuously increased. This result indicates that these PeMYBs might play fundamental roles in SCW thickening and bamboo shoot lignification. This comprehensive and systematic study of the members of the MYB family provides a reference and solid foundation for further functional analysis of MYB TFs in moso bamboo.