
    Continuous Modeling of the Denoising Process for Speech Enhancement Based on Deep Learning

    In this paper, we explore a continuous modeling approach for deep-learning-based speech enhancement, focusing on the denoising process. We use a state variable to track that process: the starting state is noisy speech, the ending state is clean speech, and the noise component in the state variable decreases with the state index until it reaches zero. During training, a UNet-like neural network learns to estimate every state variable sampled from the continuous denoising process. At test time, we feed the network a controlling factor, embedded as a value ranging from zero to one, which lets us control the level of noise reduction. This enables controllable speech enhancement adaptable to various application scenarios. Experimental results indicate that preserving a small amount of noise in the clean target benefits speech enhancement, as evidenced by improvements in both objective speech measures and automatic speech recognition performance.
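    A minimal sketch of the state-variable idea in Python/PyTorch, assuming a linear interpolation schedule between noisy and clean speech (the abstract only specifies that the noise component decreases monotonically with the state index, so the schedule, tensor shapes, and names below are illustrative assumptions):

        import torch

        def denoising_state(noisy, clean, t):
            # Interpolated training target: t = 0 returns noisy speech,
            # t = 1 returns clean speech. The linear schedule is an assumption.
            return (1.0 - t) * noisy + t * clean

        noisy = torch.randn(1, 16000)   # 1 s of noisy speech at 16 kHz (dummy)
        clean = torch.randn(1, 16000)   # matching clean reference (dummy)
        t = torch.rand(())              # state index sampled from [0, 1]
        target = denoising_state(noisy, clean, t)  # network regression target

    At test time, the same index t, fed to the network as an embedding, acts as the user-controlled noise-reduction level; setting it slightly below 1 preserves the small amount of residual noise that the abstract reports is beneficial.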

    Variance-Preserving-Based Interpolation Diffusion Models for Speech Enhancement

    The goal of this study is to implement diffusion models for speech enhancement (SE). We first establish the theoretical foundation of variance-preserving (VP)-based interpolation diffusion under continuous conditions. We then present a more concise framework that encapsulates both the VP- and variance-exploding (VE)-based interpolation diffusion methods, and we show that these two methods are special cases of the proposed framework. Additionally, we provide a practical example of VP-based interpolation diffusion for the SE task. To improve performance and ease model training, we analyze the common difficulties encountered in diffusion models and suggest amenable hyperparameters. Finally, we evaluate our model against several methods on a public benchmark to showcase the effectiveness of our approach.
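    As a rough orientation, interpolation diffusion for SE can be written in a generic form (a sketch in standard diffusion notation; the coefficients and their names are assumptions, not the paper's exact parameterization): the state drifts from clean speech toward noisy speech while Gaussian noise is injected,

        \[
            x_t = \alpha_t\, x_0 + (1 - \alpha_t)\, y + \sigma_t\, \varepsilon,
            \qquad \varepsilon \sim \mathcal{N}(0, I),
        \]

    where $x_0$ is clean speech, $y$ is noisy speech, and $t$ indexes the process. VP-based schedules keep the total variance bounded (classically $\alpha_t^2 + \sigma_t^2 = 1$), while VE-based schedules let $\sigma_t$ grow; a unified framework amounts to choosing the pair $(\alpha_t, \sigma_t)$ so that either family falls out as a special case.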

    Nickel hydroxide/chemical vapor deposition-grown graphene/nickel hydroxide/nickel foam hybrid electrode for high performance supercapacitors

    Rational design of electrode structures is recognized as an effective strategy for improving the electrochemical performance of electrode materials. Herein, we demonstrate an integrated electrode in which nickel hydroxide (Ni(OH)2) nanosheets are deposited on both sides of chemical vapor deposition-grown graphene on Ni foam, which not only effectively optimizes the electrical conductivity of Ni(OH)2 but also accommodates the structural deformation associated with the large volume change upon cycling. The synthesized Ni(OH)2/graphene/Ni(OH)2/Ni foam electrode exhibits a high specific capacity of 991 C g−1 at a current density of 1 A g−1, higher than the theoretical specific capacity expected from the additive contributions of Ni(OH)2 and graphene, and retains 95.4% of its initial capacity after 5,000 cycles. A hybrid supercapacitor constructed with Ni(OH)2/graphene/Ni(OH)2/Ni foam as the positive electrode and activated carbon on Ni foam as the negative electrode achieves a maximum energy density of 49.5 W h kg−1 at a power density of 750 W kg−1, along with an excellent cycling lifespan of 89.3% retention after 10,000 cycles at 10 A g−1.
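    A quick sanity check of the reported device figures in Python (pure unit arithmetic on numbers taken from the abstract; no material parameters are assumed):

        # Implied discharge time from E = P * t  ->  t = E / P.
        energy_wh_per_kg = 49.5          # reported maximum energy density
        power_w_per_kg = 750.0           # reported power density at that energy
        discharge_time_s = energy_wh_per_kg / power_w_per_kg * 3600
        print(f"implied discharge time: {discharge_time_s:.0f} s")      # ~238 s

        # Electrode capacity retained after 5,000 cycles.
        initial_capacity_c_per_g = 991.0
        print(f"retained: {0.954 * initial_capacity_c_per_g:.0f} C/g")  # ~945 C/g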

    Characterizing Speech Adversarial Examples Using Self-Attention U-Net Enhancement

    Recent studies have highlighted adversarial examples as ubiquitous threats to deep neural network (DNN) based speech recognition systems. In this work, we present a U-Net based attention model, U-Net_At, to enhance adversarial speech signals. Specifically, we evaluate the model using interpretable speech recognition metrics and analyze its performance under augmented adversarial training. Our experiments show that the proposed U-Net_At improves the perceptual evaluation of speech quality (PESQ) from 1.13 to 2.78, the speech transmission index (STI) from 0.65 to 0.75, and short-time objective intelligibility (STOI) from 0.83 to 0.96 on the task of enhancing adversarial speech examples. We also conduct experiments on the automatic speech recognition (ASR) task under adversarial audio attacks. We find that (i) temporal features learned by the attention network can enhance the robustness of DNN based ASR models; and (ii) the generalization power of DNN based ASR models can be improved by adversarial training with additive adversarial data augmentation. Word error rate (WER) decreases by an absolute 2.22% under gradient-based perturbation and by an absolute 2.03% under evolutionary-optimized perturbation, suggesting that our enhancement models with adversarial training can further secure a resilient ASR system.
    Comment: The first draft was finished in August 2019. Accepted to IEEE ICASSP 2020.
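    A minimal sketch of additive adversarial data augmentation in the FGSM style (the abstract does not name the attack; the perturbation method, epsilon, and model interface here are assumptions for illustration):

        import torch

        def additive_adversarial(model, waveform, target, loss_fn, eps=1e-3):
            # Craft an additive perturbation that increases the model's loss,
            # bounded elementwise by eps (FGSM-style; illustrative only).
            waveform = waveform.clone().requires_grad_(True)
            loss_fn(model(waveform), target).backward()
            return (waveform + eps * waveform.grad.sign()).detach()

    Adversarial training then mixes such perturbed utterances into each batch alongside clean ones, which is the additive augmentation the abstract credits with the absolute WER reductions under attack.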