Continuous Modeling of the Denoising Process for Speech Enhancement Based on Deep Learning
In this paper, we explore a continuous modeling approach for
deep-learning-based speech enhancement, focusing on the denoising process. We
use a state variable to indicate the stage of the denoising process. The
starting state is noisy speech and the ending state is clean speech. The noise
component in the state variable decreases as the state index advances, until it
reaches zero. During training, a UNet-like neural network learns to estimate
every state variable sampled from the continuous denoising process. In testing,
we introduce a controlling factor as an embedding, ranging from zero to one, to
the neural network, allowing us to control the level of noise reduction. This
approach enables controllable speech enhancement and is adaptable to various
application scenarios. Experimental results indicate that preserving a small
amount of noise in the clean target benefits speech enhancement, as evidenced
by improvements in both objective speech measures and automatic speech
recognition performance.
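The state variable described above can be sketched as a simple interpolation between the noisy and clean signals. This is a minimal illustration assuming a linear schedule; the paper's exact parameterization of the continuous denoising process may differ.

```python
import numpy as np

def state_variable(noisy, clean, s):
    """State of the continuous denoising process at index s in [0, 1].

    Illustrative linear parameterization (an assumption, not the
    paper's exact schedule): the residual noise component
    (noisy - clean) is scaled by (1 - s), so it shrinks continuously
    to zero as s goes from 0 (noisy start state) to 1 (clean end state).
    """
    noise = noisy - clean
    return clean + (1.0 - s) * noise
```

At test time, the controlling factor fed to the network plays the role of `s`, selecting how much residual noise to keep.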
Variance-Preserving-Based Interpolation Diffusion Models for Speech Enhancement
The goal of this study is to implement diffusion models for speech
enhancement (SE). The first step is to emphasize the theoretical foundation of
variance-preserving (VP)-based interpolation diffusion under continuous
conditions. Subsequently, we present a more concise framework that encapsulates
both the VP- and variance-exploding (VE)-based interpolation diffusion methods.
We demonstrate that these two methods are special cases of the proposed
framework. Additionally, we provide a practical example of VP-based
interpolation diffusion for the SE task. To improve performance and ease model
training, we analyze the common difficulties encountered in diffusion models
and suggest amenable hyper-parameters. Finally, we evaluate our model against
several methods using a public benchmark to showcase the effectiveness of our
approach.
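A VP-style interpolation diffusion step for speech enhancement can be sketched as follows. This is a hedged sketch using one common variance-preserving parameterization (the `beta_min`/`beta_max` schedule and the drift toward the noisy signal `y` are illustrative assumptions, not necessarily the paper's exact formulation):

```python
import numpy as np

def vp_interpolation_sample(x0, y, t, beta_min=0.1, beta_max=2.0):
    """Sample x_t from a VP-style interpolation diffusion process.

    The mean drifts from clean speech x0 (at t=0) toward noisy speech y
    as t grows, while Gaussian noise is injected with a
    variance-preserving schedule.
    """
    # Integrated linear noise schedule: int_0^t beta(s) ds
    integral = beta_min * t + 0.5 * (beta_max - beta_min) * t ** 2
    alpha_t = np.exp(-0.5 * integral)
    mean = alpha_t * x0 + (1.0 - alpha_t) * y
    sigma_t = np.sqrt(1.0 - alpha_t ** 2)  # variance-preserving choice
    return mean + sigma_t * np.random.randn(*x0.shape)
```

Under this choice, t = 0 returns the clean signal exactly, and the VE-based variant differs only in how `sigma_t` grows, which is why both fit in one framework.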
Nickel hydroxide/chemical vapor deposition-grown graphene/nickel hydroxide/nickel foam hybrid electrode for high performance supercapacitors
Rational design of electrode structures has been recognized as an effective strategy to improve the electrochemical performance of electrode materials. Herein, we demonstrate an integrated electrode in which nickel hydroxide (Ni(OH)2) nanosheets are deposited on both sides of chemical vapor deposition-grown graphene on Ni foam, which not only effectively optimizes the electrical conductivity of Ni(OH)2, but also accommodates the structural deformation associated with the large volume change upon cycling. The synthesized Ni(OH)2/graphene/Ni(OH)2/Ni foam electrode exhibits a high specific capacity of 991 C g−1 at a current density of 1 A g−1, which is higher than the theoretical specific capacity of the additive sum of Ni(OH)2 and graphene, and retains 95.4% of the initial capacity after 5000 cycles. A hybrid supercapacitor is constructed by using Ni(OH)2/graphene/Ni(OH)2/Ni foam as the positive electrode and activated carbon on Ni foam as the negative electrode, which achieves a maximum energy density of 49.5 W h kg−1 at a power density of 750 W kg−1, and an excellent cycling lifespan with 89.3% retention after 10000 cycles at 10 A g−1.
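The reported energy and power densities imply a characteristic discharge time, which is a quick arithmetic check (an illustrative calculation, not from the paper):

```python
# At an energy density of 49.5 Wh/kg delivered at a power density of
# 750 W/kg, a full discharge takes E/P hours.
energy_wh_per_kg = 49.5
power_w_per_kg = 750.0
discharge_time_s = energy_wh_per_kg / power_w_per_kg * 3600
# 237.6 s, i.e. roughly a 4-minute full discharge at this rate
```

This places the device in the typical hybrid-supercapacitor regime between batteries and conventional double-layer capacitors.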
Characterizing Speech Adversarial Examples Using Self-Attention U-Net Enhancement
Recent studies have highlighted adversarial examples as ubiquitous threats to
the deep neural network (DNN) based speech recognition systems. In this work,
we present a U-Net based self-attention model to enhance adversarial
speech signals. Specifically, we evaluate the model performance with
interpretable speech recognition metrics and analyze its behavior under
augmented adversarial training. Our experiments show that our proposed
U-Net improves the perceptual evaluation of speech quality (PESQ) from
1.13 to 2.78, speech transmission index (STI) from 0.65 to 0.75, short-term
objective intelligibility (STOI) from 0.83 to 0.96 on the task of speech
enhancement with adversarial speech examples. We conduct experiments on the
automatic speech recognition (ASR) task with adversarial audio attacks. We find
that (i) temporal features learned by the attention network are capable of
enhancing the robustness of DNN based ASR models; (ii) the generalization power
of DNN based ASR model could be enhanced by applying adversarial training with
an additive adversarial data augmentation. The ASR metric on word error rate
(WER) shows an absolute 2.22 decrease under gradient-based perturbation and an
absolute 2.03 decrease under evolutionary-optimized perturbation, suggesting
that our enhancement models with adversarial training can further secure a
resilient ASR system. Comment: The first draft was finished in August 2019.
Accepted to IEEE ICASSP 2020.
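The additive adversarial data augmentation described above can be sketched with an FGSM-style perturbation. This is an illustrative example only; the paper's gradient-based and evolutionary-optimized attacks may use different perturbation rules:

```python
import numpy as np

def fgsm_augment(waveform, grad, epsilon=0.002):
    """Additive adversarial augmentation of a speech waveform.

    FGSM-style sketch (an assumption, not the paper's exact attack):
    perturb the input in the sign direction of the loss gradient, then
    mix the adversarial copy into the training set alongside the clean
    one to improve ASR robustness.
    """
    adv = waveform + epsilon * np.sign(grad)
    return np.clip(adv, -1.0, 1.0)  # keep samples in the valid range
```

Training on both clean and perturbed copies is what the abstract refers to as adversarial training with additive adversarial data augmentation.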