Two-stage Neural Network for ICASSP 2023 Speech Signal Improvement Challenge
For the ICASSP 2023 Speech Signal Improvement Challenge, we developed a
dual-stage neural model that improves the quality of speech degraded by
different distortions in a stage-wise, divide-and-conquer fashion. Specifically, in the
first stage, the speech improvement network focuses on recovering the missing
components of the spectrum, while in the second stage, our model aims to
further suppress noise, reverberation, and artifacts introduced by the
first-stage model. Achieving 0.446 in the final score and 0.517 in the P.835
score, our system ranks 4th in the non-real-time track. Comment: Accepted by ICASSP 202
Dynamic nsNet2: Efficient Deep Noise Suppression with Early Exiting
Although deep learning has made strides in the field of deep noise
suppression, leveraging deep architectures on resource-constrained devices
remains challenging. Therefore, we present an early-exiting model based on
nsNet2 that provides several levels of accuracy and resource savings by halting
computations at different stages. Moreover, we adapt the original architecture
by splitting the information flow to take into account the injected dynamism.
We show the trade-offs between performance and computational complexity based
on established metrics. Comment: Accepted at the MLSP 202
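The early-exiting idea described above can be illustrated with a minimal sketch. It is not the nsNet2 architecture or the authors' implementation: the function name `run_with_early_exit`, the toy stages, and the exit heads are all hypothetical, chosen only to show how halting at an earlier exit skips the remaining computation.

```python
def run_with_early_exit(frame, stages, exit_heads, exit_index):
    """Run stages 0..exit_index, then return the output of that exit's head.

    Hypothetical illustration: `stages` and `exit_heads` are plain callables
    standing in for network layers and per-exit output layers.
    """
    if len(stages) != len(exit_heads):
        raise ValueError("each stage needs a matching exit head")
    if not 0 <= exit_index < len(stages):
        raise ValueError("exit_index out of range")
    hidden = frame
    # Stages after exit_index are never evaluated, which is where the
    # resource saving comes from.
    for stage in stages[: exit_index + 1]:
        hidden = stage(hidden)
    return exit_heads[exit_index](hidden)


calls = []  # records which toy stages actually ran


def make_stage(name, fn):
    def stage(hidden):
        calls.append(name)
        return [fn(x) for x in hidden]
    return stage


# Toy stand-ins for network blocks (assumptions, not real layers).
stages = [make_stage("s0", lambda x: x + 1), make_stage("s1", lambda x: x * 2)]
exit_heads = [sum, max]

early = run_with_early_exit([1, 2], stages, exit_heads, exit_index=0)
```

With `exit_index=0` only the first stage runs; a later exit index trades more computation for the output of a deeper head.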
iMetricGAN: Intelligibility Enhancement for Speech-in-Noise using Generative Adversarial Network-based Metric Learning
The intelligibility of natural speech is seriously degraded when exposed to
adverse noisy environments. In this work, we propose a deep learning-based
speech modification method to compensate for the intelligibility loss, with the
constraint that the root mean square (RMS) level and duration of the speech
signal are maintained before and after modifications. Specifically, we utilize
an iMetricGAN approach to optimize the speech intelligibility metrics with
generative adversarial networks (GANs). Experimental results show that the
proposed iMetricGAN outperforms conventional state-of-the-art algorithms in
terms of objective measures, i.e., speech intelligibility in bits (SIIB) and
extended short-time objective intelligibility (ESTOI), under a Cafeteria noise
condition. In addition, formal listening tests reveal significant
intelligibility gains when both noise and reverberation exist. Comment: 5 pages, Submitted to INTERSPEECH 202
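The constraint that the RMS level and duration stay the same before and after modification can be illustrated with a small sketch. This is only one simple way to enforce such a constraint (a single global gain applied after modification) and is not taken from the iMetricGAN paper; the helper names `rms` and `match_rms` are hypothetical.

```python
import math


def rms(signal):
    """Root mean square level of a non-empty sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def match_rms(modified, reference):
    """Rescale `modified` so that its RMS level equals that of `reference`.

    Assumption: the duration constraint is modeled as equal sample counts.
    """
    if len(modified) != len(reference):
        raise ValueError("duration must be unchanged by the modification")
    current = rms(modified)
    if current == 0.0:
        # A silent signal cannot be rescaled to a non-zero level.
        return list(modified)
    gain = rms(reference) / current
    return [x * gain for x in modified]


reference = [0.1, -0.2, 0.3, -0.4]   # original speech samples (toy values)
modified = [0.5, 0.1, -0.7, 0.2]     # output of some speech modification
constrained = match_rms(modified, reference)
```

After the call, `constrained` has the same number of samples and the same RMS level as `reference`, so any intelligibility gain cannot come from simply making the speech louder or longer.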