3 research outputs found

Learning with Learned Loss Function: Speech Enhancement with Quality-Net to Improve Perceptual Evaluation of Speech Quality

Authors: Fu Szu-Wei, Liao Chien-Feng, Tsao Yu
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 14/11/2019

Using a human-perception-related objective function to train a speech enhancement model has recently become a popular topic, mainly because the conventional mean squared error (MSE) loss does not represent auditory perception well. One typical human-perception-related metric, the perceptual evaluation of speech quality (PESQ), has been shown to correlate highly with quality scores rated by humans. Because PESQ is complex and non-differentiable, however, it cannot be used directly to optimize speech enhancement models. In this study, we propose optimizing the enhancement model with an approximated PESQ function, which is differentiable and learned from the training data. The experimental results show that the learned surrogate function can guide the enhancement model to further boost the PESQ score (an increase of 0.18 points over the model trained with MSE loss) while maintaining speech intelligibility.

Comment: Accepted by IEEE Signal Processing Letters (SPL

arXiv.org e-Print Archive
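
The idea in the abstract above (fit a differentiable approximation of a black-box score, then optimize against that approximation) can be illustrated with a toy example. This is only a sketch under strong assumptions, not the authors' method: `blackbox_quality` is a made-up staircase score standing in for PESQ, the "enhancement model" is a single hypothetical gain parameter, and the surrogate is a quadratic fitted by least squares rather than a neural network.

```python
# Toy sketch: optimize against a learned, differentiable surrogate of a
# black-box quality metric. All names and numbers are hypothetical.

def blackbox_quality(gain):
    """Stand-in for a non-differentiable metric (NOT real PESQ).

    The score is a staircase in `gain` (flat almost everywhere, so it gives
    no useful gradient) and is highest near gain = 0.6.
    """
    return -round(abs(gain - 0.6) * 10) / 10


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(samples):
    """Least-squares fit of q(g) = a*g**2 + b*g + c to (gain, score) pairs."""
    s = [sum(g ** k for g, _ in samples) for k in range(5)]      # sums of g^k
    t = [sum(y * g ** k for g, y in samples) for k in range(3)]  # sums of y*g^k
    # Normal equations, solved with Cramer's rule.
    matrix = [[s[4], s[3], s[2]], [s[3], s[2], s[1]], [s[2], s[1], s[0]]]
    rhs = [t[2], t[1], t[0]]
    d = det3(matrix)
    coeffs = []
    for col in range(3):
        replaced = [row[:] for row in matrix]
        for r in range(3):
            replaced[r][col] = rhs[r]
        coeffs.append(det3(replaced) / d)
    return tuple(coeffs)


def optimize_gain(gain, coeffs, steps=200, lr=0.1):
    """Gradient ascent on the surrogate q(g); the black box is never differentiated."""
    a, b, _ = coeffs
    for _ in range(steps):
        gain += lr * (2 * a * gain + b)  # dq/dg = 2*a*g + b
    return gain


# "Training data" for the surrogate: gains paired with their black-box scores.
samples = [(i * 0.05, blackbox_quality(i * 0.05)) for i in range(25)]
coeffs = fit_quadratic(samples)

start_gain = 0.1
final_gain = optimize_gain(start_gain, coeffs)
print(blackbox_quality(start_gain), blackbox_quality(final_gain))
```

The design point is that the black-box score is only ever evaluated, never differentiated; all gradient information comes from the fitted surrogate.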

Boosting Objective Scores of a Speech Enhancement Model by MetricGAN Post-processing

Authors: Chuang Shang-Yi, Fu Szu-Wei, Hsieh Tsun-An, Hung Kuo-Hsuan, Kuo Heng-Cheng, Li You-Jin, Liao Chien-Feng, Lu Yen-Ju, Tsao Yu, Wang Syu-Siang, Yu Cheng, Zezario Ryandhimas E.
Publication date: 03/03/2021

The Transformer architecture has demonstrated a superior ability compared to recurrent neural networks in many natural language processing applications. Our study therefore applies a modified Transformer to a speech enhancement task. Specifically, positional encoding in the Transformer may not be necessary for speech enhancement, and hence it is replaced by convolutional layers. To further improve the perceptual evaluation of speech quality (PESQ) scores of the enhanced speech, the L_1 pre-trained Transformer is fine-tuned using a MetricGAN framework. The proposed MetricGAN can be treated as a general post-processing module to further boost the objective scores of interest. The experiments were conducted using the data sets provided by the organizer of the Deep Noise Suppression (DNS) challenge. Experimental results demonstrated that the proposed system outperformed the challenge baseline by a large margin in both subjective and objective evaluations.

Comment: Accepted by APSIPA 202

arXiv.org e-Print Archive
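
One way to read the abstract's remark about positional encoding: self-attention without positional encoding treats its input as an unordered set, whereas a convolution over neighbouring frames depends on their order. The sketch below illustrates only that general property on scalar "frames". It is an assumption-laden toy, not the paper's architecture; `self_attention`, `conv1d`, the frame values, and the kernel are all hypothetical.

```python
import math


def self_attention(frames):
    """Single-head dot-product self-attention on scalar frames, no positional encoding."""
    out = []
    for q in frames:
        scores = [q * k for k in frames]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, frames)) / total)
    return out


def conv1d(frames, kernel):
    """Same-length 1-D cross-correlation with zero padding (odd-length kernel)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(frames) + [0.0] * pad
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(frames))]


frames = [0.2, 1.0, -0.5, 0.7]   # hypothetical per-frame features
reversed_frames = frames[::-1]   # same frames, opposite time order

# Attention: reordering the input only reorders the output values.
attn = self_attention(frames)
attn_rev = self_attention(reversed_frames)

# Convolution: the output values themselves change with the frame order.
kernel = [1.0, 0.0, -1.0]        # hypothetical asymmetric kernel
conv = conv1d(frames, kernel)
conv_rev = conv1d(reversed_frames, kernel)

print(sorted(attn), sorted(attn_rev))
print(sorted(conv), sorted(conv_rev))
```

In this toy, the attention outputs for the reversed sequence are the same values in a different order, while the convolution outputs are different values altogether, which is the sense in which a convolutional layer can carry order information on its own.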

Annual Report 2018, Institute for Communications Technology (IfN), Technische Universität Braunschweig

Publication venue: Shaker
Publication date: 01/01/2018

Digitale Bibliothek Braunschweig