Search CORE

7 research outputs found

A perceptually-weighted deep neural network for monaural speech enhancement in various background noise conditions

Author: Glorot, Hu, Liu, Rothauser, Rumelhart, Tieleman, van den Oord
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/08/2017

Deep neural networks (DNNs) have recently been shown to give state-of-the-art performance in monaural speech enhancement. However, the DNN training process does not fully exploit the perceptual differences between components of the DNN output; equal importance is often assumed. To address this limitation, we propose a new perceptually-weighted objective function within a feedforward DNN framework, aiming to minimize the perceptual difference between the enhanced speech and the target speech. A perceptual weight is integrated into the proposed objective function and has been tested on two types of output features: spectra and ideal ratio masks. Objective evaluations of both speech quality and speech intelligibility have been performed. Integrating our perceptual weight shows consistent improvement across several noise levels and a variety of noise types.
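The abstract does not define the perceptual weight itself, so the following is only a minimal Python sketch of the general idea under stated assumptions. It computes an ideal ratio mask per time-frequency bin as (S²/(S²+N²))^β, with β = 0.5 as a common choice, and a weighted mean square error in which each output component carries its own weight. The function names are illustrative, and the weight values passed in are hypothetical placeholders, not the paper's perceptual weight.

```python
from typing import List, Sequence


def ideal_ratio_mask(
    clean_power: Sequence[float],
    noise_power: Sequence[float],
    beta: float = 0.5,
) -> List[float]:
    """Ideal ratio mask per time-frequency bin: (S^2 / (S^2 + N^2)) ** beta.

    clean_power and noise_power are already-squared magnitudes |S|^2 and |N|^2.
    """
    eps = 1e-12  # guards against division by zero in silent bins
    return [(s / (s + n + eps)) ** beta for s, n in zip(clean_power, noise_power)]


def perceptually_weighted_mse(
    target: Sequence[float],
    estimate: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Weighted MSE: sum_k w_k (t_k - e_k)^2 / sum_k w_k.

    `weights` stands in for a per-component perceptual weight; the values are
    hypothetical placeholders, not the weight defined in the paper.
    """
    if not (len(target) == len(estimate) == len(weights)) or len(target) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sq = sum(w * (t - e) ** 2 for t, e, w in zip(target, estimate, weights))
    return weighted_sq / total_weight
```

For example, `ideal_ratio_mask([1.0, 3.0], [1.0, 1.0], beta=1.0)` gives approximately `[0.5, 0.75]`. Normalizing by the sum of the weights means uniform weights reduce the loss to the ordinary MSE, which corresponds to the "equal importance" assumption the abstract criticizes.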

Crossref

University of Surrey

Surrey Research Insight

A Speech Enhancement Deep Neural Network Using a Speech Metric Measurement Model (음성 지표 측정 모델을 이용한 음성 향상 심층신경망)

Author: 김지환 (Kim Ji-hwan)
Publication venue: Seoul National University Graduate School (서울대학교 대학원)
Publication date: 01/02/2019

Master's thesis, Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, February 2019. Advisor: Nam Soo Kim (김남수).

This thesis presents a deep neural network (DNN) speech enhancement technique that uses a subject quality measurement (SQM) model, i.e., a neural network that predicts speech quality or intelligibility metrics. In conventional DNN speech enhancement, the training objective is only loosely related to the metrics used to evaluate intelligibility and quality, so the optimization criterion and the evaluation criterion are inconsistent. To address this, the enhancement model is connected to a second neural network whose target is a speech intelligibility or speech quality score. The metrics are first measured for three cases (clean speech, noisy speech, and enhanced speech), a measurement model is trained to predict them, and this model is then attached to the enhancement model to train it. While training the enhancement model so that the score output by the attached measurement model is maximized, the network architecture of the measurement model was varied, which improved the speed and accuracy with which the maximum is reached.

Two metrics are used: the short-time objective intelligibility measure (STOI) and the perceptual evaluation of speech quality (PESQ). Models predicting these two metrics are attached to the enhancement model, which is trained in a multi-task fashion by minimizing both the mean square error of the metric and the mean square error of the speech features. Experiments using PESQ and STOI as indicators show that the proposed model achieves higher scores than conventional speech enhancement DNNs and outperforms the baseline used by conventional techniques.

Contents:
Abstract (in Korean)
List of Tables
List of Figures
1 Introduction
2 Conventional Approaches for Speech Enhancement
  2.1 Deep Neural Network-based Speech Enhancement
    2.1.1 Deep Neural Network
    2.1.2 Deep Neural Network-based Speech Enhancement Network
3 Subject Quality Measurement
  3.1 STOI
  3.2 PESQ
  3.3 DNN-based Speech Enhancement using Subject Quality Measurement Model
    3.3.1 Deep Neural Network-based Model
    3.3.2 Convolutional Neural Network-based Model
4 Proposed Enhancement Model
5 Experiment Design
  5.1 Noisy Speech Mixtures
  5.2 SQM Feature Extraction
  5.3 Neural Network Design
6 Experimental Results
  6.1 Subject Quality Measurement Models Performance
  6.2 Speech Enhancement Models Performance using SQM Model as a Postfilter
7 Conclusion and Future Work
Abstract
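The multi-task objective described in this abstract can be sketched in Python as the sum of a feature MSE and a weighted metric MSE. This is a hedged illustration, not the thesis code: the function names, the `metric_weight` parameter, and the choice of target scores (for example, the STOI and PESQ values of the clean reference) are assumptions, and in the thesis the predicted scores would come from the trained SQM network rather than being passed in as plain numbers.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean square error between two equal-length, non-empty sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def multitask_loss(
    clean_features: Sequence[float],
    enhanced_features: Sequence[float],
    predicted_scores: Sequence[float],
    target_scores: Sequence[float],
    metric_weight: float = 1.0,  # hypothetical balance between the two tasks
) -> float:
    """Feature-reconstruction MSE plus a weighted metric-prediction MSE.

    predicted_scores: scores (e.g. [STOI, PESQ]) that an SQM-style network
    assigns to the enhanced speech (assumed input).
    target_scores: scores the enhanced speech should reach (assumed here to
    be those of the clean reference).
    """
    feature_term = mse(clean_features, enhanced_features)
    metric_term = mse(predicted_scores, target_scores)
    return feature_term + metric_weight * metric_term
```

For instance, `multitask_loss([1.0, 2.0], [1.0, 1.0], [0.8], [1.0])` returns approximately 0.54 (a feature term of 0.5 plus a metric term of 0.04). Because STOI and PESQ are on different scales, a weight such as `metric_weight`, or per-metric normalization, is one plausible way to keep either task from dominating.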

SNU Open Repository and Archive

A Perceptually-Weighted Deep Neural Network for Monaural Speech Enhancement in Various Background Noise Conditions

Author: Jackson, Philip; Liu, Qingju; Tang, Yan; Wang, Wenwu
Publication date: 02/09/2017

University of Surrey