Search CORE

21 research outputs found

Multi-Resolution Speech Enhancement Using Generative Adversarial Networks for Noisy or Compressed Speech

Author: 김형용
Publication venue: Seoul National University Graduate School
Publication date: 01/08/2021
Field of study

Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, August 2021. Advisor: 김남수.

Enhancement techniques for noisy speech and for coded speech are essential in many speech applications, such as robust speech recognition, hearing aids, and mobile communications. Their main objective is to improve the quality and intelligibility of speech by suppressing background noise or by compensating for the degradation introduced by low-rate speech coding. Recently, generative-model-based data modeling has shown prominent results in speech processing. From this perspective, we propose generative-model-based enhancement techniques that use a multi-resolution approach for both noisy speech and speech coding. Generative adversarial networks (GANs) have been successfully applied to speech enhancement. However, two issues remain: (1) GAN training is typically unstable because of its non-convex nature, and (2) most conventional methods do not fully exploit the characteristics of speech, which can lead to sub-optimal solutions. To address these problems, we propose a progressive generator that processes speech in a multi-resolution fashion. We also propose a multi-scale discriminator that distinguishes real from generated speech at several sampling rates in order to stabilize GAN training. Experimental results show that the proposed approach makes training faster and more stable and improves performance on various speech enhancement metrics. Speech synthesis based on generative models has also recently been applied successfully to speech codecs. Despite notable improvements in speech quality, conventional neural decoders typically require prior information about the original codec, such as its bit allocation or de-quantization method, so they do not generalize across different types of codecs.
To address this limitation, we propose an imitation neural decoder based on a generative model that reconstructs speech directly from the bitstream without any codec-specific information. We also propose a de-quantization network that learns which bits are related and de-quantizes the bitstream to extract a conditioning variable that helps the generative model restore the original speech. Experiments with mixed excitation linear prediction (MELP), advanced multi-band excitation (AMBE), and Speex at 2.4 kb/s verify that the proposed method achieves better subjective and objective results than the original speech codecs. Finally, an integrated model is proposed by applying the progressive approach of Chapter 2 to the neurally optimized decoder of Chapter 3. The Parallel WaveNet generator of the Parallel WaveGAN used in Chapter 3 needs substantial GPU memory during training, which forces small batch sizes and makes training slow. To solve this problem, Parallel WaveNet is converted into a progressive structure. Experimental results show that the proposed model achieves better objective results than Parallel WaveNet.

Contents
1 Introduction
1.1 Speech Enhancement
1.2 Speech Coding
1.3 Outline of Thesis
2 A Multi-Resolution Approach to GAN-Based Speech Enhancement
2.1 Introduction
2.2 GAN-based Speech Enhancement
2.3 Multi-resolution Approach for Speech Enhancement
2.3.1 Progressive Generator
2.3.2 Multi-scale Discriminator
2.4 Experimental Settings and Results
2.4.1 Dataset
2.4.2 Network Structure
2.4.3 Evaluation Methods
2.4.4 Experiments and Results
2.4.5 Performance of Multi-scale Discriminator
2.4.6 Analysis and Comparison of Spectrograms
2.4.7 Fast and Stable Training of Proposed Model
2.4.8 Comparison with Conventional GAN-based Speech Enhancement Techniques
2.5 Summary
3 Neurally Optimized Decoder for Low Bitrate Speech Codec
3.1 Introduction
3.2 Speech Coding Overview
3.3 Neurally Optimized Decoder
3.4 Experimental Settings and Results
3.4.1 Database of Speech and Codecs
3.4.2 Experimental Setup
3.4.3 Analysis of Training Loss
3.4.4 Objective Test
3.4.5 Subjective Test
3.4.6 Speaker Transparency
3.5 Summary
4 Imitation Neural Decoder Based on Progressive Approach
4.1 Introduction
4.2 Parallel WaveNet
4.3 Progressive WaveNet
4.4 Experiments and Results
4.4.1 Objective Measures
4.4.2 Analysis of Memory Usage and Inference Speed
4.5 Summary
5 Conclusions
Bibliography

SNU Open Repository and Archive
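The thesis abstract describes a multi-scale discriminator that judges real and generated speech at several sampling rates. The sketch below is a minimal illustration under stated assumptions, not the thesis implementation: it builds each lower-rate view by averaging adjacent sample pairs and keeping one value per pair (a crude low-pass filter plus 2x decimation, similar in spirit to the average pooling used between scales in MelGAN-style discriminators), and it scores each view with its own discriminator using a least-squares GAN loss. The function names, the pair-averaging filter, and the choice of LSGAN loss are all assumptions; the abstract does not specify them.

```python
import numpy as np


def multiscale_views(wave, sample_rate, num_scales=3):
    """Return [(rate, signal), ...], halving the sampling rate at each step.

    Assumption: downsampling averages adjacent sample pairs (a crude 2-tap
    low-pass filter) and keeps one value per pair.
    """
    x, rate = np.asarray(wave, dtype=float), sample_rate
    views = [(rate, x)]
    for _ in range(num_scales - 1):
        if len(x) % 2:  # drop a trailing sample so the pairs line up
            x = x[:-1]
        x = 0.5 * (x[0::2] + x[1::2])
        rate //= 2
        views.append((rate, x))
    return views


def multiscale_lsgan_d_loss(discriminators, real, fake, sample_rate):
    """Least-squares discriminator loss summed over scales (assumed loss).

    discriminators[k] is a hypothetical callable that scores the k-th view;
    real speech is pushed toward a score of 1, generated speech toward 0.
    """
    n = len(discriminators)
    real_views = multiscale_views(real, sample_rate, n)
    fake_views = multiscale_views(fake, sample_rate, n)
    loss = 0.0
    for d, (_, r), (_, f) in zip(discriminators, real_views, fake_views):
        loss += np.mean((d(r) - 1.0) ** 2) + np.mean(d(f) ** 2)
    return float(loss)
```

For a 16 kHz input with three scales, the discriminators see 16 kHz, 8 kHz, and 4 kHz versions of the same utterance, so the coarsest discriminator concentrates on low-frequency structure while the full-rate one also sees high-frequency detail.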

Social service provision system: the issues of public-private partnership in UK, US and Korea

Author: Kang Hyekyu
Kim Hyoung Yong
Park Se-Kyung
강혜규
김형용
박세경
Publication venue: The Research Institute of Nursing Science, Seoul National University (KAMJE)
Publication date: 01/01/2007
Field of study