

Performance Enhancement Techniques for Sound Event Classification in Reverberant Environment

Author: 이재준
Publication venue: Seoul National University Graduate School (서울대학교 대학원)
Publication date: 01/08/2019

Thesis (Master's), Seoul National University Graduate School: Graduate School of Convergence Science and Technology, Department of Transdisciplinary Studies (Digital Contents and Information Studies major), August 2019. 이교구.
This thesis proposes techniques for improving sound event classification performance in reverberant environments. Sound event classification is actively applied in areas such as traffic monitoring and security (anomaly) detection systems, and in these applications robustness to real-world noise and reverberation is important. However, research on how noise and reverberation degrade sound event classification is scarce, and studies of sound event classification in reverberant environments in particular are virtually nonexistent. This thesis therefore observes the performance degradation that occurs in reverberant environments and proposes techniques to address it. To model reverberant environments, two test sets were built: a recorded test set, made by re-recording the original dataset in real reverberant environments, and a synthesized test set, made using a room impulse response dataset. Using these test sets, the thesis observes that sound event classification performance degrades under reverberation. To improve performance, it proposes a data augmentation method using artificially synthesized (virtual) room impulse responses and a method of conditioning the network on the room impulse response. Experimental results show that the proposed data augmentation method improves performance in reverberant environments, and that using it together with the proposed conditioning method yields additional improvement. Finally, the conditioning method is shown to improve performance using only approximate reverberation time information, even when the exact room impulse response audio is not known.

Chapter 1 Introduction
1.1 Research Background
1.2 Research Objectives
Chapter 2 Background Theory and Related Work
2.1 Background Theory
2.1.1 Sound Event Classification
2.1.2 Deep Learning Research
2.1.3 Reverberation and Room Impulse Response
2.2 Related Work
2.2.1 Sound Event Classification Research
2.2.2 Research Related to the Proposed Techniques
Chapter 3 Proposed Techniques
3.1 Data Augmentation Using Virtual Room Impulse Responses
3.2 Room Impulse Response Conditioning Network
Chapter 4 Experiments
4.1 Experimental Setup
4.1.1 Datasets
4.1.2 Test Set Construction
4.1.3 Detailed Experimental Settings
4.2 Results and Discussion
4.2.1 Degradation of Sound Event Classification Performance in Reverberant Environments
4.2.2 Performance and Limitations When Applying Deconvolution
4.2.3 Performance Improvement Using Data Augmentation
4.2.4 Performance Improvement Using the Conditioning Network
Chapter 5 Conclusion
5.1 Significance of the Study
5.2 Limitations
5.3 Future Work
ABSTRACT
Acknowledgments
Maste
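
The data augmentation idea in the abstract above (reverberating training audio with artificially synthesized room impulse responses) can be illustrated with a minimal sketch. The record does not say how the thesis synthesizes its impulse responses, so everything here is an assumption for illustration: the exponentially decaying Gaussian-noise model, the function names `synth_rir` and `add_reverb`, the 16 kHz sample rate, and the normalization choices.

```python
import numpy as np

def synth_rir(rt60, sr=16000, seed=0):
    """Hypothetical synthetic room impulse response (not the thesis's method).

    Gaussian noise shaped by an exponential envelope whose energy has
    decayed by 60 dB after rt60 seconds. The response lasts rt60 seconds.
    """
    rng = np.random.default_rng(seed)
    n = max(1, int(round(rt60 * sr)))
    t = np.arange(n) / sr
    # A 60 dB energy decay at t = rt60 is an amplitude factor of 10**-3.
    envelope = 10.0 ** (-3.0 * t / rt60)
    rir = rng.standard_normal(n) * envelope
    rir[0] = 1.0  # assumed direct-path component
    return rir / np.sqrt(np.sum(rir ** 2))  # normalize to unit energy

def add_reverb(signal, rir):
    """Convolve a dry signal with an RIR and keep the original length and peak."""
    signal = np.asarray(signal, dtype=float)
    wet = np.convolve(signal, rir, mode="full")[: len(signal)]
    peak = np.max(np.abs(wet))
    if peak == 0:
        return wet
    return wet * (np.max(np.abs(signal)) / peak)

# Usage: reverberate a one-second unit impulse with an assumed RT60 of 0.5 s.
dry = np.zeros(16000)
dry[0] = 1.0
wet = add_reverb(dry, synth_rir(rt60=0.5))
```

In an augmentation pipeline, one would presumably draw a different reverberation time (and seed) per training example, so the classifier sees many simulated rooms.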

SNU Open Repository and Archive

Sound classification using evolving ensemble models and Particle Swarm Optimization

Authors: Jiang Ming, Lim Chee Peng, Yu Yonghong, Zhang Li
Publication venue: Elsevier BV
Publication date: 08/01/2022

Royal Holloway - Pure

Machine learning and audio processing : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science at Massey University, Albany, Auckland, New Zealand

Author: Ma Junbo
Publication venue: Massey University
Publication date: 01/01/2019

This thesis addresses two theoretical issues, one in deep neural networks and one in clustering, and develops a new approach for polyphonic sound event detection, one of the most important applications in audio processing. The three novel approaches are: (i) The Large Margin Recurrent Neural Network (LMRNN), which improves the discriminative ability of the original Recurrent Neural Networks by adding a large margin term to the widely used cross-entropy loss function. The term applies the large margin discriminative principle as a heuristic that guides convergence during training, and it exploits the label information more fully by considering both the target category and the competing categories. (ii) The Robust Multi-View Continuous Subspace Clustering (RMVCSC) approach, which performs clustering on a common view-invariant subspace learned from all views. The clustering result and the common representation subspace are optimised simultaneously by a single continuous objective function, in which a robust estimator automatically clips specious inter-cluster connections while maintaining convincing intra-cluster correspondences. RMVCSC can therefore untangle heavily mixed clusters without the number of clusters being set in advance. (iii) A polyphonic sound event detection approach based on the Relational Recurrent Neural Network (RRNN), which uses the relational reasoning ability of RRNNs to untangle overlapping sound events across audio recordings. Unlike previous works, which mix and pack all historical information into a single common hidden memory vector, this approach lets pieces of historical information interact with each other across an audio recording, which is effective and efficient in untangling overlapping sound events.
All three approaches are tested on widely used datasets and compared with recently published works. The experimental results demonstrate the effectiveness and efficiency of the developed approaches.
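
As a rough illustration of what "a large margin term added to the cross-entropy loss" could look like, here is a minimal sketch for a single example. The abstract does not give the LMRNN formula, so the hinge-style penalty, the function name `large_margin_cross_entropy`, and the `margin` and `weight` parameters are all assumptions, not the thesis's definition.

```python
import numpy as np

def large_margin_cross_entropy(logits, target, margin=1.0, weight=0.5):
    """Cross-entropy plus an assumed hinge-style large margin term.

    The margin term is zero once the target logit exceeds the strongest
    competing logit by at least `margin`; otherwise it grows linearly.
    This is an illustrative stand-in, not the LMRNN loss itself.
    """
    logits = np.asarray(logits, dtype=float)
    # Numerically stable log-softmax.
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    cross_entropy = -log_probs[target]
    # Gap between the target category and its strongest competitor.
    competing = np.delete(logits, target)
    gap = logits[target] - competing.max()
    margin_term = max(0.0, margin - gap)
    return cross_entropy + weight * margin_term

# Usage: a confident prediction pays no margin penalty, a narrow one does.
confident = large_margin_cross_entropy([5.0, 0.0, 0.0], target=0)
narrow = large_margin_cross_entropy([1.0, 0.8, 0.0], target=0)
```

The sketch uses both the target category and the competing categories, which is the property the abstract highlights; any real implementation would follow the thesis's own formulation.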

Massey Research Online