5 research outputs found

    Performance Enhancement Techniques for Sound Event Classification in Reverberant Environment

    Master's thesis, Seoul National University Graduate School: Graduate School of Convergence Science and Technology, Department of Convergence Science (Digital Information Convergence major), August 2019. Lee Kyogu.

    This study proposes techniques to improve sound event classification in reverberant environments. Sound event classification is actively applied in fields such as traffic monitoring and security (anomaly detection) systems, where robust performance in real-world environments is essential. In such environments, noise and reverberation are the main factors that degrade classification performance. However, this degradation has received little attention, and research on sound event classification in reverberant environments in particular is scarce. We therefore examine how classification degrades under reverberation and propose techniques to counter it.

    To model reverberant environments, we build two test sets: a recorded test set, made by re-recording the original dataset in a real reverberant space, and a synthesized test set, made using a room impulse response dataset. Using these test sets, we observe that sound event classification performance degrades in reverberant environments. To improve performance, we propose a data augmentation method that uses artificially synthesized room impulse responses and a method that conditions the network on the room impulse response. Experiments show that the proposed data augmentation improves performance in reverberant environments, and that combining it with the proposed conditioning method yields further gains. Finally, we show that the conditioning method improves performance using only approximate reverberation time information, even when the exact room impulse response audio is unknown.

    Contents:
    Chapter 1. Introduction
    1.1 Research Background
    1.2 Research Objectives
    Chapter 2. Background Theory and Related Work
    2.1 Background Theory
    2.1.1 Sound Event Classification
    2.1.2 Deep Learning Research
    2.1.3 Reverberation and Room Impulse Responses
    2.2 Related Work
    2.2.1 Research on Sound Event Classification
    2.2.2 Research Related to the Proposed Techniques
    Chapter 3. Proposed Techniques
    3.1 Data Augmentation Using Virtual Room Impulse Responses
    3.2 Room Impulse Response Conditioning Network
    Chapter 4. Experiments
    4.1 Experimental Preparation
    4.1.1 Dataset
    4.1.2 Test Set Construction
    4.1.3 Detailed Experimental Settings
    4.2 Results and Discussion
    4.2.1 Degradation of Sound Event Classification in Reverberant Environments
    4.2.2 Performance and Limitations of Applying Deconvolution
    4.2.3 Performance Improvement Using Data Augmentation
    4.2.4 Performance Improvement Using the Conditioning Network
    Chapter 5. Conclusion
    5.1 Significance of the Research
    5.2 Limitations
    5.3 Future Work
    Abstract
    Acknowledgments
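The data augmentation idea in the abstract above (convolving clean training audio with artificially synthesized room impulse responses) can be illustrated with a minimal sketch. The abstract does not say how the thesis generates its virtual impulse responses, so everything here is an assumption made for illustration: the impulse response is modeled as Gaussian noise under an exponential envelope that falls by 60 dB at the reverberation time `rt60`, and the helper names `decay_envelope`, `synth_rir`, `convolve`, and `augment` are hypothetical.

```python
import math
import random


def decay_envelope(rt60, sr, n):
    """Amplitude envelope that drops by 60 dB (a factor of 1000) after rt60 seconds."""
    rate = math.log(1000.0) / (rt60 * sr)  # per-sample decay rate
    return [math.exp(-rate * i) for i in range(n)]


def synth_rir(rt60, sr, seed=0):
    """Assumed model: Gaussian noise shaped by the decay envelope, rt60 seconds long."""
    rng = random.Random(seed)
    n = max(1, round(rt60 * sr))
    return [rng.gauss(0.0, 1.0) * e for e in decay_envelope(rt60, sr, n)]


def convolve(x, h):
    """Direct-form linear convolution; output length is len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def augment(x, rt60, sr, seed=0):
    """Reverberate a clean signal with a synthetic impulse response, then peak-normalize."""
    y = convolve(x, synth_rir(rt60, sr, seed))
    peak = max(abs(v) for v in y) or 1.0  # avoid dividing by zero on silent input
    return [v / peak for v in y]
```

In a training pipeline one would typically draw a different `rt60` and `seed` for each example so the classifier sees a range of room conditions; for real audio lengths an FFT-based convolution would replace the quadratic loop used here.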

    Machine learning and audio processing : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science at Massey University, Albany, Auckland, New Zealand

    In this thesis, we address two important theoretical issues, one in deep neural networks and one in clustering. We also develop a new approach to polyphonic sound event detection, one of the most important applications in audio processing. The three novel approaches are:
    (i) The Large Margin Recurrent Neural Network (LMRNN), which improves the discriminative ability of standard Recurrent Neural Networks by introducing a large margin term into the widely used cross-entropy loss function. The large margin term applies the large margin discriminative principle as a heuristic that guides convergence during training, and it fully exploits the label information by considering both the target category and the competing categories.
    (ii) The Robust Multi-View Continuous Subspace Clustering (RMVCSC) approach, which performs clustering on a common view-invariant subspace learned from all views. The clustering result and the common representation subspace are optimised simultaneously by a single continuous objective function. In this objective function, a robust estimator automatically clips specious inter-cluster connections while maintaining convincing intra-cluster correspondences. RMVCSC can therefore untangle heavily mixed clusters without the number of clusters being set in advance.
    (iii) A polyphonic sound event detection approach based on the Relational Recurrent Neural Network (RRNN), which uses the relational reasoning ability of RRNNs to untangle overlapping sound events across audio recordings. Unlike previous work, which packs all historical information into a single common hidden memory vector, this approach lets pieces of historical information interact with each other across an audio recording, which is effective and efficient in untangling overlapping sound events.
All three approaches are tested on widely used datasets and compared with recently published work. The experimental results demonstrate the effectiveness and efficiency of the developed approaches.
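To make the idea of adding a large margin term to the cross-entropy loss concrete, here is a minimal sketch. The abstract does not give the LMRNN formulation, so this is not the thesis's loss: it assumes a simple hinge penalty on the gap between the target logit and the strongest competing logit, and the function names and the `margin` and `weight` parameters are hypothetical.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one example, using a stable log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]


def large_margin_loss(logits, target, margin=1.0, weight=0.5):
    """Cross-entropy plus an assumed hinge term on the target-vs-competitor gap."""
    # Strongest competing category (requires at least two classes).
    competing = max(z for i, z in enumerate(logits) if i != target)
    # Penalize only when the target logit leads by less than the margin.
    hinge = max(0.0, margin - (logits[target] - competing))
    return cross_entropy(logits, target) + weight * hinge
```

With this choice the extra term vanishes once the target leads by at least `margin`, so it only pushes on examples that are misclassified or narrowly classified; other margin formulations are equally plausible readings of the abstract.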