Performance Enhancement Techniques for Sound Event Classification in Reverberant Environment
Thesis (Master's) -- Seoul National University Graduate School: Graduate School of Convergence Science and Technology, Department of Transdisciplinary Studies (Digital Contents and Information Studies), 2019. 8. Kyogu Lee.
In this paper, we propose techniques to enhance the performance of sound event classification in reverberant environments. Sound event classification is actively applied in various fields, such as traffic and security monitoring and other anomaly detection systems, and it is important to maintain robust performance in real-world environments. In such environments, noise and reverberation are the main factors that degrade classification performance. However, research on sound event classification in noisy environments is scarce, and research on reverberant environments in particular is almost nonexistent.
Therefore, in this paper, we observe how sound event classification degrades in reverberant environments and propose techniques to counter that degradation. To do this, we build two test sets that model reverberant environments, one re-recorded from the original dataset in a real reverberant room and one synthesized with a room impulse response dataset, and observe that classification performance degrades on them.
To improve performance, we propose a data augmentation method that uses artificially synthesized room impulse responses and a method that conditions the network on the room impulse response. Experimental results show that the proposed data augmentation method improves performance in reverberant environments, and that combining it with the proposed conditioning method yields further gains. Finally, we show that the conditioning method improves performance using only approximate reverberation time information, even when the exact room impulse response is not known.
Chapter 1. Introduction
1.1 Research Background
1.2 Research Objectives
Chapter 2. Background Theory and Related Work
2.1 Background Theory
2.1.1 Sound Event Classification
2.1.2 Deep Learning Research
2.1.3 Reverberation and Room Impulse Responses
2.2 Related Work
2.2.1 Sound Event Classification Research
2.2.2 Work Related to the Proposed Techniques
Chapter 3. Proposed Techniques
3.1 Data Augmentation Using Virtual Room Impulse Responses
3.2 Room Impulse Response Conditioning Network
Chapter 4. Experiments
4.1 Experimental Setup
4.1.1 Datasets
4.1.2 Test Set Construction
4.1.3 Detailed Experimental Settings
4.2 Results and Discussion
4.2.1 Performance Degradation of Sound Event Classification in Reverberant Environments
4.2.2 Performance and Limitations When Applying Deconvolution
4.2.3 Performance Improvement Using Data Augmentation
4.2.4 Performance Improvement Using the Conditioning Network
Chapter 5. Conclusion
5.1 Significance of the Research
5.2 Limitations
5.3 Future Work
ABSTRACT
Acknowledgements
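The data augmentation described in the abstract above, convolving clean training audio with artificially synthesized room impulse responses, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the thesis's implementation: the impulse response is modeled as exponentially decaying white noise whose envelope has dropped by 60 dB at a chosen reverberation time, and the names `synth_rir`, `convolve`, and `augment` are hypothetical.

```python
import math
import random


def synth_rir(rt60_s, sample_rate, seed=0):
    """Crude synthetic room impulse response (an assumed model):
    white noise under an exponential envelope that is 60 dB down at rt60_s."""
    rng = random.Random(seed)
    n = max(1, int(rt60_s * sample_rate))
    # A 60 dB amplitude drop is a factor of 1000, so decay = ln(1000) / rt60_s.
    decay = math.log(1000.0) / rt60_s
    rir = [rng.gauss(0.0, 1.0) * math.exp(-decay * i / sample_rate)
           for i in range(n)]
    rir[0] = 1.0  # direct-path impulse
    peak = max(abs(x) for x in rir)
    return [x / peak for x in rir]  # normalize so the largest tap is +/-1


def convolve(signal, kernel):
    """Direct-form linear convolution; output has len(signal) + len(kernel) - 1 samples."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def augment(dry, sample_rate, rt60_choices=(0.2, 0.5, 1.0), seed=0):
    """Return a reverberant copy of `dry` plus the reverberation time that was used."""
    rng = random.Random(seed)
    rt60 = rng.choice(rt60_choices)
    wet = convolve(dry, synth_rir(rt60, sample_rate, seed=seed))
    return wet[:len(dry)], rt60  # trim the tail to keep the original length
```

In this sketch `augment` also returns the sampled reverberation time. One could imagine passing such a value to a classifier as side information, in the spirit of the conditioning on approximate reverberation time that the abstract mentions, but how the thesis actually does this is not stated here. For real audio lengths, an FFT-based convolution would be far faster than the direct loop.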
Machine learning and audio processing : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science at Massey University, Albany, Auckland, New Zealand
In this thesis, we addressed two important theoretical issues in deep neural
networks and clustering, respectively. Also, we developed a new approach for
polyphonic sound event detection, which is one of the most important applications
in the audio processing area.
The three novel approaches developed are:
(i) The Large Margin Recurrent Neural Network (LMRNN), which improves
the discriminative ability of original Recurrent Neural Networks by
introducing a large margin term into the widely used cross-entropy loss
function. The developed large margin term utilises the large margin
discriminative principle as a heuristic term to navigate the convergence
process during training, which fully exploits the information from data
labels by considering both target category and competing categories.
(ii) The Robust Multi-View Continuous Subspace Clustering (RMVCSC)
approach, which performs clustering on a common view-invariant
subspace learned from all views. The clustering result and the common
representation subspace are simultaneously optimised by a single
continuous objective function. In the objective function, a robust estimator
is used to automatically clip spurious inter-cluster connections while
maintaining convincing intra-cluster correspondences. Thus, the developed
RMVCSC can untangle heavily mixed clusters without pre-setting the
number of clusters.
(iii) The novel polyphonic sound event detection approach based on Relational
Recurrent Neural Network (RRNN), which utilises the relational reasoning
ability of RRNNs to untangle the overlapping sound events across audio
recordings. Unlike previous work, which mixed and packed all
historical information into a single common hidden memory vector, the
developed approach allows historical information to interact with each
other across an audio recording, which is effective and efficient in
untangling the overlapping sound events.
All three approaches were tested on widely used datasets and compared with
recently published work. The experimental results demonstrate the
effectiveness and efficiency of the developed approaches.
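The idea in item (i), adding a large margin term to the cross-entropy loss so that both the target category and the competing categories shape training, can be sketched as follows. The abstract does not give the formula, so the hinge-style penalty below is an assumed stand-in and not the LMRNN loss itself; the function names and the `margin` and `weight` parameters are hypothetical.

```python
import math


def softmax_cross_entropy(logits, target):
    """Cross-entropy of softmax(logits) against the target class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def large_margin_penalty(logits, target, margin=1.0):
    """Assumed hinge-style term: zero once the target logit beats the
    strongest competing logit by at least `margin`."""
    best_competitor = max(z for i, z in enumerate(logits) if i != target)
    return max(0.0, margin - (logits[target] - best_competitor))


def large_margin_loss(logits, target, margin=1.0, weight=0.5):
    """Cross-entropy plus the weighted margin penalty."""
    return (softmax_cross_entropy(logits, target)
            + weight * large_margin_penalty(logits, target, margin))
```

With logits `[1.0, 0.5]` and target 0, the gap to the strongest competitor is 0.5, so the penalty with a margin of 1.0 is 0.5. With logits `[3.0, 0.0, 1.0]` the gap is 2.0 and the penalty vanishes, leaving plain cross-entropy.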