58 research outputs found

    An Online Solution for Localisation, Tracking and Separation of Moving Speech Sources

    Separating a time-varying number of speech sources in a room is difficult. The challenge lies in estimating the number and location of the sources, after which the tracked sources still need to be separated. This thesis proposes a solution that uses the Random Finite Set approach to estimate the number and location of the speech sources and then separates the speech mixture via time-frequency masking.
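    As a rough illustration of the time-frequency masking step named in the abstract above, the sketch below separates two tones within a single analysis frame. It is not the thesis's method: the mask here is an oracle computed from the known sources, whereas the abstract describes driving the separation from tracked source estimates. The frame length `N`, the bin indices, and the helper names `dft` and `idft` are assumptions made for this example.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Naive inverse DFT; returns complex samples."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

N = 64  # frame length (illustrative assumption)

# Two hypothetical sources that occupy different frequency bins (3 and 10).
s1 = [math.sin(2 * math.pi * 3 * t / N) for t in range(N)]
s2 = [0.5 * math.sin(2 * math.pi * 10 * t / N) for t in range(N)]
mix = [a + b for a, b in zip(s1, s2)]

S1, S2, X = dft(s1), dft(s2), dft(mix)

# Oracle binary mask: keep a bin only where source 1 dominates source 2.
mask = [1.0 if abs(a) > abs(b) else 0.0 for a, b in zip(S1, S2)]

# Apply the mask to the mixture spectrum and return to the time domain.
est1 = [v.real for v in idft([m * x for m, x in zip(mask, X)])]
max_err = max(abs(e - s) for e, s in zip(est1, s1))
```

    Because the two tones fall on separate bins, the masked mixture reproduces the first source up to rounding error. Real speech overlaps in time and frequency, so a practical system would apply such masks per frame of a short-time transform and would have to estimate them.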

    Source Separation and DOA Estimation for Underdetermined Auditory Scene


    Model-based Sparse Component Analysis for Reverberant Speech Localization

    In this paper, the problem of multiple speaker localization via speech separation based on model-based sparse recovery is studied. We compare and contrast computational sparse optimization methods that incorporate harmonicity and block structures as well as the autoregressive dependencies underlying the spectrographic representation of speech signals. The results demonstrate the effectiveness of a block sparse Bayesian learning framework incorporating autoregressive correlations in achieving highly accurate localization. Furthermore, a significant improvement is obtained when ad-hoc microphones, rather than a compact microphone array, are used for data acquisition.

    A Time-frequency Masking Based Random Finite Set Particle Filtering Method for Multiple Acoustic Source Detection and Tracking


    Acoustic Speaker Localization with Strong Reverberation and Adaptive Feature Filtering with a Bayes RFS Framework

    The thesis investigates the challenges of speaker localization in the presence of strong reverberation, multi-speaker tracking, and multi-feature multi-speaker state filtering, using sound recordings from microphones. Novel reverberation-robust speaker localization algorithms are derived from the signal and room acoustics models. A multi-speaker tracking filter and a multi-feature multi-speaker state filter are developed based on the generalized labeled multi-Bernoulli random finite set framework. Experiments and comparative studies verify and demonstrate the benefits of the proposed methods.

    Acoustic Signal Processing Techniques Applicable to Indoor Multi-Source Environments and Their Applications

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2022. 8. Kim Seong-cheol.

    Research on acoustic signal processing has been increasing, because such processing can extract meaningful information that can be put to practical use. This thesis therefore deals with acoustic signal processing techniques for sound recorded in indoor environments. First, we introduce a method for estimating the location of a sound source in an indoor environment with high reverberation and substantial noise. Existing methods such as interaural level difference (ILD) based localization, time difference of arrival (TDoA) based localization, and steered response power with phase transform (SRP-PHAT) based localization lose accuracy when applied to recordings from highly reverberant indoor environments. In this thesis, we instead define a new cost function that identifies, within a microphone array, the microphone pair that yields the best performance. The pair with the lowest cost function value is chosen as the optimal pair, and the source location is estimated with it. The distance error was confirmed to be lower than with the existing methods.

    Next, a technique for recovering lost sample values from a recorded signal, called sketching and stacking with random fork (SSRF), is introduced. In this technique, the target sound is a superposition of several sinusoidal signals. It is assumed that there are multiple sound sources in an anechoic chamber but only one microphone. A sinusoidal wave can be rewritten in exponential form using Euler's formula. If some of the terms of the exponential function follow a geometric sequence, the values that make up that sequence can be obtained using SSRF. To solve this problem, the concept of a random fork is newly introduced. In a comparison of recovery error, SSRF-based signal recovery was more accurate than existing compressive sensing based and deep neural network (DNN) based techniques.

    Finally, this thesis introduces a blind source separation (BSS) technique based on the SSRF technique. As before, the received signal is assumed to be a superposition of sinusoidal waves. However, while the previous technique assumed that all sinusoidal waves were emitted simultaneously, this technique assumes that the sound sources lie at different distances from the microphone and therefore arrive at it with different time delays. Under these assumptions, a new SSRF-based BSS method for separating the individual signals from the mixture is introduced. SSRF BSS consists of three steps: estimation of the number of sound sources, estimation of the time delays, and signal separation. The points at which the coefficients of the equation obtained while solving the SSRF problem change are taken as the time delays, and the number of such changes gives the number of sources. The SSRF problem is then solved over the final interval, in which all signals are superposed, to obtain the values that make up each individual signal. Whereas existing BSS methods require the number of sources to be known a priori or can only be applied to signals without time delay, SSRF BSS needs no prior knowledge of the source number and can be applied to mixtures of signals with different time delays. SSRF BSS was confirmed to produce more accurate separation results than the existing independent component analysis (ICA) BSS and Yu Gang (YG) BSS, which rely on several assumptions.

    Contents:
    1 INTRODUCTION
    2 IMPROVING ACOUSTIC LOCALIZATION PERFORMANCE BY FINDING OPTIMAL PAIR OF MICROPHONES BASED ON COST FUNCTION
        2.1 Motivation
        2.2 Conventional Acoustic Localization Methods
            2.2.1 Interaural Level Difference
            2.2.2 Time Difference of Arrival
            2.2.3 Steered Response Power Phase Transformation
        2.3 System Model
            2.3.1 Experimental Scenarios
            2.3.2 Definition of Cost Function
        2.4 Results and Discussion
        2.5 Summary
    3 ACOUSTIC SIGNAL RECOVERY BASED ON SKETCHING AND STACKING WITH RANDOM FORK
        3.1 Motivation
        3.2 SSRF Signal Model
            3.2.1 Source Signal Model
            3.2.2 Sampled Signal Model
            3.2.3 Corrupted Signal Model
        3.3 SSRF Problem Statement
        3.4 SSRF Methodology
            3.4.1 Geometric Sequential Representation
            3.4.2 Definition of Random Fork
            3.4.3 Informative Matrix
            3.4.4 Data Augmentation
            3.4.5 Solution of SSRF Problem
            3.4.6 Reconstruction of Corrupted Samples
        3.5 Performance Analysis
            3.5.1 Simulation Set-up
            3.5.2 Reconstruction Error According to Bernoulli Parameter and Number of Signals
            3.5.3 Detailed Comparison between SSRF and DNN
            3.5.4 SSRF Result for Signal with Additive White Gaussian Noise
        3.6 Summary
    4 SINGLE CHANNEL ACOUSTIC SOURCE NUMBER ESTIMATION AND BLIND SOURCE SEPARATION BASED ON SKETCHING AND STACKING WITH RANDOM FORK
        4.1 Motivation
        4.2 SSRF based BSS System Model
            4.2.1 Simulation Scenarios
        4.3 SSRF based BSS Methodology
            4.3.1 Source Number and ToA Estimation based on SSRF
            4.3.2 Signal Separation
        4.4 Results and Discussion
            4.4.1 Source Number and ToA Estimation Results
            4.4.2 Separation of the Signal
        4.5 Summary
    5 CONCLUSION
    Abstract (In Korean)
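    The abstract of this record notes that a sinusoid can be written in exponential form via Euler's formula and that the resulting terms can follow a geometric sequence. The sketch below checks that property numerically for a uniformly sampled tone and shows one simple consequence, namely that a missing interior sample of a single sinusoid can be predicted from its neighbours. It is only an illustration of the underlying identity, not the SSRF algorithm; the tone frequency, sample rate, and the index `k_lost` are hypothetical values chosen for the example.

```python
import cmath
import math

f, fs, n = 440.0, 8000.0, 16   # hypothetical tone frequency, sample rate, length
w = 2 * math.pi * f / fs        # phase advance per sample

# Euler's formula: cos(w k) = (e^{jwk} + e^{-jwk}) / 2
pos = [cmath.exp(1j * w * k) for k in range(n)]
neg = [cmath.exp(-1j * w * k) for k in range(n)]
cosine = [math.cos(w * k) for k in range(n)]
euler_err = max(abs((p + q) / 2 - c) for p, q, c in zip(pos, neg, cosine))

# Each exponential term is a geometric sequence with common ratio e^{jw}.
ratios = [pos[k + 1] / pos[k] for k in range(n - 1)]
ratio_spread = max(abs(r - cmath.exp(1j * w)) for r in ratios)

# Consequence for one sinusoid: x[k+1] + x[k-1] = 2 cos(w) x[k],
# so an interior sample follows from its two neighbours.
k_lost = 7
recovered = (cosine[k_lost - 1] + cosine[k_lost + 1]) / (2 * math.cos(w))
recovery_err = abs(recovered - cosine[k_lost])
```

    All three error values come out at rounding-error level. With several superposed sinusoids and many missing samples the recurrence has more unknown coefficients, which is the harder setting the thesis addresses.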