500 research outputs found

    Reverberation reduction in a room for multiple positions

    Get PDF
    Reverberation in a room occurs when the direct path sound from a sound source undergoes multiple reflections from the walls of the room before reaching the listener. An impulse response of the room can be measured called the room impulse response (RIR) which captures the effects of the room. This can be represented digitally on a computer. A filter is designed to cancel the effects of the room using the information in the room impulse response. This filter is called an equalization filter and is usually placed between the source signal and loudspeaker to perform the equalization. The RIR changes for varying source and listener locations, hence an equalization filter designed for one RIR will not perform equalization for multiple positions. This thesis explores methods to perform equalization for multiple positions. One of the simplest methods is spatial averaging equalization, which was used to perform the equalization for multiple positions. Equalizing RIR is only concerned about trying to flatten the frequency spectrum and stabilizing the inverse RIR by looking at its minimum-phase component. Other methods are explored which consider the masking effects of the human auditory system which relates to the perception of sound by the human ear. One such method is impulse response shortening/reshaping which emphasizes the direct path component in the RIR relative to the rest of the components using p-norm and infinity-norm optimization which is an iterative algorithm. This concept is extended for performing reshaping on RIR for multiple positions using the idea in spatial averaging equalization by using RIR\u27s measure for different positions --Abstract, page iii

    Blind MultiChannel Identification and Equalization for Dereverberation and Noise Reduction based on Convolutive Transfer Function

    Get PDF
    This paper addresses the problems of blind channel identification and multichannel equalization for speech dereverberation and noise reduction. The time-domain cross-relation method is not suitable for blind room impulse response identification, due to the near-common zeros of the long impulse responses. We extend the cross-relation method to the short-time Fourier transform (STFT) domain, in which the time-domain impulse responses are approximately represented by the convolutive transfer functions (CTFs) with much less coefficients. The CTFs suffer from the common zeros caused by the oversampled STFT. We propose to identify CTFs based on the STFT with the oversampled signals and the critical sampled CTFs, which is a good compromise between the frequency aliasing of the signals and the common zeros problem of CTFs. In addition, a normalization of the CTFs is proposed to remove the gain ambiguity across sub-bands. In the STFT domain, the identified CTFs is used for multichannel equalization, in which the sparsity of speech signals is exploited. We propose to perform inverse filtering by minimizing the 1\ell_1-norm of the source signal with the relaxed 2\ell_2-norm fitting error between the micophone signals and the convolution of the estimated source signal and the CTFs used as a constraint. This method is advantageous in that the noise can be reduced by relaxing the 2\ell_2-norm to a tolerance corresponding to the noise power, and the tolerance can be automatically set. The experiments confirm the efficiency of the proposed method even under conditions with high reverberation levels and intense noise.Comment: 13 pages, 5 figures, 5 table

    Multichannel Speech Separation and Enhancement Using the Convolutive Transfer Function

    Get PDF
    This paper addresses the problem of speech separation and enhancement from multichannel convolutive and noisy mixtures, \emph{assuming known mixing filters}. We propose to perform the speech separation and enhancement task in the short-time Fourier transform domain, using the convolutive transfer function (CTF) approximation. Compared to time-domain filters, CTF has much less taps, consequently it has less near-common zeros among channels and less computational complexity. The work proposes three speech-source recovery methods, namely: i) the multichannel inverse filtering method, i.e. the multiple input/output inverse theorem (MINT), is exploited in the CTF domain, and for the multi-source case, ii) a beamforming-like multichannel inverse filtering method applying single source MINT and using power minimization, which is suitable whenever the source CTFs are not all known, and iii) a constrained Lasso method, where the sources are recovered by minimizing the 1\ell_1-norm to impose their spectral sparsity, with the constraint that the 2\ell_2-norm fitting cost, between the microphone signals and the mixing model involving the unknown source signals, is less than a tolerance. The noise can be reduced by setting a tolerance onto the noise power. Experiments under various acoustic conditions are carried out to evaluate the three proposed methods. The comparison between them as well as with the baseline methods is presented.Comment: Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processin

    Application of channel shortening to acoustic channel equalization in the presence of noise and estimation error

    Full text link

    Sparsity Based Formulations For Dereverberation

    Get PDF
    Tez (Yüksek Lisans) -- İstanbul Teknik Üniversitesi, Fen Bilimleri Enstitüsü, 2016Thesis (M.Sc.) -- İstanbul Technical University, Institute of Science and Technology, 2016Konser, konferans, toplantı gibi ortamlarda kaydedilen akustik işaretler, kaydın alındığı ortam nedeni ile yankıya ve gürültüye maruz kalır. Kaynak işaretinin elde edilen gözlemlerden kestirimi yankı giderme problemi olarak isimlendirilir. Bu kayıtlarda göze çarpan yankı etkileri bir süzgeç olarak zaman tanım bölgesinde modellenebilir. Yankı etkilerini modelleyen bu süzgeç oda darbe cevabı olarak isimlendirilir. Oda darbe cevabının bilindiği durumda problem gözü kapalı olmayan yankı giderme problemine dönüşür. Tez boyunca oda darbe cevabının bilindiği durumlar dikkate alınmıştır. Gözlemlenebilir ki, oda darbe cevabı kaynak ve gözlem noktalarına çok bağımlıdır. Bu nedenle oda darbe cevabının bütün uzaydaki noktalar için kestirimi çok zordur. Bu durumda oda darbe cevapları tezdeki deneylerde sentetik olarak uygulanmış veya gözlem ortamında kayıt alındığı sırada gözlemden elde edilmişlerdir. Bölüm 5, bu duruma farklı bir açıdan bakılmasının örneğidir. Bu bölümde oda darbe cevabının kısmen bilindiği ve gözlem ortamı için tek bir süzgeç tanımlanabileceği durumları göz önüne alınmıştır.Acoustic signals recorded in concerts, meetings or conferences are effected by the room impulse response and noise. Estimating the clean source signals from the observations is referred as the dereverberation problem. If the room impulse responses are known, the problem is non-blind dereverberation problem. In this thesis non-blind dereverberation problem is posed using convex penalty functions, with a convex minimization procedure. The convex minimization problems are solved using iterative methods. Through the thesis sparse nature of the time frequency spectrum is referred. In order to transform the time domain signal to a time frequency spectrum Short Time Fourier Transform is used.Yüksek LisansM.Sc

    SISTEM WAKTU NYATA PENGUJIAN HAFALAN AYAT-AYAT SUCI AL QURAN SECARA EKSPONENSIAL DAN NONSINUSOIDAL

    Get PDF
    Prinsip pengembangan perangkat lunak otomatisasi adalah selaras dengan prinsip-prinsip yang ditekankan pada era revolusi industri 4.0., artinya perangkat  lunak pada era 4.0 sudah mampu meminimasir peran manusia baik dalam pengembangannya, kemampuan belajar, ataupun operasional-operasional teknis lainnya. Untuk menjawab tantangan model 4.0, maka penelitian ini mengajukan sebuah algoritma yang robust dalam menguji hafalan-hafalan Qur’an secara digital. Pendekatan yang digunakan mewakili fungsi basis eksponensial dan non-sinusoidal yaitu berturut-turut Transformasi Mellin dan Walsh. Pengujian melibatkan 100 sampel para penghafal Qur’an, dengan masing masing unjuk kerja, 0,9 ditunjukan oleh penerapan Transformasi Mellin dan 0,82 dihasilkan oleh Transformasi Walsh. Untuk pengembangan sistem yang lebih tinggi keakuratannya dan lebih efisiensi pemakaian memori yang digunakan, penelitian menghasilkan saran penmbangunan model Fast Transform untuk kedua algoritma

    Multichannel Online Dereverberation based on Spectral Magnitude Inverse Filtering

    Full text link
    This paper addresses the problem of multichannel online dereverberation. The proposed method is carried out in the short-time Fourier transform (STFT) domain, and for each frequency band independently. In the STFT domain, the time-domain room impulse response is approximately represented by the convolutive transfer function (CTF). The multichannel CTFs are adaptively identified based on the cross-relation method, and using the recursive least square criterion. Instead of the complex-valued CTF convolution model, we use a nonnegative convolution model between the STFT magnitude of the source signal and the CTF magnitude, which is just a coarse approximation of the former model, but is shown to be more robust against the CTF perturbations. Based on this nonnegative model, we propose an online STFT magnitude inverse filtering method. The inverse filters of the CTF magnitude are formulated based on the multiple-input/output inverse theorem (MINT), and adaptively estimated based on the gradient descent criterion. Finally, the inverse filtering is applied to the STFT magnitude of the microphone signals, obtaining an estimate of the STFT magnitude of the source signal. Experiments regarding both speech enhancement and automatic speech recognition are conducted, which demonstrate that the proposed method can effectively suppress reverberation, even for the difficult case of a moving speaker.Comment: Paper submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing. IEEE Signal Processing Letters, 201

    An investigation of the utility of monaural sound source separation via nonnegative matrix factorization applied to acoustic echo and reverberation mitigation for hands-free telephony

    Get PDF
    In this thesis we investigate the applicability and utility of Monaural Sound Source Separation (MSSS) via Nonnegative Matrix Factorization (NMF) for various problems related to audio for hands-free telephony. We first investigate MSSS via NMF as an alternative acoustic echo reduction approach to existing approaches such as Acoustic Echo Cancellation (AEC). To this end, we present the single-channel acoustic echo problem as an MSSS problem, in which the objective is to extract the users signal from a mixture also containing acoustic echo and noise. To perform separation, NMF is used to decompose the near-end microphone signal onto the union of two nonnegative bases in the magnitude Short Time Fourier Transform domain. One of these bases is for the spectral energy of the acoustic echo signal, and is formed from the in- coming far-end user’s speech, while the other basis is for the spectral energy of the near-end speaker, and is trained with speech data a priori. In comparison to AEC, the speaker extraction approach obviates Double-Talk Detection (DTD), and is demonstrated to attain its maximal echo mitigation performance immediately upon initiation and to maintain that performance during and after room changes for similar computational requirements. Speaker extraction is also shown to introduce distortion of the near-end speech signal during double-talk, which is quantified by means of a speech distortion measure and compared to that of AEC. Subsequently, we address Double-Talk Detection (DTD) for block-based AEC algorithms. We propose a novel block-based DTD algorithm that uses the available signals and the estimate of the echo signal that is produced by NMF-based speaker extraction to compute a suitably normalized correlation-based decision variable, which is compared to a fixed threshold to decide on doubletalk. Using a standard evaluation technique, the proposed algorithm is shown to have comparable detection performance to an existing conventional block-based DTD algorithm. It is also demonstrated to inherit the room change insensitivity of speaker extraction, with the proposed DTD algorithm generating minimal false doubletalk indications upon initiation and in response to room changes in comparison to the existing conventional DTD. We also show that this property allows its paired AEC to converge at a rate close to the optimum. Another focus of this thesis is the problem of inverting a single measurement of a non- minimum phase Room Impulse Response (RIR). We describe the process by which percep- tually detrimental all-pass phase distortion arises in reverberant speech filtered by the inverse of the minimum phase component of the RIR; in short, such distortion arises from inverting the magnitude response of the high-Q maximum phase zeros of the RIR. We then propose two novel partial inversion schemes that precisely mitigate this distortion. One of these schemes employs NMF-based MSSS to separate the all-pass phase distortion from the target speech in the magnitude STFT domain, while the other approach modifies the inverse minimum phase filter such that the magnitude response of the maximum phase zeros of the RIR is not fully compensated. Subjective listening tests reveal that the proposed schemes generally produce better quality output speech than a comparable inversion technique

    An investigation of the utility of monaural sound source separation via nonnegative matrix factorization applied to acoustic echo and reverberation mitigation for hands-free telephony

    Get PDF
    In this thesis we investigate the applicability and utility of Monaural Sound Source Separation (MSSS) via Nonnegative Matrix Factorization (NMF) for various problems related to audio for hands-free telephony. We first investigate MSSS via NMF as an alternative acoustic echo reduction approach to existing approaches such as Acoustic Echo Cancellation (AEC). To this end, we present the single-channel acoustic echo problem as an MSSS problem, in which the objective is to extract the users signal from a mixture also containing acoustic echo and noise. To perform separation, NMF is used to decompose the near-end microphone signal onto the union of two nonnegative bases in the magnitude Short Time Fourier Transform domain. One of these bases is for the spectral energy of the acoustic echo signal, and is formed from the in- coming far-end user’s speech, while the other basis is for the spectral energy of the near-end speaker, and is trained with speech data a priori. In comparison to AEC, the speaker extraction approach obviates Double-Talk Detection (DTD), and is demonstrated to attain its maximal echo mitigation performance immediately upon initiation and to maintain that performance during and after room changes for similar computational requirements. Speaker extraction is also shown to introduce distortion of the near-end speech signal during double-talk, which is quantified by means of a speech distortion measure and compared to that of AEC. Subsequently, we address Double-Talk Detection (DTD) for block-based AEC algorithms. We propose a novel block-based DTD algorithm that uses the available signals and the estimate of the echo signal that is produced by NMF-based speaker extraction to compute a suitably normalized correlation-based decision variable, which is compared to a fixed threshold to decide on doubletalk. Using a standard evaluation technique, the proposed algorithm is shown to have comparable detection performance to an existing conventional block-based DTD algorithm. It is also demonstrated to inherit the room change insensitivity of speaker extraction, with the proposed DTD algorithm generating minimal false doubletalk indications upon initiation and in response to room changes in comparison to the existing conventional DTD. We also show that this property allows its paired AEC to converge at a rate close to the optimum. Another focus of this thesis is the problem of inverting a single measurement of a non- minimum phase Room Impulse Response (RIR). We describe the process by which percep- tually detrimental all-pass phase distortion arises in reverberant speech filtered by the inverse of the minimum phase component of the RIR; in short, such distortion arises from inverting the magnitude response of the high-Q maximum phase zeros of the RIR. We then propose two novel partial inversion schemes that precisely mitigate this distortion. One of these schemes employs NMF-based MSSS to separate the all-pass phase distortion from the target speech in the magnitude STFT domain, while the other approach modifies the inverse minimum phase filter such that the magnitude response of the maximum phase zeros of the RIR is not fully compensated. Subjective listening tests reveal that the proposed schemes generally produce better quality output speech than a comparable inversion technique
    corecore