2,734 research outputs found

    Robust equalization of multichannel acoustic systems

    Get PDF
    In most real-world acoustical scenarios, speech signals captured by distant microphones from a source are reverberated due to multipath propagation, and the reverberation may impair speech intelligibility. Speech dereverberation can be achieved by equalizing the channels from the source to microphones. Equalization systems can be computed using estimates of multichannel acoustic impulse responses. However, the estimates obtained from system identification always include errors; the fact that an equalization system is able to equalize the estimated multichannel acoustic system does not mean that it is able to equalize the true system. The objective of this thesis is to propose and investigate robust equalization methods for multichannel acoustic systems in the presence of system identification errors. Equalization systems can be computed using the multiple-input/output inverse theorem or multichannel least-squares method. However, equalization systems obtained from these methods are very sensitive to system identification errors. A study of the multichannel least-squares method with respect to two classes of characteristic channel zeros is conducted. Accordingly, a relaxed multichannel least- squares method is proposed. Channel shortening in connection with the multiple- input/output inverse theorem and the relaxed multichannel least-squares method is discussed. Two algorithms taking into account the system identification errors are developed. Firstly, an optimally-stopped weighted conjugate gradient algorithm is proposed. A conjugate gradient iterative method is employed to compute the equalization system. The iteration process is stopped optimally with respect to system identification errors. Secondly, a system-identification-error-robust equalization method exploring the use of error models is presented, which incorporates system identification error models in the weighted multichannel least-squares formulation

    Stereophonic acoustic echo cancellation employing selective-tap adaptive algorithms

    No full text
    Published versio

    Efficient Synthesis of Room Acoustics via Scattering Delay Networks

    Get PDF
    An acoustic reverberator consisting of a network of delay lines connected via scattering junctions is proposed. All parameters of the reverberator are derived from physical properties of the enclosure it simulates. It allows for simulation of unequal and frequency-dependent wall absorption, as well as directional sources and microphones. The reverberator renders the first-order reflections exactly, while making progressively coarser approximations of higher-order reflections. The rate of energy decay is close to that obtained with the image method (IM) and consistent with the predictions of Sabine and Eyring equations. The time evolution of the normalized echo density, which was previously shown to be correlated with the perceived texture of reverberation, is also close to that of IM. However, its computational complexity is one to two orders of magnitude lower, comparable to the computational complexity of a feedback delay network (FDN), and its memory requirements are negligible

    Blind MultiChannel Identification and Equalization for Dereverberation and Noise Reduction based on Convolutive Transfer Function

    Get PDF
    This paper addresses the problems of blind channel identification and multichannel equalization for speech dereverberation and noise reduction. The time-domain cross-relation method is not suitable for blind room impulse response identification, due to the near-common zeros of the long impulse responses. We extend the cross-relation method to the short-time Fourier transform (STFT) domain, in which the time-domain impulse responses are approximately represented by the convolutive transfer functions (CTFs) with much less coefficients. The CTFs suffer from the common zeros caused by the oversampled STFT. We propose to identify CTFs based on the STFT with the oversampled signals and the critical sampled CTFs, which is a good compromise between the frequency aliasing of the signals and the common zeros problem of CTFs. In addition, a normalization of the CTFs is proposed to remove the gain ambiguity across sub-bands. In the STFT domain, the identified CTFs is used for multichannel equalization, in which the sparsity of speech signals is exploited. We propose to perform inverse filtering by minimizing the 1\ell_1-norm of the source signal with the relaxed 2\ell_2-norm fitting error between the micophone signals and the convolution of the estimated source signal and the CTFs used as a constraint. This method is advantageous in that the noise can be reduced by relaxing the 2\ell_2-norm to a tolerance corresponding to the noise power, and the tolerance can be automatically set. The experiments confirm the efficiency of the proposed method even under conditions with high reverberation levels and intense noise.Comment: 13 pages, 5 figures, 5 table

    System Identification with Applications in Speech Enhancement

    No full text
    As the increasing popularity of integrating hands-free telephony on mobile portable devices and the rapid development of voice over internet protocol, identification of acoustic systems has become desirable for compensating distortions introduced to speech signals during transmission, and hence enhancing the speech quality. The objective of this research is to develop system identification algorithms for speech enhancement applications including network echo cancellation and speech dereverberation. A supervised adaptive algorithm for sparse system identification is developed for network echo cancellation. Based on the framework of selective-tap updating scheme on the normalized least mean squares algorithm, the MMax and sparse partial update tap-selection strategies are exploited in the frequency domain to achieve fast convergence performance with low computational complexity. Through demonstrating how the sparseness of the network impulse response varies in the transformed domain, the multidelay filtering structure is incorporated to reduce the algorithmic delay. Blind identification of SIMO acoustic systems for speech dereverberation in the presence of common zeros is then investigated. First, the problem of common zeros is defined and extended to include the presence of near-common zeros. Two clustering algorithms are developed to quantify the number of these zeros so as to facilitate the study of their effect on blind system identification and speech dereverberation. To mitigate such effect, two algorithms are developed where the two-stage algorithm based on channel decomposition identifies common and non-common zeros sequentially; and the forced spectral diversity approach combines spectral shaping filters and channel undermodelling for deriving a modified system that leads to an improved dereverberation performance. Additionally, a solution to the scale factor ambiguity problem in subband-based blind system identification is developed, which motivates further research on subbandbased dereverberation techniques. Comprehensive simulations and discussions demonstrate the effectiveness of the aforementioned algorithms. A discussion on possible directions of prospective research on system identification techniques concludes this thesis

    Multichannel Speech Separation and Enhancement Using the Convolutive Transfer Function

    Get PDF
    This paper addresses the problem of speech separation and enhancement from multichannel convolutive and noisy mixtures, \emph{assuming known mixing filters}. We propose to perform the speech separation and enhancement task in the short-time Fourier transform domain, using the convolutive transfer function (CTF) approximation. Compared to time-domain filters, CTF has much less taps, consequently it has less near-common zeros among channels and less computational complexity. The work proposes three speech-source recovery methods, namely: i) the multichannel inverse filtering method, i.e. the multiple input/output inverse theorem (MINT), is exploited in the CTF domain, and for the multi-source case, ii) a beamforming-like multichannel inverse filtering method applying single source MINT and using power minimization, which is suitable whenever the source CTFs are not all known, and iii) a constrained Lasso method, where the sources are recovered by minimizing the 1\ell_1-norm to impose their spectral sparsity, with the constraint that the 2\ell_2-norm fitting cost, between the microphone signals and the mixing model involving the unknown source signals, is less than a tolerance. The noise can be reduced by setting a tolerance onto the noise power. Experiments under various acoustic conditions are carried out to evaluate the three proposed methods. The comparison between them as well as with the baseline methods is presented.Comment: Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processin

    Sparseness-controlled adaptive algorithms for supervised and unsupervised system identification

    No full text
    In single-channel hands-free telephony, the acoustic coupling between the loudspeaker and the microphone can be strong and this generates echoes that can degrade user experience. Therefore, effective acoustic echo cancellation (AEC) is necessary to maintain a stable system and hence improve the perceived voice quality of a call. Traditionally, adaptive filters have been deployed in acoustic echo cancellers to estimate the acoustic impulse responses (AIRs) using adaptive algorithms. The performances of a range of well-known algorithms are studied in the context of both AEC and network echo cancellation (NEC). It presents insights into their tracking performances under both time-invariant and time-varying system conditions. In the context of AEC, the level of sparseness in AIRs can vary greatly in a mobile environment. When the response is strongly sparse, convergence of conventional approaches is poor. Drawing on techniques originally developed for NEC, a class of time-domain and a frequency-domain AEC algorithms are proposed that can not only work well in both sparse and dispersive circumstances, but also adapt dynamically to the level of sparseness using a new sparseness-controlled approach. As it will be shown later that the early part of the acoustic echo path is sparse while the late reverberant part of the acoustic path is dispersive, a novel approach to an adaptive filter structure that consists of two time-domain partition blocks is proposed such that different adaptive algorithms can be used for each part. By properly controlling the mixing parameter for the partitioned blocks separately, where the block lengths are controlled adaptively, the proposed partitioned block algorithm works well in both sparse and dispersive time-varying circumstances. A new insight into an analysis on the tracking performance of improved proportionate NLMS (IPNLMS) is presented by deriving the expression for the mean-square error. By employing the framework for both sparse and dispersive time-varying echo paths, this work validates the analytic results in practical simulations for AEC. The time-domain second-order statistic based blind SIMO identification algorithms, which exploit the cross relation method, are investigated and then a technique with proportionate step-size control for both sparse and dispersive system identification is also developed

    Multichannel active control of random noise in a small reverberant room

    Get PDF
    corecore