    System Identification with Applications in Speech Enhancement

    With the increasing popularity of hands-free telephony on mobile portable devices and the rapid development of voice over Internet Protocol (VoIP), identification of acoustic systems has become desirable for compensating the distortions introduced into speech signals during transmission, and hence for enhancing speech quality. The objective of this research is to develop system identification algorithms for speech enhancement applications, including network echo cancellation and speech dereverberation. A supervised adaptive algorithm for sparse system identification is developed for network echo cancellation. Within the framework of the selective-tap updating scheme for the normalized least mean squares (NLMS) algorithm, the MMax and sparse partial update tap-selection strategies are exploited in the frequency domain to achieve fast convergence with low computational complexity. Having demonstrated how the sparseness of the network impulse response varies in the transformed domain, the work incorporates the multidelay filtering structure to reduce the algorithmic delay. Blind identification of single-input multiple-output (SIMO) acoustic systems for speech dereverberation in the presence of common zeros is then investigated. First, the problem of common zeros is defined and extended to include near-common zeros. Two clustering algorithms are developed to quantify the number of these zeros, facilitating the study of their effect on blind system identification and speech dereverberation. To mitigate this effect, two algorithms are developed: a two-stage algorithm based on channel decomposition, which identifies common and non-common zeros sequentially; and a forced spectral diversity approach, which combines spectral shaping filters and channel undermodelling to derive a modified system that leads to improved dereverberation performance.
Additionally, a solution to the scale factor ambiguity problem in subband-based blind system identification is developed, which motivates further research on subband-based dereverberation techniques. Comprehensive simulations and discussions demonstrate the effectiveness of the aforementioned algorithms. A discussion of possible directions for future research on system identification techniques concludes this thesis.
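
As a rough illustration of the MMax tap-selection idea, the following Python sketch adapts only the M tap weights whose current tap-input samples have the largest magnitude, inside an otherwise standard NLMS update. It is a time-domain simplification, not the frequency-domain, multidelay algorithm described above; the function name `mmax_nlms`, the step size, the normalization by the full input energy, and the toy sparse echo path are all assumptions chosen for illustration.

```python
import numpy as np

def mmax_nlms(x, d, L, M, mu=0.5, eps=1e-6):
    """Hypothetical time-domain MMax-NLMS sketch (not the thesis algorithm).

    At each step only the M of L taps whose tap-input samples have the
    largest magnitude are updated; normalization uses the full input energy.
    """
    w = np.zeros(L)
    e = np.zeros(len(x))
    xbuf = np.zeros(L)  # tap-input vector [x(n), x(n-1), ..., x(n-L+1)]
    for n in range(len(x)):
        xbuf = np.roll(xbuf, 1)
        xbuf[0] = x[n]
        e[n] = d[n] - w @ xbuf  # a priori error
        idx = np.argpartition(np.abs(xbuf), -M)[-M:]  # M largest |inputs|
        w[idx] += mu * e[n] * xbuf[idx] / (xbuf @ xbuf + eps)
    return w, e

# Toy example: identify an assumed sparse echo path from white-noise input.
rng = np.random.default_rng(0)
L = 64
h = np.zeros(L)
h[[5, 20, 41]] = [1.0, -0.5, 0.25]
x = rng.standard_normal(20000)
d = np.convolve(x, h)[: len(x)]  # noise-free echo signal
w, e = mmax_nlms(x, d, L=L, M=L // 2)
misalignment = np.linalg.norm(w - h) / np.linalg.norm(h)
```

Setting M equal to L reduces the update to standard NLMS, which makes the sketch easy to compare against the full-update baseline.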

    Blind MultiChannel Identification and Equalization for Dereverberation and Noise Reduction based on Convolutive Transfer Function

    This paper addresses the problems of blind channel identification and multichannel equalization for speech dereverberation and noise reduction. The time-domain cross-relation method is not suitable for blind identification of room impulse responses, due to the near-common zeros of the long impulse responses. We extend the cross-relation method to the short-time Fourier transform (STFT) domain, in which the time-domain impulse responses are approximately represented by convolutive transfer functions (CTFs) with far fewer coefficients. The CTFs suffer from common zeros caused by the oversampled STFT. We propose to identify the CTFs using an STFT with oversampled signals and critically sampled CTFs, which is a good compromise between frequency aliasing of the signals and the common-zeros problem of the CTFs. In addition, a normalization of the CTFs is proposed to remove the gain ambiguity across subbands. In the STFT domain, the identified CTFs are used for multichannel equalization, in which the sparsity of speech signals is exploited. We propose to perform inverse filtering by minimizing the ℓ1-norm of the source signal, using as a constraint a relaxed bound on the ℓ2-norm fitting error between the microphone signals and the convolution of the estimated source signal with the CTFs. This method has the advantage that noise can be reduced by relaxing the ℓ2-norm to a tolerance corresponding to the noise power, and this tolerance can be set automatically. The experiments confirm the efficiency of the proposed method even under high reverberation levels and intense noise. Comment: 13 pages, 5 figures, 5 tables
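
The time-domain cross-relation principle that the paper extends to the STFT domain can be sketched for two channels: since x1 = s * h1 and x2 = s * h2, it follows that x1 * h2 - x2 * h1 = 0, so the stacked channel vector lies in the null space of a matrix built from the microphone signals and can be estimated, up to a scale factor, from its smallest right singular vector. This minimal Python sketch assumes short, noise-free channels with no common zeros and a known channel length L; the helper names `conv_matrix` and `cross_relation_2ch` are hypothetical, and the code does not implement the CTF-domain method.

```python
import numpy as np

def conv_matrix(x, L):
    """Full convolution matrix C such that C @ h == np.convolve(x, h) for len(h) == L."""
    C = np.zeros((len(x) + L - 1, L))
    for k in range(L):
        C[k : k + len(x), k] = x
    return C

def cross_relation_2ch(x1, x2, L):
    """Estimate (h1, h2) up to a common scale from x1 * h2 - x2 * h1 = 0."""
    A = np.hstack([-conv_matrix(x2, L), conv_matrix(x1, L)])  # A @ [h1; h2] = 0
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    v = Vt[-1]  # right singular vector for the smallest singular value
    return v[:L], v[L:]

# Toy example with short random channels (generically free of common zeros).
rng = np.random.default_rng(1)
L = 8
h1, h2 = rng.standard_normal(L), rng.standard_normal(L)
s = rng.standard_normal(500)
x1, x2 = np.convolve(s, h1), np.convolve(s, h2)
h1_est, h2_est = cross_relation_2ch(x1, x2, L)

h_true = np.concatenate([h1, h2])
h_est = np.concatenate([h1_est, h2_est])
similarity = abs(h_true @ h_est) / (np.linalg.norm(h_true) * np.linalg.norm(h_est))
```

The estimate is determined only up to a (possibly negative) scale factor, which is why the comparison uses the absolute cosine similarity rather than the raw coefficients.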

    Propagative identification of SIMO systems using the Mean Differential Cepstrum (MDC)

    An alternative SIMO system identification technique that requires only response measurements is presented in this paper. The technique is based on the Mean Differential Cepstrum, in a different form from the original, and has two solution formulations, which differ in their use of a Taylor series approximation. The identification processes yield both magnitude and phase in a propagative manner, solving in the frequency domain from one frequency to the next. Initial values near zero frequency can be taken from the static stiffness properties (for fixed systems) or the inertial properties (for free-free systems). The technique has the advantage of not requiring the system being identified to be minimum-phase, which is successfully demonstrated on simulated minimum-phase and non-minimum-phase systems. The stability of the two solution formulations is discussed, together with the results of applying the technique to measurements from an experimental test rig. Since the method is limited to transient inputs, both burst random and impulsive force excitations are used in each test scenario.

    Near-Common Zeros in Blind Identification of SIMO Acoustic Systems

    The common zeros problem in Blind System Identification (BSI) is well known to degrade the performance of classic BSI algorithms and therefore limits the performance of subsequent speech dereverberation. Recently, we have shown that multichannel systems cannot be well identified if near-common zeros are present. In this work, we further study the near-common zeros problem using a channel diversity measure. We then investigate the use of forced spectral diversity (FSD), based on a combination of spectral shaping filters and effective channel undermodelling. Simulation results show the effectiveness of the proposed approach. Index Terms: blind system identification, near-common zeros, channel identifiability condition, forced spectral diversity
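
One simple way to make the notion of near-common zeros concrete is to compute the z-plane zeros of each FIR channel and count the zeros of the first channel that lie within a distance delta of some zero in every other channel. The Python sketch below does exactly that; the tolerance `delta` and the function name `count_near_common_zeros` are hypothetical choices for illustration, and this brute-force count is not the channel diversity measure or the clustering algorithms used in the work above.

```python
import numpy as np

def count_near_common_zeros(channels, delta):
    """Count zeros of channels[0] lying within delta of a zero of every other channel.

    Each channel is an FIR impulse response h[0], h[1], ..., so its zeros in the
    z-plane are the roots of the polynomial with coefficients h (np.roots order).
    """
    zeros = [np.roots(h) for h in channels]
    count = 0
    for z in zeros[0]:
        if all(np.min(np.abs(zk - z)) < delta for zk in zeros[1:]):
            count += 1
    return count

# Toy two-channel example built from chosen zeros: 0.9 and 0.91 are near-common.
h1 = np.poly([0.9, -0.5, 0.3])
h2 = np.poly([0.91, 0.2, -0.7])
n_tight = count_near_common_zeros([h1, h2], delta=0.05)  # only the 0.9/0.91 pair
n_loose = count_near_common_zeros([h1, h2], delta=0.15)  # also the 0.3/0.2 pair
```

The count depends strongly on the chosen tolerance, which illustrates why a principled way of quantifying near-common zeros is needed in the first place.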

    Multiple-Input Multiple-Output (MIMO) blind system identification for operational modal analysis using the Mean Differential Cepstrum (MDC)

    The convenience of Operational Modal Analysis (OMA) over conventional Experimental Modal Analysis (EMA) has led to its increasing popularity over the last decade for evaluating the dynamic properties of structures. OMA has the advantage of requiring only output information, which goes hand in hand with its main drawback: the lack of scaled modeshape information. While correctly scaled modeshapes can be assumed under the restrictive assumption of spectrally white inputs, in reality input spectra are at best broadband. In this thesis, an OMA method for Multiple-Input Multiple-Output (MIMO) applications in mechanical structures is developed. The aim is to separate MIMO responses into a collection of Single-Input Single-Output (SISO) processes (the matrix of frequency response functions, or matrix FRF) using cepstral-based methods, under less restrictive and hence more realistic coloured broadband excitation. Existing cepstral curve-fitting techniques can subsequently be applied to give regenerated FRFs with correct relative scaling. This cepstral-based method is built on the matrix Mean Differential Cepstrum (MDC) and operates in the frequency domain. Applying the matrix MDC to MIMO responses leads to a matrix differential equation which, together with finite differences, directly solves for (identifies) the matrix FRF in a propagative manner. An alternative approach based on whitened MIMO responses can be formulated similarly for the indirect solution of the matrix FRF. Both the direct and indirect approaches can be modified with a Taylor series approximation, giving a total of four propagative solution sequences. The method is developed using relatively simple simulated and experimental systems, with both impulsive and burst random excitations. Detailed analysis of the results is performed using more complicated Single-Input Multiple-Output (SIMO) and MIMO systems, involving both driving-point and non-driving-point measurements.
The use of the matrix MDC method together with an existing cepstral curve-fitting technique to give correct relative scaling is demonstrated on a simulated MIMO system with coloured inputs. For SIMO set-ups, the matrix MDC technique achieves an accurate representation of the actual FRFs. In MIMO scenarios, excellent identification was obtained for the simulated impulsive input case, while the experimental and burst random input cases were less favourable. The results show that the matrix MDC technique works in MIMO scenarios, but possible noise-related issues need to be addressed in both the experimental and burst random input cases for a more satisfactory identification outcome.

    Multichannel Speech Separation and Enhancement Using the Convolutive Transfer Function

    This paper addresses the problem of speech separation and enhancement from multichannel convolutive and noisy mixtures, assuming known mixing filters. We propose to perform the speech separation and enhancement task in the short-time Fourier transform domain, using the convolutive transfer function (CTF) approximation. Compared to time-domain filters, the CTF has far fewer taps; consequently, it has fewer near-common zeros among channels and lower computational complexity. The work proposes three speech-source recovery methods: i) the multichannel inverse filtering method, i.e. the multiple input/output inverse theorem (MINT), exploited in the CTF domain; and, for the multi-source case, ii) a beamforming-like multichannel inverse filtering method that applies single-source MINT with power minimization, suitable whenever the source CTFs are not all known; and iii) a constrained Lasso method, where the sources are recovered by minimizing their ℓ1-norm to impose spectral sparsity, subject to the constraint that the ℓ2-norm fitting cost between the microphone signals and the mixing model involving the unknown source signals is less than a tolerance. Noise can be reduced by setting the tolerance according to the noise power. Experiments under various acoustic conditions are carried out to evaluate the three proposed methods, which are compared with each other and with baseline methods. Comment: Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing
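
The MINT principle behind method i) can be sketched in the time domain: given M known channels h_i, find inverse filters g_i such that the sum of h_i * g_i equals a (possibly delayed) unit impulse, which amounts to a linear system built from stacked convolution matrices. This Python sketch assumes short, known, noise-free time-domain channels without common zeros, and inverse filters of length L - 1 for two channels (which makes the system square); the function names `conv_matrix` and `mint` are hypothetical, and the paper applies MINT to CTFs in the STFT domain rather than to time-domain filters.

```python
import numpy as np

def conv_matrix(h, Lg):
    """Full convolution matrix C such that C @ g == np.convolve(h, g) for len(g) == Lg."""
    C = np.zeros((len(h) + Lg - 1, Lg))
    for k in range(Lg):
        C[k : k + len(h), k] = h
    return C

def mint(channels, Lg, delay=0):
    """Least-squares MINT: find g_i with sum_i h_i * g_i ~ delta[n - delay]."""
    H = np.hstack([conv_matrix(h, Lg) for h in channels])
    target = np.zeros(H.shape[0])
    target[delay] = 1.0
    g, *_ = np.linalg.lstsq(H, target, rcond=None)
    return np.split(g, len(channels))

# Toy example: two random length-8 channels, inverse filters of length 7.
rng = np.random.default_rng(2)
L = 8
channels = [rng.standard_normal(L), rng.standard_normal(L)]
filters = mint(channels, Lg=L - 1)
equalized = sum(np.convolve(h, g) for h, g in zip(channels, filters))
```

Exact inversion relies on the channels having no common zeros; with near-common zeros the stacked matrix becomes ill-conditioned, which is one motivation for working with the shorter CTFs.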