410 research outputs found

    Blind MultiChannel Identification and Equalization for Dereverberation and Noise Reduction based on Convolutive Transfer Function

    This paper addresses the problems of blind channel identification and multichannel equalization for speech dereverberation and noise reduction. The time-domain cross-relation method is not suitable for blind room impulse response identification, due to the near-common zeros of the long impulse responses. We extend the cross-relation method to the short-time Fourier transform (STFT) domain, in which the time-domain impulse responses are approximately represented by the convolutive transfer functions (CTFs) with far fewer coefficients. The CTFs suffer from the common zeros caused by the oversampled STFT. We propose to identify CTFs based on the STFT with the oversampled signals and the critically sampled CTFs, which is a good compromise between the frequency aliasing of the signals and the common zeros problem of CTFs. In addition, a normalization of the CTFs is proposed to remove the gain ambiguity across sub-bands. In the STFT domain, the identified CTFs are used for multichannel equalization, in which the sparsity of speech signals is exploited. We propose to perform inverse filtering by minimizing the $\ell_1$-norm of the source signal, with the relaxed $\ell_2$-norm fitting error between the microphone signals and the convolution of the estimated source signal and the CTFs used as a constraint. This method is advantageous in that the noise can be reduced by relaxing the $\ell_2$-norm to a tolerance corresponding to the noise power, and the tolerance can be automatically set. The experiments confirm the efficiency of the proposed method even under conditions with high reverberation levels and intense noise. Comment: 13 pages, 5 figures, 5 tables
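
The cross-relation idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's STFT-domain algorithm: it only checks the noise-free time-domain identity x1 * h2 = x2 * h1 on toy data. The names `convolve`, `cross_relation_error`, `source`, `h1` and `h2` are hypothetical, and the integer values are made up so that the comparison is exact.

```python
def convolve(a, b):
    """Full linear convolution of two sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def cross_relation_error(x1, x2, g1, g2):
    """Energy of x1*g2 - x2*g1 for candidate filters g1, g2 of equal length."""
    lhs = convolve(x1, g2)
    rhs = convolve(x2, g1)
    return sum((l - r) ** 2 for l, r in zip(lhs, rhs))


# Hypothetical toy data: one source observed through two short channels.
source = [3, -1, 4, 1, -5, 9]
h1 = [1, 2, -1]
h2 = [2, -3, 1]

# Noise-free microphone signals.
x1 = convolve(source, h1)
x2 = convolve(source, h2)

# The true filters satisfy the cross-relation exactly.
err_true = cross_relation_error(x1, x2, h1, h2)

# So does any common scaling of them: this is the gain ambiguity.
err_scaled = cross_relation_error(x1, x2, [2 * v for v in h1], [2 * v for v in h2])

# An arbitrary wrong candidate pair leaves a non-zero residual.
err_wrong = cross_relation_error(x1, x2, [1, 0, 0], [0, 1, 0])

print(err_true, err_scaled, err_wrong)
```

Because the residual is zero for any common scaling of the true filters, a blind method based on this relation can recover the channels only up to a gain, which is consistent with the normalization step the abstract describes.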

    Multichannel Speech Separation and Enhancement Using the Convolutive Transfer Function

    This paper addresses the problem of speech separation and enhancement from multichannel convolutive and noisy mixtures, assuming known mixing filters. We propose to perform the speech separation and enhancement task in the short-time Fourier transform domain, using the convolutive transfer function (CTF) approximation. Compared to time-domain filters, the CTF has far fewer taps; consequently, it has fewer near-common zeros among channels and lower computational complexity. The work proposes three speech-source recovery methods, namely: i) the multichannel inverse filtering method, i.e. the multiple input/output inverse theorem (MINT), is exploited in the CTF domain, and for the multi-source case, ii) a beamforming-like multichannel inverse filtering method applying single-source MINT and using power minimization, which is suitable whenever the source CTFs are not all known, and iii) a constrained Lasso method, where the sources are recovered by minimizing the $\ell_1$-norm to impose their spectral sparsity, with the constraint that the $\ell_2$-norm fitting cost, between the microphone signals and the mixing model involving the unknown source signals, is less than a tolerance. The noise can be reduced by setting the tolerance according to the noise power. Experiments under various acoustic conditions are carried out to evaluate the three proposed methods. The comparison between them as well as with the baseline methods is presented. Comment: Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing
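
As a rough illustration of the MINT condition named in this abstract, the sketch below finds inverse filters g1, g2 such that h1 * g1 + h2 * g2 equals a unit impulse. It is a toy under stated assumptions rather than the paper's CTF-domain method: the two 3-tap real-valued filters are made up (actual CTF coefficients are complex and defined per frequency bin), they are assumed to share no common zeros, and the helper names `convolution_matrix` and `solve` are hypothetical.

```python
from fractions import Fraction


def convolve(a, b):
    """Full linear convolution of two sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def convolution_matrix(h, n_cols):
    """Matrix H such that H times g equals convolve(h, g) when len(g) == n_cols."""
    n_rows = len(h) + n_cols - 1
    H = [[Fraction(0)] * n_cols for _ in range(n_rows)]
    for col in range(n_cols):
        for k, hk in enumerate(h):
            H[col + k][col] = Fraction(hk)
    return H


def solve(A, b):
    """Gauss-Jordan elimination in exact arithmetic.

    Assumes A is square and non-singular (no common zeros among channels).
    """
    n = len(A)
    M = [row[:] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


# Hypothetical two-channel filters (3 taps each).
h1 = [1, 2, -1]
h2 = [2, -3, 1]

# With two channels of length 3, inverse filters of length 2 give a square system.
inv_len = len(h1) - 1
H1 = convolution_matrix(h1, inv_len)
H2 = convolution_matrix(h2, inv_len)
A = [r1 + r2 for r1, r2 in zip(H1, H2)]  # stack the channel matrices side by side

target = [1, 0, 0, 0]  # unit impulse: perfect equalization
g = solve(A, target)
g1, g2 = g[:inv_len], g[inv_len:]

# Check the MINT condition: the summed filtered channels give the impulse.
equalized = [a + b for a, b in zip(convolve(h1, g1), convolve(h2, g2))]
print(equalized)
```

Exact fractions are used only so the check is free of rounding error. When channels share near-common zeros the stacked matrix becomes ill-conditioned, which is the difficulty the abstract attributes to long time-domain filters.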

    Underwater target detection using multichannel subband adaptive filtering and high-order correlation schemes

    Includes bibliographical references. In this paper, new pre- and post-processing schemes are developed to process shallow-water sonar data to improve the accuracy of target detection. Multichannel subband adaptive filtering is applied to preprocess the data in order to isolate the potential target returns from the acoustic backscattered signals and improve the signal-to-reverberation ratio. This is done by estimating the time delays associated with the reflections in different subbands. The preprocessed results are then beamformed to generate an image for each ping of the sonar. The test results on both simulated and real data revealed the efficiency of this scheme in time-delay estimation and its capability to remove most of the competing reverberation and noise. To improve the detection rate while significantly reducing the incidence of false detections, a high-order correlation (HOC) method for postprocessing the beamformed images is then developed. This method determines the consistency in occurrence of the target returns in several consecutive pings. The application of the HOC process to the real beamformed sonar data showed the ability of this method to remove clutter while boosting the target returns in several consecutive pings. The algorithm is simple, fast, and easy to implement. This work was supported by the Office of Naval Research (ONR 321TS) under Contract N61331-94-K-0018.

    Enhancing the front-end of speaker recognition systems

    Near-Instantaneously Adaptive HSDPA-Style OFDM Versus MC-CDMA Transceivers for WIFI, WIMAX, and Next-Generation Cellular Systems

    Burst-by-burst (BbB) adaptive high-speed downlink packet access (HSDPA) style multicarrier systems are reviewed, identifying their most critical design aspects. These systems exhibit numerous attractive features, rendering them eminently eligible for employment in next-generation wireless systems. It is argued that BbB-adaptive or symbol-by-symbol adaptive orthogonal frequency division multiplex (OFDM) modems counteract the near-instantaneous channel quality variations and hence attain an increased throughput or robustness in comparison to their fixed-mode counterparts. Although they act quite differently, various diversity techniques, such as Rake receivers and space-time block coding (STBC), are also capable of mitigating the channel quality variations in their effort to reduce the bit error ratio (BER), provided that the individual antenna elements experience independent fading. By contrast, in the presence of correlated fading imposed by shadowing or time-variant multiuser interference, the benefits of space-time coding erode and it is unrealistic to expect that a fixed-mode space-time coded system remains capable of maintaining a near-constant BER.

    Blind Subband Beamforming With Time-Delay Constraints for Moving Source Speech Enhancement

    Comparison of CELP speech coder with a wavelet method

    This thesis compares the speech quality of the Code Excited Linear Predictor (CELP, Federal Standard 1016) speech coder with that of a new wavelet method for compressing speech. The two are compared through subjective listening tests. The test signals are clean signals (i.e. with no background noise), speech signals with room noise, and speech signals with artificial noise added. Results indicate that for clean signals and signals with predominantly voiced components, the CELP standard performs better than the wavelet method, but for signals with room noise the wavelet method performs much better than CELP. For signals with artificial noise added, the results are mixed and depend on the noise level: CELP performs better at low noise levels and the wavelet method performs better at higher noise levels.