57 research outputs found

    A new cascaded spectral subtraction approach for binaural speech dereverberation and its application in source separation

    In this work we propose a new binaural spectral subtraction method for the suppression of late reverberation. The proposed approach is a cascade of three stages. The first two stages exploit distinct observations to model and suppress the late reverberation by deriving a gain function. The musical noise artifacts generated by the processing at each stage are compensated by smoothing the spectral magnitudes of the weighting gains. The third stage linearly combines the gains obtained from the first two stages and further enhances the binaural signals. The binaural gains, obtained by independently processing the left and right channel signals, are combined using a new method. Experiments on real data are performed in two contexts: dereverberation-only and joint dereverberation and source separation. Objective results verify the suitability of the proposed cascaded approach in both contexts.
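
The gain-based processing described in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the late-reverberation power estimate is assumed to be given, and the function names, the spectral floor, the smoothing constant `beta` and the mixing weight `alpha` are all hypothetical choices made for illustration.

```python
import numpy as np

def subtraction_gain(power, late_power, floor=0.1):
    """Amplitude spectral-subtraction gain, clipped to [floor, 1].

    power      : |X|^2 of the reverberant signal, shape (freq, frames)
    late_power : assumed estimate of the late-reverberation power, same shape
    """
    ratio = np.sqrt(late_power / np.maximum(power, 1e-12))
    return np.clip(1.0 - ratio, floor, 1.0)

def smooth_gain(gain, beta=0.7):
    """First-order recursive smoothing along the frame axis (axis 1).

    Smoothing the gain over time is one common way to reduce musical noise.
    """
    out = np.empty_like(gain)
    out[:, 0] = gain[:, 0]
    for t in range(1, gain.shape[1]):
        out[:, t] = beta * out[:, t - 1] + (1.0 - beta) * gain[:, t]
    return out

def combine_gains(g1, g2, alpha=0.5):
    """Linear combination of two stage gains (alpha is a hypothetical weight)."""
    return alpha * g1 + (1.0 - alpha) * g2
```

The enhanced spectrum would then be the reverberant STFT multiplied element-wise by the combined gain, applied separately or jointly to the left and right channels.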

    Informed algorithms for sound source separation in enclosed reverberant environments

    While humans can separate a sound of interest amidst a cacophony of contending sounds in an echoic environment, machine-based methods lag behind in solving this task. This thesis thus aims at improving the performance of audio separation algorithms when they are informed, i.e. have access to source location information. These locations are assumed to be known a priori in this work, for example by video processing. Initially, a multi-microphone array based method combined with binary time-frequency masking is proposed. A robust least squares frequency invariant data independent beamformer designed with the location information is utilized to estimate the sources. To further enhance the estimated sources, binary time-frequency masking based post-processing is used, but cepstral domain smoothing is required to mitigate musical noise. To tackle the under-determined case and further improve separation performance at higher reverberation times, a two-microphone based method which is inspired by human auditory processing and generates soft time-frequency masks is described. In this approach interaural level difference, interaural phase difference and mixing vectors are probabilistically modeled in the time-frequency domain and the model parameters are learned through the expectation-maximization (EM) algorithm. A direction vector is estimated for each source, using the location information, which is used as the mean parameter of the mixing vector model. Soft time-frequency masks are used to reconstruct the sources. A spatial covariance model is then integrated into the probabilistic model framework that encodes the spatial characteristics of the enclosure and further improves the separation performance in challenging scenarios, i.e. when sources are in close proximity and when the level of reverberation is high.
    Finally, new dereverberation based pre-processing is proposed based on the cascade of three dereverberation stages, where each enhances the two-microphone reverberant mixture. The dereverberation stages are based on amplitude spectral subtraction, where the late reverberation is estimated and suppressed. The combination of such dereverberation based pre-processing and the use of soft mask separation yields the best separation performance. All methods are evaluated with real and synthetic mixtures formed, for example, from speech signals from the TIMIT database and measured room impulse responses.
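
The interaural cues named in this abstract have standard definitions, which can be sketched as follows. This only computes the per-bin observations; the probabilistic model and its EM training are not reproduced here. The function name `interaural_cues` and the regularizer `eps` are assumptions introduced for illustration.

```python
import numpy as np

def interaural_cues(left, right, eps=1e-12):
    """Per-bin interaural level and phase differences.

    left, right : complex STFTs of the two channels, shape (freq, frames)
    Returns (ild in dB, ipd in radians within (-pi, pi]).
    """
    # Level difference: magnitude ratio expressed in decibels.
    ild = 20.0 * np.log10((np.abs(left) + eps) / (np.abs(right) + eps))
    # Phase difference: angle of the cross-spectrum left * conj(right).
    ipd = np.angle(left * np.conj(right))
    return ild, ipd
```

In a mask-based separator, these cues would be the observations to which each source's model is fitted, and the resulting per-bin posterior probabilities would serve as the soft masks.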

    Estimation of the Direct-Path Relative Transfer Function for Supervised Sound-Source Localization

    This paper addresses the problem of binaural localization of a single speech source in noisy and reverberant environments. For a given binaural microphone setup, the binaural response corresponding to the direct-path propagation of a single source is a function of the source direction. In practice, this response is contaminated by noise and reverberation. The direct-path relative transfer function (DP-RTF) is defined as the ratio between the direct-path acoustic transfer functions of the two channels. We propose a method to estimate the DP-RTF from the noisy and reverberant microphone signals in the short-time Fourier transform (STFT) domain. First, the convolutive transfer function approximation is adopted to accurately represent the impulse response of the sensors in the STFT domain. Second, the DP-RTF is estimated by using the auto- and cross-power spectral densities at each frequency and over multiple frames. In the presence of stationary noise, an inter-frame spectral subtraction algorithm is proposed, which enables the estimation of noise-free auto- and cross-power spectral densities. Finally, the estimated DP-RTFs are concatenated across frequencies and used as a feature vector for the localization of the speech source. Experiments with both simulated and real data show that the proposed localization method performs well, even under severely adverse acoustic conditions, and outperforms state-of-the-art localization methods under most of the acoustic conditions.
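
As a point of comparison for this abstract, the conventional relative transfer function estimate divides the cross-power spectral density by the auto-power spectral density at each frequency, averaged over frames. The sketch below shows only that simpler baseline, assuming a noise-free, single-tap (multiplicative) transfer model; it is not the convolutive-transfer-function-based DP-RTF estimator the paper proposes, and the name `rtf_estimate` is hypothetical.

```python
import numpy as np

def rtf_estimate(x_ref, x_other, eps=1e-12):
    """Per-frequency relative transfer function from two STFTs.

    x_ref, x_other : complex STFTs, shape (freq, frames)
    Returns a complex vector of length freq, usable as a feature vector.
    """
    # Cross-PSD between the two channels, averaged over frames.
    cross_psd = np.mean(x_other * np.conj(x_ref), axis=1)
    # Auto-PSD of the reference channel, averaged over frames.
    auto_psd = np.mean(np.abs(x_ref) ** 2, axis=1)
    return cross_psd / (auto_psd + eps)
```

With stationary additive noise, the noise auto- and cross-PSDs would have to be removed from both averages before taking the ratio, which is the role the abstract assigns to its spectral subtraction step.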