2,047 research outputs found

    Speech Enhancement Using An {MMSE} Spectral Amplitude Estimator Based On A Modulation Domain Kalman Filter With A Gamma Prior

    Get PDF
    In this paper, we propose a minimum mean square error spectral estimator for clean speech spectral amplitudes that uses a Kalman filter to model the temporal dynamics of the spectral amplitudes in the modulation domain. Using a two-parameter Gamma distribution to model the prior distribution of the speech spectral amplitudes, we derive closed form expressions for the posterior mean and variance of the spectral amplitudes as well as for the associated update step of the Kalman filter. The performance of the proposed algorithm is evaluated on the TIMIT core test set using the perceptual evaluation of speech quality (PESQ) measure and segmental SNR measure and is shown to give a consistent improvement over a wide range of SNRs when compared to competitive algorithms

    Distributed Multichannel Speech Enhancement Based on Perceptually-Motivated Bayesian Estimators of the Spectral Amplitude

    Get PDF
    In this study, the authors propose multichannel weighted Euclidean (WE) and weighted cosh (WCOSH) cost function estimators for speech enhancement in the distributed microphone scenario. The goal of the work is to illustrate the advantages of utilising additional microphones and modified cost functions for improving signal-to-noise ratio (SNR) and segmental SNR (SSNR) along with log-likelihood ratio (LLR) and perceptual evaluation of speech quality (PESQ) objective metrics over the corresponding single-channel baseline estimators. As with their single-channel counterparts, the perceptually-motivated multichannel WE and WCOSH estimators are functions of a weighting law parameter, which influences attention of the noisy spectral amplitude through a spectral gain function, emphasises spectral peak (formant) information, and accounts for auditory masking effects. Based on the simulation results, the multichannel WE and WCOSH cost function estimators produced gains in SSNR improvement, LLR output and PESQ output over the single-channel baseline results and unweighted cost functions with the best improvements occurring with negative values of the weighting law parameter across all input SNR levels and noise types

    Distributed Multichannel Speech Enhancement with Minimum Mean-square Error Short-time Spectral Amplitude, Log-spectral Amplitude, and Spectral Phase Estimation

    Get PDF
    In this paper, the authors present optimal multichannel frequency domain estimators for minimum mean-square error (MMSE) short-time spectral amplitude (STSA), log-spectral amplitude (LSA), and spectral phase estimation in a widely distributed microphone configuration. The estimators utilize Rayleigh and Gaussian statistical models for the speech prior and noise likelihood with a diffuse noise field for the surrounding environment. Based on the Signal-to-Noise Ratio (SNR) and Segmental Signal-to-Noise Ratio (SSNR) along with the Log-Likelihood Ratio (LLR) and Perceptual Evaluation of Speech Quality (PESQ) as objective metrics, the multichannel LSA estimator decreases background noise and speech distortion and increases speech quality compared to the baseline single channel STSA and LSA estimators, where the optimal multichannel spectral phase estimator serves as a significant quantity to the improvements, and demonstrates robustness due to time alignment and attenuation factor estimation. Overall, the optimal distributed microphone spectral estimators show strong results in noisy environments with application to many consumer, industrial, and military products
    • …
    corecore