Search CORE

Harmonic beamformers for speech enhancement and dereverberation in the time domain

Author: Benesty Jacob
Christensen Mads Græsbøll
Jensen Jesper Rindom
Karimian-Azari Sam
Publication venue: 'Elsevier BV'
Publication date: 01/01/2020

VBN

TR-2005006: Integration of Laser Vibrometry with Infrared Video for Multimedia Surveillance Display

Author: Li Weihong
Zhu Zhigang
Publication venue: CUNY Academic Works
Publication date: 01/01/2005

City University of New York

A Subspace Approach for Enhancing Speech Corrupted by Colored Noise

Author: Philipos C. Loizou
Yi Hu
Publication venue
Publication date: 01/01/2002

A generalized subspace approach is proposed for enhancement of speech corrupted by colored noise. The proposed approach is based on the simultaneous diagonalization of the clean speech and noise covariance matrices, which is shown to be a generalization of the approach proposed by Ephraim and Van Trees for white noise. Objective and subjective measures demonstrated significant improvements over other subspace-based methods when tested with sentences corrupted with speech-shaped noise and multi-talker babble.
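
As a rough illustration of the simultaneous diagonalization mentioned in this abstract, the standard linear-algebra result can be written out as follows. The symbols R_x (clean speech covariance), R_n (noise covariance), V, Lambda, U, Lambda_x and sigma^2 are notation assumed here for the sketch; they are not taken from the listing, and the paper's own estimator is not reproduced.

```latex
% Assumption: R_x is symmetric positive semidefinite, R_n is symmetric positive definite.
% Then a nonsingular matrix V exists that diagonalizes both matrices at once:
\[
  V^{T} R_x V = \Lambda, \qquad V^{T} R_n V = I ,
\]
% where the columns of V and the diagonal entries of \Lambda are the
% eigenvectors and eigenvalues of R_n^{-1} R_x:
\[
  R_n^{-1} R_x \, V = V \Lambda .
\]
% White-noise special case, R_n = \sigma^2 I, with eigendecomposition
% R_x = U \Lambda_x U^{T} (U orthogonal):
\[
  V = \tfrac{1}{\sigma}\, U, \qquad \Lambda = \tfrac{1}{\sigma^{2}}\, \Lambda_x ,
\]
% so V is, up to scaling, the eigenvector basis of R_x alone.
```

In the white-noise case the transform depends only on the eigenvectors of the speech covariance, which is consistent with the abstract's statement that the colored-noise approach generalizes the white-noise one.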

CiteSeerX

Crossref

Analysis of very low quality speech for mask-based enhancement

Author: Gonzalez Sira
Publication venue: Electrical and Electronic Engineering, Imperial College London
Publication date: 01/12/2013

The complexity of the speech enhancement problem has motivated many different solutions. However, most techniques address situations in which the target speech is fully intelligible and the background noise energy is low in comparison with that of the speech. Thus while current enhancement algorithms can improve the perceived quality, the intelligibility of the speech is not increased significantly and may even be reduced. Recent research shows that intelligibility of very noisy speech can be improved by the use of a binary mask, in which a binary weight is applied to each time-frequency bin of the input spectrogram. There are several alternative goals for the binary mask estimator, based either on the Signal-to-Noise Ratio (SNR) of each time-frequency bin or on the speech signal characteristics alone. Our approach to the binary mask estimation problem aims to preserve the important speech cues independently of the noise present by identifying time-frequency regions that contain significant speech energy. The speech power spectrum varies greatly for different types of speech sound. The energy of voiced speech sounds is concentrated in the harmonics of the fundamental frequency while that of unvoiced sounds is, in contrast, distributed across a broad range of frequencies. To identify the presence of speech energy in a noisy speech signal we have therefore developed two detection algorithms. The first is a robust algorithm that identifies voiced speech segments and estimates their fundamental frequency. The second detects the presence of sibilants and estimates their energy distribution. In addition, we have developed a robust algorithm to estimate the active level of the speech. The outputs of these algorithms are combined with other features estimated from the noisy speech to form the input to a classifier which estimates a mask that accurately reflects the time-frequency distribution of speech energy even at low SNR levels. 
We evaluate a mask-based speech enhancer on a range of speech and noise signals and demonstrate a consistent increase in an objective intelligibility measure with respect to noisy speech. Open Access
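
The SNR-based binary mask described in this abstract can be sketched in a few lines. This is only an illustrative oracle mask that assumes the speech and noise power in each time-frequency bin are known separately; it is not the classifier-based estimator developed in the thesis. The function names, the `lc_db` threshold parameter and the toy 2x2 power values are all hypothetical.

```python
import math


def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    """Return a 0/1 mask with 1 where the local SNR (in dB) exceeds lc_db.

    speech_power and noise_power are lists of rows (frames) of per-bin power.
    lc_db is a hypothetical threshold name chosen for this sketch.
    """
    mask = []
    for s_row, n_row in zip(speech_power, noise_power):
        row = []
        for s, n in zip(s_row, n_row):
            if s <= 0.0:
                keep = False          # no speech energy in this bin
            elif n <= 0.0:
                keep = True           # speech present, no noise
            else:
                keep = 10.0 * math.log10(s / n) > lc_db
            row.append(1 if keep else 0)
        mask.append(row)
    return mask


def apply_mask(noisy_power, mask):
    """Apply the binary weight to each time-frequency bin of the noisy input."""
    return [[p * m for p, m in zip(p_row, m_row)]
            for p_row, m_row in zip(noisy_power, mask)]


# Toy example: 2 frames x 2 frequency bins of made-up power values.
speech = [[4.0, 0.1], [0.5, 9.0]]
noise = [[1.0, 1.0], [1.0, 1.0]]
noisy = [[s + n for s, n in zip(s_row, n_row)]
         for s_row, n_row in zip(speech, noise)]

mask = ideal_binary_mask(speech, noise)
enhanced = apply_mask(noisy, mask)
```

Bins dominated by speech are passed unchanged and the rest are zeroed. In practice the clean speech and noise powers are not available separately, which is why the abstract describes estimating the mask from features of the noisy speech.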

Spiral - Imperial College Digital Repository