1,035 research outputs found
On pre-image iterations for speech enhancement
In this paper, we apply kernel PCA for speech enhancement and derive pre-image iterations for speech enhancement. Both methods make use of a Gaussian kernel. The kernel variance serves as tuning parameter that has to be adapted according to the SNR and the desired degree of de-noising. We develop a method to derive a suitable value for the kernel variance from a noise estimate to adapt pre-image iterations to arbitrary SNRs. In experiments, we compare the performance of kernel PCA and pre-image iterations in terms of objective speech quality measures and automatic speech recognition. The speech data is corrupted by white and colored noise at 0, 5, 10, and 15 dB SNR. As a benchmark, we provide results of the generalized subspace method, of spectral subtraction, and of the minimum mean-square error log-spectral amplitude estimator. In terms of the scores of the PEASS (Perceptual Evaluation Methods for Audio Source Separation) toolbox, the proposed methods achieve a similar performance as the reference methods. The speech recognition experiments show that the utterances processed by pre-image iterations achieve a consistently better word recognition accuracy than the unprocessed noisy utterances and than the utterances processed by the generalized subspace method
Coding Strategies for Cochlear Implants Under Adverse Environments
Cochlear implants are electronic prosthetic devices that restores partial hearing in patients with severe to profound hearing loss. Although most coding strategies have significantly improved the perception of speech in quite listening conditions, there remains limitations on speech perception under adverse environments such as in background noise, reverberation and band-limited channels, and we propose strategies that improve the intelligibility of speech transmitted over the telephone networks, reverberated speech and speech in the presence of background noise. For telephone processed speech, we propose to examine the effects of adding low-frequency and high- frequency information to the band-limited telephone speech. Four listening conditions were designed to simulate the receiving frequency characteristics of telephone handsets. Results indicated improvement in cochlear implant and bimodal listening when telephone speech was augmented with high frequency information and therefore this study provides support for design of algorithms to extend the bandwidth towards higher frequencies. The results also indicated added benefit from hearing aids for bimodal listeners in all four types of listening conditions. Speech understanding in acoustically reverberant environments is always a difficult task for hearing impaired listeners. Reverberated sounds consists of direct sound, early reflections and late reflections. Late reflections are known to be detrimental to speech intelligibility. In this study, we propose a reverberation suppression strategy based on spectral subtraction to suppress the reverberant energies from late reflections. Results from listening tests for two reverberant conditions (RT60 = 0.3s and 1.0s) indicated significant improvement when stimuli was processed with SS strategy. The proposed strategy operates with little to no prior information on the signal and the room characteristics and therefore, can potentially be implemented in real-time CI speech processors. For speech in background noise, we propose a mechanism underlying the contribution of harmonics to the benefit of electroacoustic stimulations in cochlear implants. The proposed strategy is based on harmonic modeling and uses synthesis driven approach to synthesize the harmonics in voiced segments of speech. Based on objective measures, results indicated improvement in speech quality. This study warrants further work into development of algorithms to regenerate harmonics of voiced segments in the presence of noise
Group-Sparse Signal Denoising: Non-Convex Regularization, Convex Optimization
Convex optimization with sparsity-promoting convex regularization is a
standard approach for estimating sparse signals in noise. In order to promote
sparsity more strongly than convex regularization, it is also standard practice
to employ non-convex optimization. In this paper, we take a third approach. We
utilize a non-convex regularization term chosen such that the total cost
function (consisting of data consistency and regularization terms) is convex.
Therefore, sparsity is more strongly promoted than in the standard convex
formulation, but without sacrificing the attractive aspects of convex
optimization (unique minimum, robust algorithms, etc.). We use this idea to
improve the recently developed 'overlapping group shrinkage' (OGS) algorithm
for the denoising of group-sparse signals. The algorithm is applied to the
problem of speech enhancement with favorable results in terms of both SNR and
perceptual quality.Comment: 14 pages, 11 figure
Robust Estimation of Non-Stationary Noise Power Spectrum for Speech Enhancement
International audienceWe propose a novel method for noise power spectrum estimation in speech enhancement. This method called extended-DATE (E-DATE) extends the d-dimensional amplitude trimmed estimator (DATE), originally introduced for additive white gaussian noise power spectrum estimation, to the more challenging scenario of non-stationary noise. The key idea is that, in each frequency bin and within a sufficiently short time period, the noise instantaneous power spectrum can be considered as approximately constant and estimated as the variance of a complex gaussian noise process possibly observed in the presence of the signal of interest. The proposed method relies on the fact that the Short-Time Fourier Transform (STFT) of noisy speech signals is sparse in the sense that transformed speech signals can be represented by a relatively small number of coefficients with large amplitudes in the time-frequency domain. The E-DATE estimator is robust in that it does not require prior information about the signal probability distribution except for the weak-sparseness property. In comparison to other state-of-the-art methods, the E-DATE is found to require the smallest number of parameters (only two). The performance of the proposed estimator has been evaluated in combination with noise reduction and compared to alternative methods. This evaluation involves objective as well as pseudo-subjective criteria
Translation-Invariant Shrinkage/Thresholding of Group Sparse Signals
This paper addresses signal denoising when large-amplitude coefficients form
clusters (groups). The L1-norm and other separable sparsity models do not
capture the tendency of coefficients to cluster (group sparsity). This work
develops an algorithm, called 'overlapping group shrinkage' (OGS), based on the
minimization of a convex cost function involving a group-sparsity promoting
penalty function. The groups are fully overlapping so the denoising method is
translation-invariant and blocking artifacts are avoided. Based on the
principle of majorization-minimization (MM), we derive a simple iterative
minimization algorithm that reduces the cost function monotonically. A
procedure for setting the regularization parameter, based on attenuating the
noise to a specified level, is also described. The proposed approach is
illustrated on speech enhancement, wherein the OGS approach is applied in the
short-time Fourier transform (STFT) domain. The denoised speech produced by OGS
does not suffer from musical noise.Comment: 33 pages, 7 figures, 5 table
- …