Single Channel Speech Enhancement using Kalman Filter
The quality and intelligibility of conversational speech are generally degraded by
surrounding noise. The main objective of speech enhancement (SE) is to eliminate
or reduce such disturbing noise in the degraded speech. Various SE methods have
been proposed in the literature. Among them, the Kalman filter (KF) is known to be an
efficient SE method based on the minimum mean square error (MMSE) criterion. However,
most of the conventional KF based speech enhancement methods need access to clean
speech and additive noise information for the state-space model parameters, namely,
the linear prediction coefficients (LPCs) and the additive noise variance estimation,
which is impractical because only the noisy speech is accessible in practice.
Moreover, it is quite difficult to estimate these model parameters efficiently in the
presence of adverse environmental noise. Therefore, the main focus of this thesis is to
develop single channel speech enhancement algorithms using the Kalman filter, where the
model parameters are estimated in noisy conditions. Depending on these parameter
estimation techniques, the proposed SE methods are classified into three approaches
based on non-iterative, iterative, and sub-band iterative KF.
In the first approach, a non-iterative Kalman filter based speech enhancement
algorithm is presented, which operates on a frame-by-frame basis. In this proposed
method, the state-space model parameters, namely, the LPCs and noise variance, are
estimated first in noisy conditions. For LPC estimation, a combined speech smoothing
and autocorrelation method is employed. A new method based on a lower-order
truncated Taylor series approximation of the noisy speech along with a difference
operation serving as high-pass filtering is introduced for the noise variance estimation.
The non-iterative Kalman filter is then implemented using these estimated
parameters.
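As a rough illustration of the first approach, the sketch below estimates LPCs with the autocorrelation method (Levinson-Durbin recursion) and runs a Kalman filter on the resulting autoregressive state-space model. The names `lpc_autocorr`, `kalman_ar`, and `mse`, the synthetic AR(2) test signal, and the use of known (oracle) values for the coefficients, the driving-noise variance `q`, and the observation-noise variance `r` are all assumptions made for illustration. The speech smoothing step and the Taylor-series noise variance estimator described above are not reproduced.

```python
import random

def lpc_autocorr(frame, order):
    """LPCs via the autocorrelation method (Levinson-Durbin recursion).

    Returns (a, err), where s(k) is predicted as sum(a[i] * s(k-1-i)).
    Assumes the frame is not all zeros.
    """
    n = len(frame)
    r = [sum(frame[t] * frame[t - k] for t in range(k, n)) / n
         for k in range(order + 1)]
    a, err = [0.0] * order, r[0]
    for m in range(order):
        k = (r[m + 1] - sum(a[i] * r[m - i] for i in range(m))) / err
        prev = a[:]
        a[m] = k
        for i in range(m):
            a[i] = prev[i] - k * prev[m - 1 - i]
        err *= 1.0 - k * k
    return a, err

def kalman_ar(noisy, a, q, r):
    """Kalman filter for s(k) = sum(a[i]*s(k-1-i)) + w(k), y(k) = s(k) + v(k).

    The state holds the last p samples (oldest first); q and r are the
    variances of w and v.
    """
    p = len(a)
    x = [0.0] * p
    P = [[10.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    out = []
    for yk in noisy:
        # Predict: shift the state and append the AR prediction.
        x_pred = x[1:] + [sum(a[i] * x[p - 1 - i] for i in range(p))]
        FP = P[1:] + [[sum(a[i] * P[p - 1 - i][j] for i in range(p))
                       for j in range(p)]]
        Pp = [row[1:] + [sum(a[i] * row[p - 1 - i] for i in range(p))]
              for row in FP]
        Pp[p - 1][p - 1] += q
        # Update: only the newest sample is observed.
        s = Pp[p - 1][p - 1] + r
        gain = [Pp[i][p - 1] / s for i in range(p)]
        innov = yk - x_pred[p - 1]
        x = [x_pred[i] + gain[i] * innov for i in range(p)]
        P = [[Pp[i][j] - gain[i] * Pp[p - 1][j] for j in range(p)]
             for i in range(p)]
        out.append(x[p - 1])
    return out

def mse(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v)) / len(u)

# Synthetic AR(2) "speech" plus white noise (illustrative values only).
random.seed(0)
a_true, q, r = [1.5, -0.7], 1.0, 4.0
clean = [0.0, 0.0]
for _ in range(2000):
    clean.append(a_true[0] * clean[-1] + a_true[1] * clean[-2]
                 + random.gauss(0.0, q ** 0.5))
clean = clean[2:]
noisy = [s + random.gauss(0.0, r ** 0.5) for s in clean]

a_clean, _ = lpc_autocorr(clean, 2)   # close to a_true
a_noisy, _ = lpc_autocorr(noisy, 2)   # typically biased by the noise
enhanced = kalman_ar(noisy, a_true, q, r)
```

Comparing `a_clean` with `a_noisy` shows why parameter estimation from noisy speech is the hard part: the filter itself is straightforward once the parameters are available.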
In order to enhance the SE performance as well as parameter estimation accuracy
in noisy conditions, an iterative Kalman filter based single channel SE method is
proposed as the second approach, which also operates on a frame-by-frame basis.
For each frame, the state-space model parameters of the KF are estimated through
an iterative procedure. The Kalman filtering iteration is first applied to each noisy
speech frame, reducing the noise component to a certain degree. At the end of this
first iteration, the LPCs and other state-space model parameters are re-estimated
using the processed speech frame and the Kalman filtering is repeated for the same
processed frame. This iteration continues until the KF converges or a maximum number
of iterations is reached, yielding a further enhanced speech frame. The same procedure
is repeated for the following frames until the last noisy speech frame has been processed.
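The filter-then-re-estimate loop described above can be sketched for a single frame as follows. This is a deliberately simplified illustration, not the thesis algorithm: it assumes a first-order AR model, a known observation-noise variance `r` that is kept fixed across passes, and a convergence test on the change in the AR coefficient. The names `ar1_params`, `kalman_ar1`, and `iterative_kf` are hypothetical.

```python
import random

def ar1_params(x):
    """Estimate the AR(1) coefficient and driving-noise variance from x."""
    n = len(x)
    r0 = sum(v * v for v in x) / n
    r1 = sum(x[t] * x[t - 1] for t in range(1, n)) / n
    a = max(-0.999, min(0.999, r1 / r0)) if r0 > 0 else 0.0
    return a, r0 * (1.0 - a * a)

def kalman_ar1(y, a, q, r):
    """Scalar Kalman filter for s(k) = a*s(k-1) + w(k), y(k) = s(k) + v(k)."""
    x, p, out = 0.0, q + r, []
    for yk in y:
        x_pred, p_pred = a * x, a * a * p + q
        gain = p_pred / (p_pred + r)
        x = x_pred + gain * (yk - x_pred)
        p = (1.0 - gain) * p_pred
        out.append(x)
    return out

def iterative_kf(frame, r, max_iter=10, tol=1e-3):
    """Filter, re-estimate the model from the processed frame, and repeat.

    Stops when the AR coefficient changes by less than tol (assumed
    convergence test) or after max_iter passes. Requires r > 0.
    """
    current, a_prev, n_iter = list(frame), None, 0
    for n_iter in range(1, max_iter + 1):
        a, q = ar1_params(current)
        current = kalman_ar1(current, a, q, r)
        if a_prev is not None and abs(a - a_prev) < tol:
            break
        a_prev = a
    return current, n_iter

# One synthetic frame: AR(1) signal plus white noise (illustrative values).
random.seed(1)
clean = [0.0]
for _ in range(255):
    clean.append(0.9 * clean[-1] + random.gauss(0.0, 1.0))
noisy = [s + random.gauss(0.0, 1.0) for s in clean]
enhanced, passes = iterative_kf(noisy, r=1.0)
```

In a frame-by-frame system this function would be called once per frame. Holding `r` constant while re-filtering an already processed frame tends to over-smooth, which is one reason a practical scheme would also re-estimate the remaining model parameters on each pass.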
To further improve the speech enhancement performance, a sub-band iterative
Kalman filter based SE method is also proposed as the third approach. A wavelet
filter-bank is first used to decompose the noisy speech into a number of sub-bands.
To achieve the best trade-off among noise reduction, speech intelligibility, and
computational complexity, a partial reconstruction scheme based on consecutive mean
squared error (CMSE) is proposed to synthesize the low-frequency (LF) and high-frequency
(HF) sub-bands such that the iterative KF is applied only to the partially
reconstructed HF sub-band speech. Finally, the enhanced HF sub-band speech is
combined with the partially reconstructed LF sub-band speech to reconstruct the
full-band enhanced speech.
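The split, process-HF-only, and recombine structure can be illustrated with a one-level Haar filter bank. This is a minimal sketch under stated assumptions: the Haar wavelet, the single decomposition level, and the names `haar_analysis`, `haar_synthesis`, and `enhance_hf_only` are illustrative choices, and the CMSE-based partial reconstruction is not reproduced.

```python
import math

def haar_analysis(x):
    """One-level Haar split into LF and HF sub-bands (len(x) must be even)."""
    s = math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return low, high

def haar_synthesis(low, high):
    """Inverse of haar_analysis: rebuild the full-band signal."""
    s = math.sqrt(2.0)
    out = []
    for l, h in zip(low, high):
        out.extend(((l + h) / s, (l - h) / s))
    return out

def enhance_hf_only(x, process_hf):
    """Apply process_hf to the HF sub-band only, then recombine with the LF band."""
    low, high = haar_analysis(x)
    return haar_synthesis(low, process_hf(high))

signal = [1.0, 3.0, 2.0, 2.0, 5.0, 1.0]
unchanged = enhance_hf_only(signal, lambda band: band)
smoothed = enhance_hf_only(signal, lambda band: [0.0] * len(band))
```

Here `process_hf` stands in for the iterative KF. Passing the identity function returns the input (perfect reconstruction), while zeroing the HF band replaces each sample pair by its average.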
Experimental results show that the proposed KF based SE methods are
capable of reducing adverse environmental noise over a wide range of input SNRs,
and that their overall performance in terms of different evaluation
metrics is superior to some existing state-of-the-art SE methods.
DNN-Assisted Speech Enhancement Approaches Incorporating Phase Information
Speech enhancement is a widely adopted technique that removes interference from corrupted speech to improve speech quality and intelligibility. Speech enhancement methods can be implemented in either the time domain or the time-frequency (T-F) domain. Among the various proposed methods, time-frequency domain methods, which synthesize the enhanced speech from the estimated magnitude spectrogram and the noisy phase spectrogram, have gained the most popularity over the past few decades. However, techniques of this kind tend to ignore the importance of phase processing. To overcome this problem, this thesis aims to jointly enhance the magnitude and phase spectra by means of recent deep neural networks (DNNs). More specifically, three major contributions are presented in this thesis.
First, we present new schemes based on the basic Kalman filter (KF) to remove background noise from noisy speech in the time domain, where the KF acts as a joint estimator for both the magnitude and phase spectra of speech. A DNN-augmented basic KF is first proposed, in which a DNN is applied to estimate key parameters of the KF, namely the linear prediction coefficients (LPCs). By training the DNN on a large database and exploiting its powerful learning ability, the proposed algorithm is able to estimate LPCs from noisy speech more accurately and robustly, leading to improved performance compared with traditional KF based approaches to speech enhancement. We further present a high-frequency (HF) component restoration algorithm to alleviate the degradation in the HF regions of the Kalman-filtered speech, in which DNN-based bandwidth extension is applied to estimate the magnitude of the HF component from its low-frequency (LF) counterpart. With the restoration algorithm incorporated, the enhanced speech suffers less distortion in the HF component. Moreover, we propose a hybrid speech enhancement system that exploits a DNN for speech reconstruction and Kalman filtering for further denoising. Two separate networks are adopted to estimate the magnitude spectrogram and the LPCs of the clean speech, respectively. The estimated clean magnitude spectrogram is combined with the phase of the noisy speech to reconstruct the estimated clean speech. A KF with the estimated parameters is then used to remove the residual noise in the reconstructed speech. The proposed hybrid system takes advantage of both DNN-based reconstruction and traditional Kalman filtering, and can work reliably in either matched or unmatched acoustic environments.
Next, we incorporate the DNN-based parameter estimation scheme into two advanced KFs: the subband KF and the colored-noise KF. The DNN-augmented subband KF method decomposes the noisy speech into several subbands and performs Kalman filtering on each subband signal, where the parameters of the KF are estimated by the trained DNN. The final enhanced speech is then obtained by synthesizing the enhanced subband signals. In the DNN-augmented colored-noise KF system, both clean speech and noise are modelled as autoregressive (AR) processes, whose parameters comprise the LPCs and the driving noise variances. The LPCs are obtained by training a multi-objective DNN, while the driving noise variances are obtained by solving an optimization problem that minimizes the difference between the modelled and observed AR spectra of the noisy speech. The colored-noise Kalman filter with DNN-estimated parameters is then applied to the noisy speech for denoising. A post-subtraction technique is adopted to further remove the residual noise in the Kalman-filtered speech. Extensive computer simulations show that the two proposed advanced KF systems achieve significant performance gains compared with conventional Kalman filter based algorithms as well as recent DNN-based methods under both seen and unseen noise conditions.
Finally, we focus on T-F domain speech enhancement with a masking technique, which aims to retain the speech-dominant components and suppress the noise-dominant parts of the noisy speech. We first derive a new type of mask, namely the constrained ratio mask (CRM), to better control the trade-off between speech distortion and residual noise in the enhanced speech. The CRM is estimated by a trained DNN from the input noisy feature set and is applied to the noisy magnitude spectrogram for denoising. We further extend the CRM to complex spectrogram estimation, where the enhanced magnitude spectrogram is obtained with the CRM, while the estimated phase spectrogram is reconstructed from the noisy phase spectrogram and the phase derivatives. Performance evaluation reveals that the proposed CRM outperforms several traditional masks in terms of objective metrics. Moreover, the enhanced speech resulting from the CRM based complex spectrogram estimation has better speech quality than that obtained without phase reconstruction.
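For orientation, the sketch below shows how a conventional ratio mask is computed and applied to a noisy spectrum while keeping the noisy phase. It uses the widely known ideal ratio mask as a stand-in for the traditional masks mentioned above; it is not the CRM, whose definition is not given in the abstract. The function names and the exponent parameter `beta` are assumptions made for illustration.

```python
import cmath

def ideal_ratio_mask(speech_mag, noise_mag, beta=0.5):
    """Per-bin ratio mask (S^2 / (S^2 + N^2)) ** beta, with values in [0, 1]."""
    mask = []
    for s, n in zip(speech_mag, noise_mag):
        total = s * s + n * n
        mask.append((s * s / total) ** beta if total > 0 else 0.0)
    return mask

def apply_mask(noisy_spec, mask):
    """Scale each noisy magnitude by the mask and keep the noisy phase."""
    return [cmath.rect(m * abs(y), cmath.phase(y))
            for y, m in zip(noisy_spec, mask)]

# Toy example with three frequency bins (illustrative numbers).
mask = ideal_ratio_mask([3.0, 0.0, 1.0], [4.0, 2.0, 0.0])
enhanced = apply_mask([3 + 4j, 2j, 1 + 0j], mask)
```

In a DNN-based system the mask would be predicted from noisy features rather than computed from the clean speech and noise, which are not available at test time.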
A study on different linear and non-linear filtering techniques of speech and speech recognition
Noise is an undesired quantity in any signal; however, most signals become mixed with noise at different stages of their processing and application, which distorts the information they carry and can make the whole signal redundant. Speech signals are particularly prone to acoustic noise such as babble noise, car noise, and street noise. To remove such noise, researchers have developed various techniques known as filtering. Not every filtering technique suits every application; depending on the application, some techniques perform better than others. Broadly, filtering techniques can be classified into two categories: linear filtering and non-linear filtering. This paper presents a study of several filtering techniques based on linear and non-linear approaches. These techniques include adaptive filtering based on algorithms such as LMS, NLMS, and RLS; the Kalman filter; ARMA and NARMA time series applications for filtering; and neural networks combined with fuzzy logic, i.e., ANFIS. The paper also covers the application of various features, i.e., MFCC, LPC, PLP, and gamma, for filtering and recognition.
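As an example of the adaptive linear filters listed above, the following is a minimal NLMS sketch that identifies an unknown FIR system from its input and output. The function name `nlms`, the step size `mu`, the regularizer `eps`, and the three-tap test system are assumptions chosen for illustration, not values from the paper.

```python
import random

def nlms(x, d, taps, mu=0.5, eps=1e-8):
    """Normalized LMS: adapt weights w so that w . [x[n], x[n-1], ...] tracks d[n]."""
    w = [0.0] * taps
    errors = []
    for n in range(len(x)):
        u = [x[n - i] if n - i >= 0 else 0.0 for i in range(taps)]
        y = sum(wi * ui for wi, ui in zip(w, u))
        e = d[n] - y
        norm = eps + sum(ui * ui for ui in u)
        # Gradient step scaled by the input energy (the "normalized" part).
        w = [wi + mu * e * ui / norm for wi, ui in zip(w, u)]
        errors.append(e)
    return w, errors

# Identify a hypothetical 3-tap system from noise-free input/output data.
random.seed(1)
h = [0.5, -0.3, 0.1]
x = [random.gauss(0.0, 1.0) for _ in range(2000)]
d = [sum(h[i] * x[n - i] for i in range(3) if n - i >= 0) for n in range(2000)]
w, errors = nlms(x, d, taps=3)
```

Plain LMS is the same update without the division by `norm`, which makes its stable step size depend on the input power; RLS trades more computation per sample for faster convergence.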
Estimation of Autoregressive Parameters from Noisy Observations Using Iterated Covariance Updates
Estimating the parameters of an autoregressive (AR) random process is a well-studied problem. In many applications, only noisy measurements of the AR process are available. The effect of the additive noise is that the system can be modeled as an AR model with colored noise, even when the measurement noise is white, where the correlation matrix depends on the AR parameters. Because of this correlation, it is expedient to compute using multiple stacked observations. Performing a weighted least-squares estimation of the AR parameters using an inverse covariance weighting can provide significantly better parameter estimates, with the improvement increasing with the stack depth. The estimation algorithm is essentially a vector RLS adaptive filter with a time-varying covariance matrix. Different ways of estimating the unknown covariance are presented, as well as a method to estimate the variances of the AR and observation noise. The notation is extended to vector autoregressive (VAR) processes. Simulation results demonstrate performance improvements in coefficient error and in spectrum estimation.
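The claim that white measurement noise turns into colored noise with a parameter-dependent correlation can be checked with a short derivation. The notation below ($x_t$, $y_t$, $a_i$, $w_t$, $v_t$, $\sigma_w^2$, $\sigma_v^2$) is assumed for illustration and may differ from the paper's; $w_t$ and $v_t$ are taken to be white, zero-mean, and mutually uncorrelated.

```latex
% AR(p) process observed in additive white noise
x_t = \sum_{i=1}^{p} a_i x_{t-i} + w_t, \qquad y_t = x_t + v_t

% Substituting x_t = y_t - v_t gives an AR model in y_t driven by e_t
y_t - \sum_{i=1}^{p} a_i y_{t-i} = e_t, \qquad
e_t = w_t + v_t - \sum_{i=1}^{p} a_i v_{t-i}

% With a_0 = -1, e_t is a moving-average process of order p in v_t, so
\mathbb{E}[e_t e_{t-k}] =
  \sigma_w^2 \, \delta_k + \sigma_v^2 \sum_{i=0}^{p-k} a_i a_{i+k},
  \quad 0 \le k \le p, \qquad
\mathbb{E}[e_t e_{t-k}] = 0 \ \text{for } k > p
```

The equation error $e_t$ is therefore correlated over $p$ lags, and its covariance depends on the unknown $a_i$, which is why an inverse-covariance weighting has to be re-estimated as the parameter estimates improve.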
Optimum Median Filter Based on Crow Optimization Algorithm
A novel median filter based on the crow optimization algorithm (OMF) is proposed to reduce random salt-and-pepper noise and improve the quality of RGB color and grayscale images. The fundamental idea of the approach is that the crow optimization algorithm first detects the noise pixels and then replaces them with an optimum median value chosen according to a fitness-function maximization criterion. The standard measures peak signal-to-noise ratio (PSNR), structural similarity (SSIM), absolute square error, and mean square error are used to test the performance of the filters (the original and the improved median filter) in removing noise from images. The simulation is carried out in MATLAB R2019b, and the results show that the improved median filter with the crow optimization algorithm is more effective than the original median filter algorithm and some recent methods. They also show that the proposed process is robust in reducing the error and removing noise because of the median filter candidate: the mean square error is reduced to 1.38 or less and the absolute error to 0.22 or less, the SSIM reaches 0.9856, and the PSNR exceeds 46 dB. The overall improvement is 25%.
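The detect-then-replace structure can be sketched with a basic selective median filter. Note the substitution: the crow optimization step is replaced here by a simple extreme-value detector (pixels equal to 0 or 255 are treated as salt-and-pepper noise), and the replacement is the plain median of the non-noisy 3x3 neighbours. The function name `median_denoise` and the 8-bit grayscale list-of-lists image format are assumptions.

```python
from statistics import median_low

def median_denoise(img, low=0, high=255):
    """Replace pixels at the extreme values with the median of their clean 3x3 neighbours."""
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(rows):
        for c in range(cols):
            if img[r][c] not in (low, high):
                continue  # pixel is considered noise-free and is left untouched
            neigh = [img[i][j]
                     for i in range(max(0, r - 1), min(rows, r + 2))
                     for j in range(max(0, c - 1), min(cols, c + 2))
                     if (i, j) != (r, c) and img[i][j] not in (low, high)]
            if neigh:
                out[r][c] = median_low(neigh)
    return out

noisy_img = [
    [0, 100, 100],
    [100, 255, 100],
    [100, 100, 102],
]
clean_img = median_denoise(noisy_img)
```

Only the flagged pixels change, which is what separates this family of filters from a standard median filter that smooths every pixel. An optimization-based variant would replace both the detector and the choice of replacement value.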
Adaptive Hidden Markov Noise Modelling for Speech Enhancement
A robust and reliable noise estimation algorithm is required in many speech enhancement
systems. The aim of this thesis is to propose and evaluate a robust noise estimation
algorithm for highly non-stationary noisy environments. In this work, we model the
non-stationary noise using a set of discrete states with each state representing a distinct
noise power spectrum. In this approach, the state sequence over time is conveniently
represented by a Hidden Markov Model (HMM).
In this thesis, we first present an online HMM re-estimation framework that models
time-varying noise using a Hidden Markov Model and tracks changes in the noise
characteristics through a sequential model update procedure that runs
during the absence of speech. In addition, the algorithm will, when necessary, create new
model states to represent novel noise spectra and will merge existing states that have similar
characteristics. We then extend our work in robust noise estimation during speech
activity by incorporating a speech model into our existing noise model. The noise characteristics
within each state are updated based on a speech presence probability which
is derived from a modified minima controlled recursive averaging (MCRA) method.
We have demonstrated the effectiveness of our noise HMM in tracking both stationary
and highly non-stationary noise, and shown that it gives improved performance over
other conventional noise estimation methods when it is incorporated into a standard
speech enhancement algorithm.
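The idea of updating a noise spectrum according to a speech presence probability can be illustrated with the standard MCRA-style recursion, in which the smoothing factor moves towards 1 (no update) as speech becomes more likely. This sketch covers a single noise spectrum only; the HMM states, state creation and merging, and the thesis's modification of MCRA are not reproduced. The function name `update_noise_psd` and the default `alpha` are assumptions.

```python
def update_noise_psd(noise_psd, noisy_power, speech_prob, alpha=0.95):
    """Recursive noise PSD update gated by the speech presence probability.

    Per frequency bin: a = alpha + (1 - alpha) * p, then
    new = a * old + (1 - a) * |Y|^2. With p = 1 the estimate is frozen;
    with p = 0 it is smoothed towards the noisy power with factor alpha.
    """
    updated = []
    for lam, y2, p in zip(noise_psd, noisy_power, speech_prob):
        a = alpha + (1.0 - alpha) * p
        updated.append(a * lam + (1.0 - a) * y2)
    return updated

# Two bins: speech absent in the first, speech certainly present in the second.
new_psd = update_noise_psd([2.0, 2.0], [4.0, 4.0], [0.0, 1.0], alpha=0.5)
```

In an HMM-based scheme, a recursion of this kind would be applied to the spectrum of each noise state, weighted by how likely that state is to be active.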