    Single Channel Speech Enhancement using Kalman Filter

    The quality and intelligibility of speech are generally degraded by surrounding noise. The main objective of speech enhancement (SE) is to eliminate or reduce such noise in the degraded speech. Various SE methods have been proposed in the literature. Among them, the Kalman filter (KF) is known to be an efficient SE method based on the minimum mean square error (MMSE) criterion. However, most conventional KF-based speech enhancement methods need access to clean speech and additive noise information to estimate the state-space model parameters, namely the linear prediction coefficients (LPCs) and the additive noise variance, which is impractical because only the noisy speech is available in practice. Moreover, it is quite difficult to estimate these model parameters efficiently in the presence of adverse environmental noise. Therefore, the main focus of this thesis is to develop single-channel speech enhancement algorithms using the Kalman filter, where the model parameters are estimated in noisy conditions. Depending on the parameter estimation technique, the proposed SE methods are classified into three approaches based on the non-iterative, iterative, and sub-band iterative KF. In the first approach, a non-iterative Kalman filter based speech enhancement algorithm is presented, which operates on a frame-by-frame basis. In this method, the state-space model parameters, namely the LPCs and the noise variance, are first estimated in noisy conditions. For LPC estimation, a combined speech smoothing and autocorrelation method is employed. For noise variance estimation, a new method is introduced that is based on a lower-order truncated Taylor series approximation of the noisy speech together with a difference operation serving as high-pass filtering. The non-iterative Kalman filter is then implemented with these estimated parameters. 
In order to improve both the SE performance and the parameter estimation accuracy in noisy conditions, an iterative Kalman filter based single-channel SE method is proposed as the second approach, which also operates on a frame-by-frame basis. For each frame, the state-space model parameters of the KF are estimated through an iterative procedure. Kalman filtering is first applied to each noisy speech frame, reducing the noise component to a certain degree. At the end of this first iteration, the LPCs and other state-space model parameters are re-estimated from the processed speech frame, and Kalman filtering is repeated on the same processed frame. The iteration continues until the KF converges or a maximum number of iterations is reached, giving a further enhanced speech frame. The same procedure is repeated for the following frames until the last noisy speech frame has been processed. To further improve the speech enhancement performance, a sub-band iterative Kalman filter based SE method is proposed as the third approach. A wavelet filter-bank is first used to decompose the noisy speech into a number of sub-bands. To achieve the best trade-off among noise reduction, speech intelligibility, and computational complexity, a partial reconstruction scheme based on the consecutive mean squared error (CMSE) is proposed to synthesize the low-frequency (LF) and high-frequency (HF) sub-bands, such that the iterative KF is applied only to the partially reconstructed HF sub-band speech. Finally, the enhanced HF sub-band speech is combined with the partially reconstructed LF sub-band speech to reconstruct the full-band enhanced speech. Experimental results show that the proposed KF-based SE methods are capable of reducing adverse environmental noise over a wide range of input SNRs, and that their overall performance in terms of different evaluation metrics is superior to that of some existing state-of-the-art SE methods.
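
As a rough illustration of the Kalman filtering step that the three approaches build on, the sketch below filters a noisy signal with a textbook AR(p) state-space model. It is a minimal sketch, not the thesis's algorithm: it assumes the LPCs and both variances are already known, whereas the thesis estimates them from the noisy speech. The names `kalman_ar_filter`, `make_noisy_ar`, and `mse` are hypothetical helpers introduced only for this example.

```python
import random


def kalman_ar_filter(noisy, lpc, q_var, r_var):
    """Kalman-filter `noisy` using an assumed AR(p) speech model.

    Assumed model (textbook form, parameters taken as known):
        s[n] = sum_k lpc[k] * s[n-1-k] + u[n],  Var(u) = q_var
        y[n] = s[n] + v[n],                     Var(v) = r_var
    State vector: x[n] = [s[n], s[n-1], ..., s[n-p+1]].
    """
    p = len(lpc)
    x = [0.0] * p                                            # state estimate
    P = [[float(i == j) for j in range(p)] for i in range(p)]  # error covariance
    enhanced = []
    for y in noisy:
        # Prediction: x_pred = F x, with F the companion matrix of the LPCs.
        x_pred = [sum(a * xi for a, xi in zip(lpc, x))] + x[:-1]
        # FP = F P (row 0 is lpc^T P, the other rows are shifted rows of P).
        FP = [[sum(lpc[k] * P[k][j] for k in range(p)) for j in range(p)]]
        FP += [P[i][:] for i in range(p - 1)]
        # P_pred = F P F^T + Q, where Q has q_var in its top-left entry only.
        P_pred = [[sum(FP[i][k] * lpc[k] for k in range(p))] + FP[i][:-1]
                  for i in range(p)]
        P_pred[0][0] += q_var
        # Update with the scalar observation y = x[0] + v.
        innovation = y - x_pred[0]
        s = P_pred[0][0] + r_var
        gain = [P_pred[i][0] / s for i in range(p)]
        x = [x_pred[i] + gain[i] * innovation for i in range(p)]
        P = [[P_pred[i][j] - gain[i] * P_pred[0][j] for j in range(p)]
             for i in range(p)]
        enhanced.append(x[0])
    return enhanced


def make_noisy_ar(lpc, q_var, r_var, n, seed=0):
    """Generate a synthetic AR signal and a noisy observation of it."""
    rng = random.Random(seed)
    past = [0.0] * len(lpc)
    clean, noisy = [], []
    for _ in range(n):
        s = sum(a * v for a, v in zip(lpc, past)) + rng.gauss(0.0, q_var ** 0.5)
        past = [s] + past[:-1]
        clean.append(s)
        noisy.append(s + rng.gauss(0.0, r_var ** 0.5))
    return clean, noisy


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
```

With matched parameters, for example `clean, noisy = make_noisy_ar([1.5, -0.7], 1.0, 4.0, 2000)` followed by `kalman_ar_filter(noisy, [1.5, -0.7], 1.0, 4.0)`, the filtered output should be closer to `clean` than `noisy` is. An iterative variant in the spirit of the second approach would re-estimate `lpc` from the filtered frame and filter again.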

    DNN-Assisted Speech Enhancement Approaches Incorporating Phase Information

    Speech enhancement is a widely adopted technique that removes interference from corrupted speech to improve speech quality and intelligibility. Speech enhancement methods can be implemented in either the time domain or the time-frequency (T-F) domain. Among the various proposed methods, the time-frequency domain methods, which synthesize the enhanced speech from the estimated magnitude spectrogram and the noisy phase spectrogram, have gained the most popularity in the past few decades. However, techniques of this kind tend to ignore the importance of phase processing. To overcome this problem, the thesis aims to jointly enhance the magnitude and phase spectra by means of recent deep neural networks (DNNs). More specifically, three major contributions are presented in this thesis. First, we present new schemes based on the basic Kalman filter (KF) to remove the background noise from noisy speech in the time domain, where the KF acts as a joint estimator for both the magnitude and phase spectra of speech. A DNN-augmented basic KF is first proposed, where a DNN is applied to estimate key parameters of the KF, namely the linear prediction coefficients (LPCs). By training the DNN on a large database and making use of its powerful learning ability, the proposed algorithm is able to estimate LPCs from noisy speech more accurately and robustly, leading to improved performance compared to traditional KF-based approaches to speech enhancement. We further present a high-frequency (HF) component restoration algorithm to mitigate the degradation in the HF regions of the Kalman-filtered speech, in which DNN-based bandwidth extension is applied to estimate the magnitude of the HF component from its low-frequency (LF) counterpart. By incorporating the restoration algorithm, the enhanced speech suffers less distortion in the HF component. 
Moreover, we propose a hybrid speech enhancement system that exploits a DNN for speech reconstruction and Kalman filtering for further denoising. Two separate networks are adopted to estimate the magnitude spectrogram and the LPCs of the clean speech, respectively. The estimated clean magnitude spectrogram is combined with the phase of the noisy speech to reconstruct the estimated clean speech. A KF with the estimated parameters is then used to remove the residual noise in the reconstructed speech. The proposed hybrid system takes advantage of both DNN-based reconstruction and traditional Kalman filtering, and can work reliably in either matched or unmatched acoustic environments. Next, we incorporate the DNN-based parameter estimation scheme into two advanced KFs: the subband KF and the colored-noise KF. The DNN-augmented subband KF method decomposes the noisy speech into several subbands and performs Kalman filtering on each subband signal, where the parameters of the KF are estimated by the trained DNN. The final enhanced speech is then obtained by synthesizing the enhanced subband signals. In the DNN-augmented colored-noise KF system, both the clean speech and the noise are modelled as autoregressive (AR) processes, whose parameters comprise the LPCs and the driving noise variances. The LPCs are obtained by training a multi-objective DNN, while the driving noise variances are obtained by solving an optimization problem that minimizes the difference between the modelled and observed AR spectra of the noisy speech. The colored-noise Kalman filter with DNN-estimated parameters is then applied to the noisy speech for denoising. A post-subtraction technique is adopted to further remove the residual noise in the Kalman-filtered speech. 
Extensive computer simulations show that the two proposed advanced KF systems achieve significant performance gains compared to conventional Kalman filter based algorithms as well as recent DNN-based methods under both seen and unseen noise conditions. Finally, we focus on T-F domain speech enhancement with masking techniques, which aim to retain the speech-dominant components and suppress the noise-dominant parts of the noisy speech. We first derive a new type of mask, namely the constrained ratio mask (CRM), to better control the trade-off between speech distortion and residual noise in the enhanced speech. The CRM is estimated by a trained DNN from the input noisy feature set and is applied to the noisy magnitude spectrogram for denoising. We further extend the CRM to complex spectrogram estimation, where the enhanced magnitude spectrogram is obtained with the CRM, while the estimated phase spectrogram is reconstructed from the noisy phase spectrogram and the phase derivatives. Performance evaluation reveals that the proposed CRM outperforms several traditional masks in terms of objective metrics. Moreover, the enhanced speech resulting from the CRM-based complex spectrogram estimation has better speech quality than that obtained without phase reconstruction.
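
To make the masking idea concrete, the sketch below computes the standard ideal ratio mask (IRM) for one frame and applies it to a noisy magnitude spectrum. This is not the constrained ratio mask, whose definition the abstract does not give; the IRM is swapped in here only as a familiar example of a traditional mask. The function names and the `eps` guard are assumptions made for this example, and in practice the mask would be predicted by a trained network rather than computed from known speech and noise magnitudes.

```python
import math


def ideal_ratio_mask(speech_mag, noise_mag, eps=1e-12):
    """Per-bin IRM: sqrt(S^2 / (S^2 + N^2)); `eps` avoids division by zero."""
    return [math.sqrt(s * s / (s * s + n * n + eps))
            for s, n in zip(speech_mag, noise_mag)]


def apply_mask(noisy_mag, mask):
    """Scale each noisy magnitude bin by its mask value in [0, 1]."""
    return [m * y for m, y in zip(mask, noisy_mag)]
```

Bins dominated by speech get a mask value near 1 and are kept, while bins dominated by noise get a value near 0 and are suppressed. The enhanced magnitude would then be combined with a phase estimate before inverse transformation.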

    Noise Reduction with Optimal Variable Span Linear Filters

    A study on different linear and non-linear filtering techniques of speech and speech recognition

    Noise is an undesired component of any signal. However, most signals are mixed with noise at various stages of their processing and application, which distorts the information they carry and can make the whole signal redundant. Speech signals are particularly prone to acoustic noise such as babble noise, car noise, and street noise. To remove such noise, researchers have developed various techniques, collectively called filtering. Not every filtering technique is suitable for every application; depending on the type of application, some techniques are better than others. Broadly, filtering techniques can be classified into two categories: linear filtering and non-linear filtering. This paper presents a study of some filtering techniques based on linear and non-linear approaches. These techniques include adaptive filtering based on algorithms such as LMS, NLMS, and RLS; the Kalman filter; ARMA and NARMA time-series models applied to filtering; and neural networks combined with fuzzy logic, i.e. ANFIS. The paper also covers the application of various features, i.e. MFCC, LPC, PLP, and gamma, to filtering and recognition.
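
Of the adaptive algorithms listed above, NLMS is perhaps the simplest to sketch. The example below is a minimal, assumed textbook form of the normalized LMS update; the function name `nlms`, the step size `mu`, and the regularizer `eps` are illustrative choices rather than anything specified in the paper.

```python
def nlms(x, d, num_taps, mu=0.5, eps=1e-8):
    """Adapt an FIR filter so that its output for input `x` tracks `d`.

    Returns the final weights and the per-sample error signal.
    """
    w = [0.0] * num_taps        # adaptive filter weights
    buf = [0.0] * num_taps      # most recent inputs, newest first
    errors = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))   # filter output
        e = dn - y                                   # estimation error
        norm = sum(b * b for b in buf) + eps         # input power (regularized)
        # Normalized LMS update: step scaled by the instantaneous input power.
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        errors.append(e)
    return w, errors
```

In a noise-cancellation setting, `x` would be a reference noise signal and `d` the noisy speech, so that the error signal `e` approximates the speech. Plain LMS is obtained by dropping the division by `norm`, at the cost of a step size that must be tuned to the input power.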

    Estimation of Autoregressive Parameters from Noisy Observations Using Iterated Covariance Updates

    Estimating the parameters of an autoregressive (AR) random process is a well-studied problem. In many applications, only noisy measurements of the AR process are available. The effect of the additive noise is that the system can be modeled as an AR model with colored noise, even when the measurement noise is white, where the correlation matrix depends on the AR parameters. Because of this correlation, it is expedient to compute using multiple stacked observations. Performing a weighted least-squares estimation of the AR parameters with inverse covariance weighting can provide significantly better parameter estimates, with the improvement increasing with the stack depth. The estimation algorithm is essentially a vector RLS adaptive filter with a time-varying covariance matrix. Different ways of estimating the unknown covariance are presented, as well as a method to estimate the variances of the AR and observation noise. The notation is extended to vector autoregressive (VAR) processes. Simulation results demonstrate performance improvements in coefficient error and in spectrum estimation.
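
The sketch below illustrates why noisy observations are a problem for AR estimation, using the simplest possible case. It is not the paper's weighted least-squares method: it compares the plain Yule-Walker estimate of an AR(1) coefficient with a noise-compensated version that subtracts a known noise variance from the lag-0 autocorrelation. The function names are hypothetical, and assuming the noise variance is known is a simplification made only for this example.

```python
import random


def autocorr(x, lag):
    """Biased sample autocorrelation of `x` at the given lag."""
    n = len(x)
    return sum(x[i] * x[i + lag] for i in range(n - lag)) / n


def ar1_estimates(noisy, noise_var):
    """Return (naive, compensated) estimates of the AR(1) coefficient.

    White observation noise inflates r(0) but not r(1), so the naive ratio
    r(1)/r(0) is biased toward zero. Subtracting the (assumed known) noise
    variance from r(0) removes that bias, provided r(0) > noise_var.
    """
    r0 = autocorr(noisy, 0)
    r1 = autocorr(noisy, 1)
    return r1 / r0, r1 / (r0 - noise_var)


def simulate_noisy_ar1(a, q_var, noise_var, n, seed=0):
    """Generate noisy observations of s[n] = a * s[n-1] + u[n]."""
    rng = random.Random(seed)
    s, noisy = 0.0, []
    for _ in range(n):
        s = a * s + rng.gauss(0.0, q_var ** 0.5)
        noisy.append(s + rng.gauss(0.0, noise_var ** 0.5))
    return noisy
```

For a long record generated with `a = 0.9` and unit variances, the compensated estimate should land much closer to 0.9 than the naive one. The paper's approach goes further by weighting stacked observations with an inverse covariance and by estimating the variances rather than assuming them.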

    Optimum Median Filter Based on Crow Optimization Algorithm

    A novel median filter based on the crow optimization algorithm, the optimum median filter (OMF), is proposed to reduce random salt-and-pepper noise and improve the quality of RGB color and grayscale images. The fundamental idea of the approach is that the crow optimization algorithm first detects the noisy pixels and then replaces them with an optimum median value chosen by maximizing a fitness function. The standard measures of peak signal-to-noise ratio (PSNR), structural similarity (SSIM), absolute square error, and mean square error are used to test the performance of the filters considered (the original and improved median filters) in removing noise from images. The simulation is carried out in MATLAB R2019b, and the results show that the improved median filter with the crow optimization algorithm is more effective than the original median filter algorithm and some recent methods. They also show that the proposed process is robust in reducing the error and removing noise, owing to the selected median filter candidate. The mean square error is reduced to 1.38 or less and the absolute error to 0.22 or less, while the SSIM reaches 0.9856 and the PSNR exceeds 46 dB. Thus, the improvement achieved is 25%.
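
For orientation, the sketch below shows the kind of baseline the abstract compares against: a plain switching median filter that only touches pixels flagged as salt-and-pepper noise. It does not implement the crow optimization step; the detection rule (a pixel equal to the extreme values 0 or 255) and the function name `switching_median_filter` are simplifying assumptions for this example.

```python
from statistics import median


def switching_median_filter(img, low=0, high=255):
    """Replace suspected salt-and-pepper pixels with a local 3x3 median.

    `img` is a list of rows of integer gray levels. Pixels equal to `low`
    or `high` are treated as noisy; all other pixels are left unchanged.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for i in range(h):
        for j in range(w):
            if img[i][j] not in (low, high):
                continue                      # pixel looks clean, keep it
            window = [img[a][b]
                      for a in range(max(0, i - 1), min(h, i + 2))
                      for b in range(max(0, j - 1), min(w, j + 2))]
            clean = [v for v in window if v not in (low, high)]
            if clean:                         # median over clean neighbours only
                out[i][j] = int(median(clean))
    return out
```

A color image can be handled by applying the same function to each channel. An optimized variant in the spirit of the paper would replace the fixed median with a value selected by a search procedure against a fitness function.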

    Adaptive Hidden Markov Noise Modelling for Speech Enhancement

    A robust and reliable noise estimation algorithm is required in many speech enhancement systems. The aim of this thesis is to propose and evaluate a robust noise estimation algorithm for highly non-stationary noisy environments. In this work, we model the non-stationary noise using a set of discrete states, with each state representing a distinct noise power spectrum. In this approach, the state sequence over time is conveniently represented by a Hidden Markov Model (HMM). We first present an online HMM re-estimation framework that models time-varying noise with an HMM and follows changes in the noise characteristics through a sequential model update procedure applied during the absence of speech. In addition, the algorithm will, when necessary, create new model states to represent novel noise spectra and merge existing states that have similar characteristics. We then extend our work to robust noise estimation during speech activity by incorporating a speech model into our existing noise model. The noise characteristics within each state are updated based on a speech presence probability derived from a modified minima-controlled recursive averaging (MCRA) method. We demonstrate the effectiveness of our noise HMM in tracking both stationary and highly non-stationary noise, and show that it gives improved performance over other conventional noise estimation methods when it is incorporated into a standard speech enhancement algorithm.
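
The probability-weighted update mentioned above can be sketched with the standard MCRA-style recursion, in which the smoothing factor moves toward 1 as speech becomes more likely. This is a minimal sketch of that general recursion for a single noise spectrum, not the thesis's modified method or its per-state HMM update; the function name `update_noise_psd` and the default `alpha` are assumptions for this example.

```python
def update_noise_psd(noise_psd, noisy_power, speech_prob, alpha=0.95):
    """One frame of speech-presence-weighted recursive noise averaging.

    For each frequency bin:
        a   = alpha + (1 - alpha) * p        (p = speech presence probability)
        new = a * old + (1 - a) * |Y|^2
    so the estimate is frozen when p = 1 and tracks |Y|^2 when p = 0.
    """
    updated = []
    for old, y2, p in zip(noise_psd, noisy_power, speech_prob):
        a = alpha + (1.0 - alpha) * p
        updated.append(a * old + (1.0 - a) * y2)
    return updated
```

In an HMM-based scheme, an update of this kind would be applied to the spectrum of the currently active noise state, with the speech presence probability supplied by a separate estimator.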