78 research outputs found
The use of spectral information in the development of novel techniques for speech-based cognitive load classification
The cognitive load of a user refers to the amount of mental demand imposed on the user when performing a particular task. Estimating the cognitive load (CL) level of the users is necessary to adjust the workload imposed on them accordingly in order to improve task performance. The current speech based CL classification systems are not adequate for commercial use due to their low performance particularly in noisy environments. This thesis proposes many techniques to improve the performance of the speech based cognitive load classification system in both clean and noisy conditions.
This thesis analyses and presents the effectiveness of speech features such as spectral centroid frequency (SCF) and spectral centroid amplitude (SCA) for CL classification. Sub-systems based on SCF and SCA features were developed and fused with the traditional Mel frequency cepstral coefficients (MFCC) based system, producing an 8.9% and 31.5% relative error rate reduction respectively when compared to the MFCC-based system alone. The Stroop test corpus was used in these experiments.
The investigation into cognitive load information in the form of spectral distribution in different subbands shows that the information distributed in the low frequency subband is significantly higher than the high frequency subband. Two different methods are proposed to utilize this finding. The first method, called the multi-band approach, uses a weighting scheme to emphasize the speech features in low frequency subbands. The cognitive load classification accuracy of this approach is shown to be higher than a system based on a non-weighting scheme. The second method is to design an effective filterbank based on the spectral distribution of cognitive load information using the Kullback-Leibler distance measure. It is shown that the designed filterbank consistently provides higher classification accuracies than other existing filterbanks such as mel, Bark, and equivalent rectangular bandwidth.
A discrete cosine transform based speech enhancement technique is proposed in order to increase the robustness of the CL classification system and found to be more suitable than other methods investigated. This proposed method provides a 3.0% average relative error rate reduction for the seven types of noise and five levels of SNR used. In particular, it provides a maximum of 7.5% relative error rate reduction for the F16 noise (in NOISEX-92 database) at 20 dB SNR
Speech Enhancement with Adaptive Thresholding and Kalman Filtering
Speech enhancement has been extensively studied for many years and various speech enhance- ment methods have been developed during the past decades. One of the objectives of speech en- hancement is to provide high-quality speech communication in the presence of background noise and concurrent interference signals. In the process of speech communication, the clean speech sig- nal is inevitably corrupted by acoustic noise from the surrounding environment, transmission media, communication equipment, electrical noise, other speakers, and other sources of interference. These disturbances can significantly degrade the quality and intelligibility of the received speech signal. Therefore, it is of great interest to develop efficient speech enhancement techniques to recover the original speech from the noisy observation. In recent years, various techniques have been developed to tackle this problem, which can be classified into single channel and multi-channel enhancement approaches. Since single channel enhancement is easy to implement, it has been a significant field of research and various approaches have been developed. For example, spectral subtraction and Wiener filtering, are among the earliest single channel methods, which are based on estimation of the power spectrum of stationary noise. However, when the noise is non-stationary, or there exists music noise and ambient speech noise, the enhancement performance would degrade considerably. To overcome this disadvantage, this thesis focuses on single channel speech enhancement under adverse noise environment, especially the non-stationary noise environment.
Recently, wavelet transform based methods have been widely used to reduce the undesired background noise. On the other hand, the Kalman filter (KF) methods offer competitive denoising results, especially in non-stationary environment. It has been used as a popular and powerful tool for speech enhancement during the past decades. In this regard, a single channel wavelet thresholding based Kalman filter (KF) algorithm is proposed for speech enhancement in this thesis. The wavelet packet (WP) transform is first applied to the noise corrupted speech on a frame-by-frame basis, which decomposes each frame into a number of subbands. A voice activity detector (VAD) is then designed to detect the voiced/unvoiced frames of the subband speech. Based on the VAD result, an adaptive thresholding scheme is applied to each subband speech followed by the WP based reconstruction to obtain the pre-enhanced speech. To achieve a further level of enhancement, an iterative Kalman filter (IKF) is used to process the pre-enhanced speech.
The proposed adaptive thresholding iterative Kalman filtering (AT-IKF) method is evaluated and compared with some existing methods under various noise conditions in terms of segmental SNR and perceptual evaluation of speech quality (PESQ) as two well-known performance indexes. Firstly, we compare the proposed adaptive thresholding (AT) scheme with three other threshold- ing schemes: the non-linear universal thresholding (U-T), the non-linear wavelet packet transform thresholding (WPT-T) and the non-linear SURE thresholding (SURE-T). The experimental results show that the proposed AT scheme can significantly improve the segmental SNR and PESQ for all input SNRs compared with the other existing thresholding schemes. Secondly, extensive computer simulations are conducted to evaluate the proposed AT-IKF as opposed to the AT and the IKF as standalone speech enhancement methods. It is shown that the AT-IKF method still performs the best. Lastly, the proposed ATIKF method is compared with three representative and popular meth- ods: the improved spectral subtraction based speech enhancement algorithm (ISS), the improved Wiener filter based method (IWF) and the representative subband Kalman filter based algorithm (SIKF). Experimental results demonstrate the effectiveness of the proposed method as compared to some previous works both in terms of segmental SNR and PESQ
Single Channel Speech Enhancement using Kalman Filter
The quality and intelligibility of speech conversation are generally degraded by the
surrounding noises. The main objective of speech enhancement (SE) is to eliminate
or reduce such disturbing noises from the degraded speech. Various SE methods have
been proposed in literature. Among them, the Kalman filter (KF) is known to be an
efficient SE method that uses the minimum mean square error (MMSE). However,
most of the conventional KF based speech enhancement methods need access to clean
speech and additive noise information for the state-space model parameters, namely,
the linear prediction coefficients (LPCs) and the additive noise variance estimation,
which is impractical in the sense that in practice, we can access only the noisy speech.
Moreover, it is quite difficult to estimate these model parameters efficiently in the
presence of adverse environmental noises. Therefore, the main focus of this thesis is to
develop single channel speech enhancement algorithms using Kalman filter, where the
model parameters are estimated in noisy conditions. Depending on these parameter
estimation techniques, the proposed SE methods are classified into three approaches
based on non-iterative, iterative, and sub-band iterative KF.
In the first approach, a non-iterative Kalman filter based speech enhancement
algorithm is presented, which operates on a frame-by-frame basis. In this proposed
method, the state-space model parameters, namely, the LPCs and noise variance, are
estimated first in noisy conditions. For LPC estimation, a combined speech smoothing
and autocorrelation method is employed. A new method based on a lower-order
truncated Taylor series approximation of the noisy speech along with a difference
operation serving as high-pass filtering is introduced for the noise variance estimation.
The non-iterative Kalman filter is then implemented with these estimated parameters
effectively.
In order to enhance the SE performance as well as parameter estimation accuracy
in noisy conditions, an iterative Kalman filter based single channel SE method is
proposed as the second approach, which also operates on a frame-by-frame basis.
For each frame, the state-space model parameters of the KF are estimated through
an iterative procedure. The Kalman filtering iteration is first applied to each noisy
speech frame, reducing the noise component to a certain degree. At the end of this
first iteration, the LPCs and other state-space model parameters are re-estimated
using the processed speech frame and the Kalman filtering is repeated for the same
processed frame. This iteration continues till the KF converges or a maximum number
of iterations is reached, giving further enhanced speech frame. The same procedure
will repeat for the following frames until the last noisy speech frame being processed.
For further improving the speech enhancement performance, a sub-band iterative
Kalman filter based SE method is also proposed as the third approach. A wavelet
filter-bank is first used to decompose the noisy speech into a number of sub-bands.
To achieve the best trade-off among the noise reduction, speech intelligibility and
computational complexity, a partial reconstruction scheme based on consecutive mean
squared error (CMSE) is proposed to synthesize the low-frequency (LF) and highfrequency (HF) sub-bands such that the iterative KF is employed only to the partially
reconstructed HF sub-band speech. Finally, the enhanced HF sub-band speech is
combined with the partially reconstructed LF sub-band speech to reconstruct the
full-band enhanced speech.
Experimental results have shown that the proposed KF based SE methods are
capable of reducing adverse environmental noises for a wide range of input SNRs,
and the overall performance of the proposed methods in terms of different evaluation
metrics is superior to some existing state-of-the art SE methods
Recommended from our members
Time-frequency analysis based on split spectrum applied to audio and ultrasonic signals
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University LondonSignal processing is a large subject with applications integral to a number of technological fields such as communication, audio, Voice over IP (VoIP), pattern recognition, sonar, radar, ultrasound and medical imaging. Techniques exist for the analysis, modelling, extraction, recognition and synthesis of signals of interest. The focus of this thesis is signal processing for acoustics (both sonic and ultrasonic). In the applications examined, signals of interest are usually incomplete, distorted and/or noisy. Therefore, reconstructing the signal, noise reduction and removal of any distortion/interference are the main goals of the signal processing techniques presented. The primary aim is to study and develop an advanced time-frequency signal processing technique for acoustic applications to enhance the quality of the signals. In the first part of the thesis, a technique is presented that models and maintains the correlation between temporal and spectral parameters of audio signals. A novel Packet Loss Concealment (PLC) method is developed with applications to VoIP, audio broadcasting, and streaming. The problem of modelling the time-varying frequency spectrum in the context of PLC is addressed, and a novel solution is proposed for tracking and using the temporal motion of spectral flow to reconstruct the signal. The proposed method utilises a Time-Frequency Motion (TFM) matrix representation of the audio signal, where each frequency is tagged with a motion vector estimate that is assessed by cross-correlation of the movement of spectral energy within sub-bands across time frames. The missing packets are estimated using extrapolation or interpolation algorithms using a TFM matrix and then inverse transformed to the time-domain for reconstruction of the signal. The proposed method is compared with conventional approaches using objective Performance Evaluation of Speech Quality (PESQ), and subjective Mean Opinion Scores (MOS) in a range of packet loss from 5% to 20%. The evaluation results demonstrate that the proposed algorithm substantially improves performance by an average of 2.85% and 5.9% in terms of PESQ and MOS respectively. In the second part of the thesis, the proposed method is extended and modified to address challenges of excessive coherent noise arising from ultrasonic signals gathered during Guided Wave Testing (GWT). It is an advanced Non-destructive testing technique which is used over several branches of industry to inspect large structures for defects where the structural integrity is of concern. In such systems, signal interpretation can often be challenging due to the multi-modal and dispersive propagation of Ultrasonic Guided Waves (UGWs). The multi-modal and dispersive nature of the received signals hampers the ability to detect defects in a given structure. The Split-Spectrum Processing (SSP) method with application for such signal has been studied and reviewed quantitatively to measure the enhancement in terms of Signal-to-Noise Ratio (SNR) and spatial resolution. In this thesis, the influence of SSP filter bank parameters on these signals is studied and optimised to improve SNR and spatial resolution considerably. The proposed method is compared analytically and experimentally with conventional approaches. The proposed SSP algorithm substantially improves SNR by an average of 30dB. The conclusions reached in this thesis will contribute to the progression of the GWT technique through considerable improvement in defect detection capability.Centre for Electronic Systems Research (CESR) of Brunel University London, The National Structural Integrity Research Centre (NSIRC) and TWI Ltd
A comparison of adaptive predictors in sub-band coding
Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1987.Bibliography: leaves 82-85.by Paul Ning.M.S
- …