
    Speech enhancement by perceptual adaptive wavelet de-noising

    This thesis summarizes and compares existing wavelet de-noising methods. The most popular methods of wavelet transform, adaptive thresholding, and musical noise suppression are analyzed theoretically and evaluated through Matlab simulation. Building on this work, a new speech enhancement system using adaptive wavelet de-noising is proposed, in which each step of standard wavelet thresholding is improved by an optimized adaptive algorithm. The quantile-based adaptive noise estimate and the a posteriori SNR-based threshold adjuster complement each other: combining them integrates the advantages of both approaches and balances noise removal against speech preservation. To improve the final perceptual quality, a new musical noise analysis and smoothing algorithm and a Teager Energy Operator (TEO) based silent-segment smoothing module are also introduced into the system. Experimental results demonstrate the capability of the proposed system in both stationary and non-stationary noise environments.
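
The abstract mentions a Teager Energy Operator based silent-segment module. The discrete TEO is psi[n] = x[n]^2 - x[n-1]*x[n+1]. The sketch below computes it and flags low-energy frames as candidate silent segments. The frame length, the threshold, and the mean-|TEO| decision rule are hypothetical choices for illustration, not the thesis's actual module.

```python
import math


def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]**2 - x[n-1] * x[n+1].

    The two endpoint samples lack a neighbour on one side, so the output
    has len(x) - 2 values.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def silent_frames(x, frame_len, threshold):
    """Flag non-overlapping frames whose mean |TEO| falls below `threshold`.

    `frame_len` and `threshold` are hypothetical tuning parameters; the
    thesis's actual silent-segment decision rule is not reproduced here.
    """
    if frame_len < 3:
        raise ValueError("frame_len must be at least 3 for the TEO")
    flags = []
    for start in range(0, len(x) - frame_len + 1, frame_len):
        psi = teager_energy(x[start:start + frame_len])
        mean_energy = sum(abs(v) for v in psi) / len(psi)
        flags.append(mean_energy < threshold)
    return flags


# For a pure tone A*cos(w*n), the TEO is the constant A**2 * sin(w)**2.
tone = [2.0 * math.cos(0.3 * n) for n in range(40)]
print(teager_energy(tone)[:3])
print(silent_frames([0.0] * 20 + tone[:20], frame_len=10, threshold=0.01))
# -> [True, True, False, False]
```

A useful property of the TEO is that it tracks both amplitude and frequency of a narrowband component, which is why it is often preferred over plain short-time energy for separating low-level speech from silence.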

    Speech Signal Enhancement through Adaptive Wavelet Thresholding

    This paper demonstrates the application of the Bionic Wavelet Transform (BWT), an adaptive wavelet transform derived from a non-linear auditory model of the cochlea, to speech signal enhancement. Results, measured objectively by signal-to-noise ratio (SNR) and segmental SNR (SSNR) and subjectively by Mean Opinion Score (MOS), are given for additive white Gaussian noise as well as for four types of realistic noise environments. Enhancement is accomplished by thresholding the adapted BWT coefficients, and the results are compared with a variety of speech enhancement techniques, including Ephraim-Malah filtering, iterative Wiener filtering, and spectral subtraction, as well as wavelet denoising based on a perceptually scaled wavelet packet transform decomposition. Overall, the SNR and SSNR improvements of the proposed approach are comparable to those of the Ephraim-Malah filter, with BWT enhancement giving the best results of all methods in the noisiest conditions (−10 dB and −5 dB input SNR). Subjective MOS surveys across a variety of 0 dB SNR noise conditions indicate enhancement quality that is competitive with, but still lower than, Ephraim-Malah filtering and iterative Wiener filtering, and higher than the perceptually scaled wavelet method.
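
The two objective measures named above differ in how they weight the signal. Global SNR pools energy over the whole utterance, while segmental SNR averages per-frame SNRs, so low-energy frames count as much as loud ones. The sketch below uses a 256-sample frame and clamps each frame to [-10, 35] dB; both are common conventions assumed here, not values reported in the paper.

```python
import math


def snr_db(clean, enhanced):
    """Global SNR in dB: 10*log10(sum(s**2) / sum((s - s_hat)**2))."""
    signal = sum(s * s for s in clean)
    noise = sum((s - e) ** 2 for s, e in zip(clean, enhanced))
    return 10.0 * math.log10(signal / noise)


def segmental_snr_db(clean, enhanced, frame_len=256, lo=-10.0, hi=35.0):
    """Average of per-frame SNRs, each clamped to [lo, hi] dB.

    The frame length and clamping limits are assumed conventions for
    illustration, not values taken from the paper.
    """
    values = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        s = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        signal = sum(v * v for v in s)
        noise = sum((a - b) ** 2 for a, b in zip(s, e))
        if noise == 0.0:
            seg = hi   # perfect reconstruction: clamp to the upper limit
        elif signal == 0.0:
            seg = lo   # silent reference frame: clamp to the lower limit
        else:
            seg = 10.0 * math.log10(signal / noise)
        values.append(min(max(seg, lo), hi))
    if not values:
        raise ValueError("signal shorter than one frame")
    return sum(values) / len(values)


clean = [1.0] * 512
enhanced = clean[:256] + [0.0] * 256   # first frame perfect, second frame lost
print(snr_db(clean, enhanced))            # about 3.01 dB
print(segmental_snr_db(clean, enhanced))  # -> 17.5 (mean of 35 dB and 0 dB)
```

The toy example shows why the two figures can disagree sharply: one perfectly reconstructed frame pulls the segmental average up even though half the signal energy is lost.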

    Speech Enhancement with Adaptive Thresholding and Kalman Filtering

    Speech enhancement has been studied extensively for many years, and various speech enhancement methods have been developed over the past decades. One objective of speech enhancement is to provide high-quality speech communication in the presence of background noise and concurrent interfering signals. In speech communication, the clean speech signal is inevitably corrupted by acoustic noise from the surrounding environment, the transmission medium, communication equipment, electrical noise, other speakers, and other sources of interference. These disturbances can significantly degrade the quality and intelligibility of the received speech signal, so it is of great interest to develop efficient speech enhancement techniques that recover the original speech from the noisy observation. The techniques developed for this problem can be classified into single-channel and multi-channel approaches. Because single-channel enhancement is easy to implement, it has been a significant field of research, and various approaches have been developed. For example, spectral subtraction and Wiener filtering are among the earliest single-channel methods; both rely on estimating the power spectrum of stationary noise. However, when the noise is non-stationary, or when the interference includes music or ambient speech, their performance degrades considerably. To address this limitation, this thesis focuses on single-channel speech enhancement in adverse noise environments, especially non-stationary ones. Wavelet transform based methods have recently been widely used to reduce unwanted background noise, while Kalman filter (KF) methods offer competitive denoising results, especially in non-stationary environments, and have been a popular and powerful tool for speech enhancement over the past decades. 
In this regard, this thesis proposes a single-channel, wavelet-thresholding-based Kalman filter (KF) algorithm for speech enhancement. The wavelet packet (WP) transform is first applied to the noise-corrupted speech frame by frame, decomposing each frame into a number of subbands. A voice activity detector (VAD) is then designed to detect the voiced/unvoiced frames of the subband speech. Based on the VAD result, an adaptive thresholding scheme is applied to each subband, followed by WP reconstruction to obtain the pre-enhanced speech. For a further level of enhancement, an iterative Kalman filter (IKF) processes the pre-enhanced speech. The proposed adaptive thresholding iterative Kalman filtering (AT-IKF) method is evaluated and compared with existing methods under various noise conditions in terms of two well-known performance measures: segmental SNR and perceptual evaluation of speech quality (PESQ). First, the proposed adaptive thresholding (AT) scheme is compared with three other thresholding schemes: non-linear universal thresholding (U-T), non-linear wavelet packet transform thresholding (WPT-T), and non-linear SURE thresholding (SURE-T). The experimental results show that the proposed AT scheme significantly improves segmental SNR and PESQ for all input SNRs compared with these existing thresholding schemes. Second, extensive computer simulations compare the proposed AT-IKF with AT and IKF used as standalone speech enhancement methods; AT-IKF still performs best. Finally, AT-IKF is compared with three representative and popular methods: the improved spectral subtraction based algorithm (ISS), the improved Wiener filter based method (IWF), and the representative subband Kalman filter based algorithm (SIKF). 
Experimental results demonstrate the effectiveness of the proposed method compared with previous work, in terms of both segmental SNR and PESQ.
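
Of the baselines listed, universal thresholding (U-T) is the simplest to state: detail coefficients are soft-thresholded at lambda = sigma * sqrt(2 ln N). The sketch below shows that baseline only, not the proposed AT or AT-IKF scheme. Assumptions: a single-level Haar DWT stands in for the wavelet packet decomposition, and sigma is estimated as the median absolute detail coefficient divided by 0.6745, a standard robust convention.

```python
import math
import random
import statistics


def haar_forward(x):
    """One level of the orthonormal Haar DWT; len(x) must be even."""
    r = 1.0 / math.sqrt(2.0)
    half = len(x) // 2
    approx = [(x[2 * i] + x[2 * i + 1]) * r for i in range(half)]
    detail = [(x[2 * i] - x[2 * i + 1]) * r for i in range(half)]
    return approx, detail


def haar_inverse(approx, detail):
    """Inverse of haar_forward."""
    r = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * r, (a - d) * r])
    return out


def soft_threshold(c, lam):
    """Shrink a coefficient toward zero by lam (zero if |c| <= lam)."""
    return math.copysign(max(abs(c) - lam, 0.0), c)


def universal_threshold_denoise(x):
    """Haar DWT, soft thresholding at sigma*sqrt(2 ln N), inverse DWT."""
    approx, detail = haar_forward(x)
    # Robust noise estimate from the finest-scale details (MAD / 0.6745).
    sigma = statistics.median([abs(d) for d in detail]) / 0.6745
    lam = sigma * math.sqrt(2.0 * math.log(len(x)))
    return haar_inverse(approx, [soft_threshold(d, lam) for d in detail])


def mse(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


random.seed(0)
clean = [1.0 if (i // 64) % 2 == 0 else -1.0 for i in range(1024)]
noisy = [c + random.gauss(0.0, 0.1) for c in clean]
denoised = universal_threshold_denoise(noisy)
print(mse(noisy, clean), mse(denoised, clean))
```

Because the universal threshold depends only on N and a single global noise estimate, it cannot react to speech/non-speech changes; that limitation is what a VAD-driven adaptive threshold is meant to address.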

    Audio encoding based on the empirical mode decomposition

    This paper presents a new approach to perceptual audio encoding based on Empirical Mode Decomposition (EMD). EMD adaptively decomposes the audio signal into intrinsic oscillatory components called Intrinsic Mode Functions (IMFs), which can be fully described by their extrema. These extrema are encoded after a thresholding scheme controlled by a psychoacoustic model. The decoder recovers the original signal by reconstructing the IMFs through spline interpolation and summing them. The proposed approach is applied to different audio signals, and the results are compared with wavelet and MPEG-1 Layer 3 (MP3) approaches. Based on exhaustive simulations, the proposed compression scheme performs better than both the MP3 and the wavelet approaches in terms of bit rate and audio quality.
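
Since the coder above represents each IMF by its extrema, the first step on the encoder side is locating those extrema on the sampled IMF. The sketch covers only that step; the psychoacoustic thresholding and the spline-based decoder are not shown. Treating endpoints and flat plateaus as non-extrema is a simplifying assumption, not the paper's rule, and the sample IMF values are invented.

```python
def local_extrema(x):
    """Return (index, value, kind) for each strict local maximum or minimum.

    Endpoints and flat plateaus are skipped; this is a simplifying
    assumption, not the paper's extremum-detection rule.
    """
    extrema = []
    for n in range(1, len(x) - 1):
        if x[n] > x[n - 1] and x[n] > x[n + 1]:
            extrema.append((n, x[n], "max"))
        elif x[n] < x[n - 1] and x[n] < x[n + 1]:
            extrema.append((n, x[n], "min"))
    return extrema


imf = [0.0, 1.5, -1.0, 2.0, -0.5, 0.0]   # hypothetical sampled IMF
print(local_extrema(imf))
# -> [(1, 1.5, 'max'), (2, -1.0, 'min'), (3, 2.0, 'max'), (4, -0.5, 'min')]
```

For a well-formed IMF, maxima and minima alternate, which is what makes the (index, value) pairs sufficient side information for an interpolating decoder.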

    State of the art in 2D content representation and compression

    Deliverable D1.3 of the ANR PERSEE project. This report was produced within the framework of the ANR PERSEE project (no. ANR-09-BLAN-0170). Specifically, it corresponds to deliverable D3.1 of the project.

    Perceptual Video Quality Assessment and Enhancement

    With the rapid development of networked visual communication technologies, digital video has become ubiquitous and indispensable in everyday life. Video acquisition, communication, and processing systems introduce various types of distortion, which may have a major impact on the video quality perceived by human observers. Effective and efficient objective video quality assessment (VQA) methods that predict perceptual video quality are highly desirable in modern visual communication systems for performance evaluation, quality control, and resource allocation. Perceptual VQA measures may also be used to optimize a wide variety of video processing algorithms and systems for the best perceptual quality. This thesis explores several novel ideas in video quality assessment and enhancement. First, by treating a video signal as a 3D volume image, we propose a 3D structural similarity (SSIM) based full-reference (FR) VQA approach that also incorporates local information content and local distortion-based pooling. Second, a reduced-reference (RR) VQA scheme is developed by tracing the evolution of local phase structures over time in the complex wavelet domain. Furthermore, we propose a quality-aware video system that combines spatial and temporal quality measures with a robust video watermarking technique, so that RR-VQA can be performed without transmitting RR features over an ancillary lossless channel. Finally, a novel strategy for enhancing video denoising algorithms, called poly-view fusion, is developed by examining a video sequence as a 3D volume image from multiple (front, side, top) views. This yields significant and consistent gains in both peak signal-to-noise ratio (PSNR) and SSIM, especially at high noise levels.
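
The 3D SSIM approach builds on the basic SSIM expression, which compares means, variances, and covariance of a reference and a distorted signal. The sketch evaluates that expression over a single window spanning a flattened toy volume. This is a simplification: the thesis uses local 3D windows with information-content and distortion-based pooling. The constants K1 = 0.01, K2 = 0.03, and L = 255 are the commonly used defaults, assumed here, and the sample values are invented.

```python
def ssim_single_window(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """SSIM of two equal-length sample sequences computed over one window.

    Uses population (1/N) statistics and the usual stabilizing constants
    C1 = (k1*L)**2 and C2 = (k2*L)**2 with L = data_range.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) * (a - mx) for a in x) / n
    vy = sum((b - my) * (b - my) for b in y) / n
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / (
        (mx * mx + my * my + c1) * (vx + vy + c2)
    )


# A tiny 2-frame, 2x2 "video volume" flattened into 8 samples.
ref = [10, 50, 90, 130, 20, 60, 100, 140]
dist = [12, 47, 95, 120, 25, 58, 90, 150]
print(ssim_single_window(ref, ref))   # identical inputs: SSIM = 1
print(ssim_single_window(ref, dist))  # below 1 for a distorted copy
```

Computing the same statistics in sliding 3D windows and pooling the local scores, rather than using one global window, is what lets SSIM-style measures localize distortions in space and time.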

    Localization of Active Brain Sources From EEG Signals Using Empirical Mode Decomposition: A Comparative Study

    Localization of active brain sources from the electroencephalogram (EEG) is useful in clinical applications such as the study of localized epilepsy, event-related potentials, and attention deficit/hyperactivity disorder. The distributed-source model is a common method for estimating neural activity in the brain: the location and amplitude of each active source are estimated by solving the inverse problem, either by regularization or by Bayesian methods with spatio-temporal constraints. Frequency and spatio-temporal constraints improve the quality of the reconstructed neural activity. However, separation into frequency bands is beneficial when the relevant information lies in specific sub-bands. We improved frequency-band identification while preserving good temporal resolution by using EEG pre-processing techniques with good frequency-band separation and temporal resolution properties. The identified frequency bands were included as constraints in the solution of the inverse problem by decomposing the EEG signals into frequency bands with methods that offer good frequency and temporal resolution, such as empirical mode decomposition (EMD) and the wavelet transform (WT). We present a comparative analysis of the accuracy of brain-source reconstruction using these techniques, assessing spatial reconstruction accuracy with the Wasserstein metric for real and simulated signals. We addressed the mode-mixing problem inherent to EMD by exploring three variants of EMD: masking EMD, ensemble EMD (EEMD), and multivariate EMD (MEMD). The spatio-temporal brain source reconstruction results show that masking EMD and MEMD can largely mitigate the mode-mixing problem and achieve a good spatio-temporal reconstruction of the active sources. 
Masking EMD and EEMD achieved better reconstruction than standard EMD, Multiple Sparse Priors, or wavelet packet decomposition when EMD was used as a pre-processing tool for the spatial reconstruction (averaged over time) of the brain sources. The spatial resolution obtained with all three EMD variants was substantially better than with standard EMD alone, because the mode-mixing problem was mitigated, particularly with masking EMD and EEMD. These findings encourage further exploration of EMD-based pre-processing, the mode-mixing problem, and its impact on the accuracy of brain source activity reconstruction.
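
The study scores spatial reconstructions with the Wasserstein metric. Its appeal for localization is that, unlike a bin-by-bin error, it grows with how far the activity is displaced. In the paper the distributions live in a 3D source space; the sketch below shows only the 1-D special case on a uniform grid, where the Wasserstein-1 distance reduces to the L1 distance between cumulative distributions. The "true" and "estimated" source profiles are invented toy inputs.

```python
def wasserstein_1d(p, q, spacing=1.0):
    """Wasserstein-1 distance between two non-negative histograms on the
    same uniform 1-D grid: spacing * sum_i |P(i) - Q(i)|, where P and Q
    are the normalized cumulative sums of p and q.
    """
    if len(p) != len(q):
        raise ValueError("histograms must share the same grid")
    total_p, total_q = float(sum(p)), float(sum(q))
    cum_p = cum_q = 0.0
    distance = 0.0
    for a, b in zip(p, q):
        cum_p += a / total_p
        cum_q += b / total_q
        distance += abs(cum_p - cum_q)
    return distance * spacing


true_source = [0, 0, 1, 0, 0, 0]
estimate_near = [0, 0, 0, 1, 0, 0]   # hypothetical estimate, off by one bin
estimate_far = [0, 0, 0, 0, 0, 1]    # hypothetical estimate, off by three bins
print(wasserstein_1d(true_source, estimate_near))  # -> 1.0
print(wasserstein_1d(true_source, estimate_far))   # -> 3.0
```

Both estimates have zero overlap with the true source, so a bin-wise error would rate them equally badly; the Wasserstein distance instead distinguishes the near miss from the far one.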

    No-reference image and video quality assessment: a classification and review of recent approaches
