Search CORE

164 research outputs found

Speech enhancement by perceptual adaptive wavelet de-noising

Author: Xu Lan
Publication venue: University of Windsor Leddy Library
Publication date: 01/01/2007

This thesis summarizes and compares existing wavelet de-noising methods. The most popular methods for the wavelet transform, adaptive thresholding, and musical noise suppression are analyzed theoretically and evaluated through Matlab simulation. Based on this work, a new speech enhancement system using adaptive wavelet de-noising is proposed. Each step of standard wavelet thresholding is improved by optimized adaptive algorithms. The quantile-based adaptive noise estimate and the a posteriori SNR-based threshold adjuster complement each other; combining them integrates the advantages of both approaches and balances noise removal against speech preservation. To improve the final perceptual quality, an innovative musical noise analysis and smoothing algorithm and a Teager Energy Operator based silent segment smoothing module are also introduced into the system. The experimental results demonstrate the capability of the proposed system in both stationary and non-stationary noise environments.

Scholarship at UWindsor
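
The record above mentions a Teager Energy Operator (TEO) based smoothing module. As a rough illustration only, the standard discrete TEO is psi[n] = x[n]^2 - x[n-1]*x[n+1]. The sketch below shows that operator alone, not the thesis's silent-segment smoothing module; the function name and the test signal are assumptions made for this example.

```python
import math


def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The output has len(x) - 2 samples because the first and last input
    samples lack a neighbour on one side.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Hypothetical test signal: a pure sinusoid with amplitude 2 and
# angular frequency 0.3 rad/sample.
tone = [2.0 * math.cos(0.3 * n) for n in range(50)]
tone_energy = teager_energy(tone)

# An all-zero (silent) segment yields zero Teager energy.
silence_energy = teager_energy([0.0] * 5)
```

For a pure sinusoid A*cos(w*n) the operator returns the constant A^2 * sin(w)^2, which is why it can serve as a joint amplitude and frequency measure, while a silent segment gives zero.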

Speech Signal Enhancement through Adaptive Wavelet Thresholding

Author: Johnson Michael T
Ren Yao
Yuan Xiaolong
Publication venue: e-Publications@Marquette
Publication date: 01/02/2007

This paper demonstrates the application of the Bionic Wavelet Transform (BWT), an adaptive wavelet transform derived from a non-linear auditory model of the cochlea, to the task of speech signal enhancement. Results, measured objectively by Signal-to-Noise Ratio (SNR) and Segmental SNR (SSNR) and subjectively by Mean Opinion Score (MOS), are given for additive white Gaussian noise as well as four different types of realistic noise environments. Enhancement is accomplished by thresholding the adapted BWT coefficients, and the results are compared to a variety of speech enhancement techniques, including Ephraim-Malah filtering, iterative Wiener filtering, and spectral subtraction, as well as to wavelet denoising based on a perceptually scaled wavelet packet transform decomposition. Overall results indicate that SNR and SSNR improvements for the proposed approach are comparable to those of the Ephraim-Malah filter, with BWT enhancement giving the best results of all methods for the noisiest (−10 dB and −5 dB input SNR) conditions. Subjective measurements using MOS surveys across a variety of 0 dB SNR noise conditions indicate enhancement quality that is competitive with, but still lower than, Ephraim-Malah filtering and iterative Wiener filtering, and higher than the perceptually scaled wavelet method.

epublications@Marquette

Speech Enhancement with Adaptive Thresholding and Kalman Filtering

Author: Zhao Mengjiao
Publication date: 01/09/2017

Speech enhancement has been extensively studied for many years and various speech enhancement methods have been developed during the past decades. One of the objectives of speech enhancement is to provide high-quality speech communication in the presence of background noise and concurrent interference signals. In the process of speech communication, the clean speech signal is inevitably corrupted by acoustic noise from the surrounding environment, transmission media, communication equipment, electrical noise, other speakers, and other sources of interference. These disturbances can significantly degrade the quality and intelligibility of the received speech signal. Therefore, it is of great interest to develop efficient speech enhancement techniques to recover the original speech from the noisy observation. In recent years, various techniques have been developed to tackle this problem, which can be classified into single channel and multi-channel enhancement approaches. Since single channel enhancement is easy to implement, it has been a significant field of research and various approaches have been developed. For example, spectral subtraction and Wiener filtering are among the earliest single channel methods, which are based on estimation of the power spectrum of stationary noise. However, when the noise is non-stationary, or when music noise and ambient speech noise are present, the enhancement performance degrades considerably. To overcome this disadvantage, this thesis focuses on single channel speech enhancement in adverse noise environments, especially non-stationary ones. Recently, wavelet transform based methods have been widely used to reduce undesired background noise. On the other hand, Kalman filter (KF) methods offer competitive denoising results, especially in non-stationary environments, and have been a popular and powerful tool for speech enhancement during the past decades.
In this regard, a single channel wavelet thresholding based Kalman filter (KF) algorithm is proposed for speech enhancement in this thesis. The wavelet packet (WP) transform is first applied to the noise corrupted speech on a frame-by-frame basis, which decomposes each frame into a number of subbands. A voice activity detector (VAD) is then designed to detect the voiced/unvoiced frames of the subband speech. Based on the VAD result, an adaptive thresholding scheme is applied to each subband speech followed by the WP based reconstruction to obtain the pre-enhanced speech. To achieve a further level of enhancement, an iterative Kalman filter (IKF) is used to process the pre-enhanced speech. The proposed adaptive thresholding iterative Kalman filtering (AT-IKF) method is evaluated and compared with some existing methods under various noise conditions in terms of two well-known performance indexes: segmental SNR and perceptual evaluation of speech quality (PESQ). Firstly, we compare the proposed adaptive thresholding (AT) scheme with three other thresholding schemes: the non-linear universal thresholding (U-T), the non-linear wavelet packet transform thresholding (WPT-T) and the non-linear SURE thresholding (SURE-T). The experimental results show that the proposed AT scheme can significantly improve the segmental SNR and PESQ for all input SNRs compared with the other existing thresholding schemes. Secondly, extensive computer simulations are conducted to evaluate the proposed AT-IKF against the AT and the IKF as standalone speech enhancement methods. It is shown that the AT-IKF method still performs the best. Lastly, the proposed AT-IKF method is compared with three representative and popular methods: the improved spectral subtraction based speech enhancement algorithm (ISS), the improved Wiener filter based method (IWF) and the representative subband Kalman filter based algorithm (SIKF).
Experimental results demonstrate the effectiveness of the proposed method compared with some previous works in terms of both segmental SNR and PESQ.

Concordia University Research Repository
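
The record above names universal thresholding (U-T) as one of its baselines. The following is a minimal sketch of that classic baseline only, not the thesis's adaptive scheme: it assumes a single-level Haar transform in place of a wavelet packet decomposition, an even-length input, and the common median-based noise estimate (median absolute detail coefficient divided by 0.6745). All function names are illustrative.

```python
import math
import statistics


def haar_dwt(x):
    """Single-level orthonormal Haar transform of an even-length sequence."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) / s, (a - d) / s])
    return x


def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; zero it if |c| <= t."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def universal_threshold(detail, n):
    """Universal threshold sigma * sqrt(2 * ln(n)).

    sigma is estimated from the detail coefficients (an assumption of
    this sketch): median(|d|) / 0.6745.
    """
    sigma = statistics.median([abs(d) for d in detail]) / 0.6745
    return sigma * math.sqrt(2.0 * math.log(n))


def denoise(x):
    """Transform, soft-threshold the detail band, and reconstruct."""
    approx, detail = haar_dwt(x)
    t = universal_threshold(detail, len(x))
    return haar_idwt(approx, soft_threshold(detail, t))
```

A practical system would use several decomposition levels and a smoother wavelet; the single Haar level here only keeps the example short.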

De-Noising Audio Signals Using MATLAB Wavelets Toolbox

Author: Aaron Flores-Gil
Adrian E. Villanueva-Luna
Alberto Jaramillo-Nuñez
Carlos M. Ortiz-Lima
Daniel Sanchez-Lucero
J. Gabriel Aguilar-Soto
Manuel May-Alarcon
Publication venue: IntechOpen
Publication date: 10/10/2011

IntechOpen

Audio encoding based on the empirical mode decomposition

Author: Boudraa Abdel-Ouahab
Chonavel Thierry
Khaldi Kais
Samaali Imen
Turki Monia
Publication venue: HAL CCSD
Publication date: 24/08/2009

This paper presents a new approach to perceptual audio encoding based on the Empirical Mode Decomposition (EMD). EMD adaptively decomposes the audio signal into intrinsic oscillatory components called Intrinsic Mode Functions (IMFs), which can be fully described by their extrema. These extrema are encoded after an appropriate thresholding scheme controlled by a psycho-acoustic model. The decoder recovers the original signal by reconstructing the IMFs through spline interpolation and summing them. The proposed approach is applied to different audio signals and the results are compared to wavelet and MPEG-1 Layer 3 (MP3) approaches. Based on exhaustive simulations, the results show that the proposed compression scheme performs better than the MP3 and wavelet approaches in terms of bit rate and audio quality.

HAL-Université de Bretagne Occidentale

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

State of the art in 2D content representation and compression

Author: Cagnazzo Marco
D'Acunto Erica
Drémeau Angélique
Guillemot Christine
Pesquet-Popescu Beatrice
Ricordel Vincent
Publication venue: HAL CCSD
Publication date: 01/09/2010

Deliverable D1.3 of the ANR PERSEE project. This report was produced as part of the ANR PERSEE project (no. ANR-09-BLAN-0170). More precisely, it corresponds to deliverable D3.1 of the project.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1

Perceptual Video Quality Assessment and Enhancement

Author: Zeng Kai
Publication venue: University of Waterloo
Publication date: 12/08/2013

With the rapid development of network visual communication technologies, digital video has become ubiquitous and indispensable in our everyday lives. Video acquisition, communication, and processing systems introduce various types of distortions, which may have a major impact on the video quality perceived by human observers. Effective and efficient objective video quality assessment (VQA) methods that can predict perceptual video quality are highly desirable in modern visual communication systems for performance evaluation, quality control and resource allocation purposes. Moreover, perceptual VQA measures may also be employed to optimize a wide variety of video processing algorithms and systems for best perceptual quality. This thesis explores several novel ideas in the areas of video quality assessment and enhancement. Firstly, by considering a video signal as a 3D volume image, we propose a 3D structural similarity (SSIM) based full-reference (FR) VQA approach, which also incorporates local information content and local distortion-based pooling methods. Secondly, a reduced-reference (RR) VQA scheme is developed by tracing the evolution of local phase structures over time in the complex wavelet domain. Furthermore, we propose a quality-aware video system which combines spatial and temporal quality measures with a robust video watermarking technique, such that RR-VQA can be performed without transmitting RR features via an ancillary lossless channel. Finally, a novel strategy for enhancing video denoising algorithms, namely poly-view fusion, is developed by examining a video sequence as a 3D volume image from multiple (front, side, top) views. This leads to significant and consistent gains in both peak signal-to-noise ratio (PSNR) and SSIM performance, especially at high noise levels.

University of Waterloo's Institutional Repository
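
The record above reports gains in peak signal-to-noise ratio (PSNR). For reference, PSNR has the standard definition 10 * log10(MAX^2 / MSE). The sketch below assumes 8-bit samples (peak value 255) supplied as flat, equal-length lists; the function name and sample values are illustrative and not taken from the thesis.

```python
import math


def psnr(reference, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length sequences.

    Returns math.inf when the two inputs are identical (zero error).
    """
    if len(reference) != len(distorted):
        raise ValueError("inputs must have the same length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak * peak / mse)


# Hypothetical pixel values: the second sample is off by 5 grey levels.
example_db = psnr([0, 255], [0, 250])
```

Higher values mean the distorted signal is closer to the reference. PSNR ignores perceptual structure, which is one reason it is often reported alongside SSIM.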

Localization of Active Brain Sources From EEG Signals Using Empirical Mode Decomposition: A Comparative Study

Author: Eduardo Giraldo
Marta Molinas
Maximiliano Bueno-López
Pablo Andrés Muñoz-Gutiérrez
Publication venue: Frontiers Media SA
Publication date: 01/01/2018

The localization of active brain sources from Electroencephalogram (EEG) is a useful method in clinical applications, such as the study of localized epilepsy, evoked-related-potentials, and attention deficit/hyperactivity disorder. The distributed-source model is a common method to estimate neural activity in the brain. The location and amplitude of each active source are estimated by solving the inverse problem by regularization or using Bayesian methods with spatio-temporal constraints. Frequency and spatio-temporal constraints improve the quality of the reconstructed neural activity. However, separation into frequency bands is beneficial when the relevant information is in specific sub-bands. We improved frequency-band identification and preserved good temporal resolution using EEG pre-processing techniques with good frequency band separation and temporal resolution properties. The identified frequency bands were included as constraints in the solution of the inverse problem by decomposing the EEG signals into frequency bands through various methods that offer good frequency and temporal resolution, such as empirical mode decomposition (EMD) and wavelet transform (WT). We present a comparative analysis of the accuracy of brain-source reconstruction using these techniques. The accuracy of the spatial reconstruction was assessed using the Wasserstein metric for real and simulated signals. We approached the mode-mixing problem, inherent to EMD, by exploring three variants of EMD: masking EMD, Ensemble-EMD (EEMD), and multivariate EMD (MEMD). The results of the spatio-temporal brain source reconstruction using these techniques show that masking EMD and MEMD can largely mitigate the mode-mixing problem and achieve a good spatio-temporal reconstruction of the active sources. 
Masking EMD and EEMD achieved better reconstruction than standard EMD, Multiple Sparse Priors, or wavelet packet decomposition when EMD was used as a pre-processing tool for the spatial reconstruction (averaged over time) of the brain sources. The spatial resolution obtained using all three EMD variants was substantially better than the use of EMD alone, as the mode-mixing problem was mitigated, particularly with masking EMD and EEMD. These findings encourage further exploration into the use of EMD-based pre-processing, the mode-mixing problem, and its impact on the accuracy of brain source activity reconstruction

Directory of Open Access Journals

NORA - Norwegian Open Research Archives

Ciencia Unisalle (Universidad de La Salle, Bogota)

No-reference image and video quality assessment: a classification and review of recent approaches

Author: Muhammad Shahid
Andreas Rossholm
Benny Lövström
Hans-Jürgen Zepernick
Publication venue: Springer Science and Business Media LLC

Crossref