Detection of Nonstationary Noise and Improved Voice Activity Detection in an Automotive Hands-free Environment
Speech processing in the automotive environment is a challenging problem due to the presence of powerful and unpredictable nonstationary noise. This thesis addresses two detection problems involving both nonstationary noise signals and nonstationary desired signals. Two detectors are developed: one to detect passing vehicle noise in the presence of speech and one to detect speech in the presence of passing vehicle noise. The latter is then measured against a state-of-the-art voice activity detector used in telephony. The process of compiling a library of in-vehicle recordings to facilitate this research is also detailed.
Speech enhancement using auditory filterbank.
This thesis presents a novel subband noise reduction technique for speech enhancement, termed Adaptive Subband Wiener Filtering (ASWF), based on a critical-band gammatone filterbank. The ASWF is derived from a generalized Subband Wiener Filtering (SWF) equation and reduces noise according to the estimated signal-to-noise ratio (SNR) in each auditory channel and in each time frame. The design of a subband noise estimator, suitable for some real-life noise environments, is also presented. This denoising technique would be beneficial for some auditory-based speech and audio applications, e.g. to enhance the robustness of sound processing in cochlear implants. Comprehensive objective and subjective tests demonstrated that the proposed technique is effective in improving the perceptual quality of the enhanced speech. The technique offers a time-domain noise reduction scheme using a linear filterbank structure and can be combined with other filterbank algorithms (such as those for speech recognition and coding) as a front-end processing step immediately after the analysis filterbank, to increase the robustness of the respective application. Dept. of Electrical and Computer Engineering. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2005 .G85. Source: Masters Abstracts International, Volume: 44-03, page: 1452. Thesis (M.A.Sc.)--University of Windsor (Canada), 2005.
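As a rough illustration of SNR-driven subband gains, the sketch below computes the textbook Wiener gain G = SNR / (1 + SNR) for each channel and frame. It is not the thesis's generalized SWF equation, which the abstract does not give. The function name `subband_wiener_gain`, the power-subtraction SNR estimate, and the `gain_floor` parameter are assumptions made for this example.

```python
import numpy as np

def subband_wiener_gain(noisy_power, noise_power, gain_floor=0.05):
    """Textbook Wiener gain per channel and frame (illustrative only).

    noisy_power, noise_power: arrays shaped (channels, frames).
    gain_floor: hypothetical lower bound that limits over-suppression.
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    # Assumed SNR estimate: power subtraction, clipped at zero.
    speech_power = np.maximum(noisy_power - noise_power, 0.0)
    snr = speech_power / np.maximum(noise_power, 1e-12)
    gain = snr / (1.0 + snr)
    return np.maximum(gain, gain_floor)

# Two channels, two frames of made-up subband powers.
noisy = np.array([[4.0, 1.0], [2.0, 10.0]])
noise = np.array([[1.0, 1.0], [1.0, 1.0]])
gains = subband_wiener_gain(noisy, noise)
```

Each gain would then scale the corresponding subband signal before the channels are summed back together. High-SNR cells pass almost unchanged, while noise-dominated cells are attenuated down to the floor.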
Studies on noise robust automatic speech recognition
Noise in everyday acoustic environments such as cars, traffic environments, and cafeterias remains one of the main challenges in automatic speech recognition (ASR). As a research theme, it has received wide attention in conferences and scientific journals focused on speech technology. This article collection reviews both the classic and novel approaches suggested for noise robust ASR. The articles are literature reviews written for the spring 2009 seminar course on noise robust automatic speech recognition (course code T-61.6060) held at TKK.
Speech Enhancement Using Wavelet Coefficients Masking with Local Binary Patterns
In this paper, we present a wavelet coefficient masking approach based on Local Binary Patterns (WLBP) to enhance the temporal spectra of the wavelet coefficients for speech enhancement. This technique exploits the wavelet denoising scheme, which splits the degraded speech into pyramidal subband components and extracts frequency information without losing temporal information. Speech enhancement in each high-frequency subband is performed with binary labels through local binary pattern masking, which encodes the ratio between the original value of each coefficient and the values of the neighbouring coefficients. This approach enhances the high-frequency spectra of the wavelet transform instead of eliminating them through a threshold. A comparative analysis with conventional speech enhancement algorithms demonstrates that the proposed technique achieves significant improvements in terms of PESQ, an internationally recommended objective measure for estimating subjective speech quality. Informal listening tests also show that, in an acoustic context, the proposed method improves the quality of speech and avoids the annoying musical noise present in other speech enhancement techniques. Experimental results obtained with a DNN-based speech recognizer in noisy environments corroborate the superiority of the proposed scheme in the robust speech recognition scenario.
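To give a feel for how binary labels can be derived from neighbouring coefficients, here is a minimal sketch of a classic one-dimensional local binary pattern applied to a vector of subband coefficients. It uses the common comparison-based LBP, not the ratio-based WLBP masking of the paper, whose exact definition the abstract does not give. The function name `lbp_1d`, the `radius` parameter, and the edge padding are assumptions for illustration.

```python
import numpy as np

def lbp_1d(coeffs, radius=2):
    """Classic 1-D local binary pattern code for each coefficient.

    Bit k is set when the k-th neighbour (offsets -radius..-1, 1..radius)
    is greater than or equal to the centre coefficient.
    """
    c = np.asarray(coeffs, dtype=float)
    n = c.shape[0]
    # Assumed boundary handling: repeat the edge values.
    padded = np.pad(c, radius, mode="edge")
    offsets = [o for o in range(-radius, radius + 1) if o != 0]
    codes = np.zeros(n, dtype=int)
    for bit, off in enumerate(offsets):
        neighbour = padded[radius + off : radius + off + n]
        codes += (neighbour >= c).astype(int) << bit
    return codes

# Toy subband: the middle coefficient is a local peak.
codes = lbp_1d([1.0, 3.0, 2.0], radius=1)
```

With `radius=1` each code has two bits, so a local peak such as the middle coefficient above gets code 0 because neither neighbour reaches it. A masking scheme could use such labels to decide which coefficients to keep or attenuate.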
Speech enhancement by perceptual adaptive wavelet de-noising
This thesis summarizes and compares the existing wavelet de-noising methods. The most popular methods of wavelet transform, adaptive thresholding, and musical noise suppression are analyzed theoretically and evaluated through Matlab simulation. Based on this work, a new speech enhancement system using adaptive wavelet de-noising is proposed. Each step of the standard wavelet thresholding is improved by optimized adaptive algorithms. The quantile-based adaptive noise estimate and the a posteriori SNR-based threshold adjuster complement each other. Their combination integrates the advantages of the two approaches and balances noise removal against speech preservation. To improve the final perceptual quality, an innovative musical noise analysis and smoothing algorithm and a Teager Energy Operator based silent segment smoothing module are also introduced into the system. The experimental results demonstrate the capability of the proposed system in both stationary and non-stationary noise environments.
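The Teager Energy Operator mentioned above has a standard discrete form, ψ[x(n)] = x(n)² − x(n−1)·x(n+1). The sketch below implements only that operator. How the thesis uses it for silent segment smoothing is not described in the abstract, and the function name `teager_energy` and the zero-valued endpoints are assumptions made for this example.

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The first and last samples have no two-sided neighbourhood and are
    left at zero (an assumption for this sketch).
    """
    x = np.asarray(x, dtype=float)
    psi = np.zeros_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return psi

# For a pure tone A*cos(w*n) the operator returns the constant A^2 * sin(w)^2.
n = np.arange(64)
tone = np.cos(0.3 * n)
psi = teager_energy(tone)
```

Because the output tracks both amplitude and frequency, low values over a run of samples are a plausible cue for silent or noise-only segments, which is one reason the operator is popular in speech processing.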
Speech spectrum non-stationarity detection based on line spectrum frequencies and related applications
Ankara : Department of Electrical and Electronics Engineering and The Institute of Engineering and Sciences of Bilkent University, 1998. Thesis (Master's) -- Bilkent University, 1998. Includes bibliographical references, leaves 124-132. In this thesis, two new speech variation measures for speech spectrum nonstationarity detection are proposed. These measures are based on the Line Spectrum Frequencies (LSF) and the spectral values at the LSF locations. They are formulated to be subjectively meaningful, mathematically tractable, and computationally inexpensive. To demonstrate the usefulness of the non-stationarity detector, two applications are presented. The first is an implicit speech segmentation system that detects non-stationary regions in the speech signal and obtains the boundaries of the speech segments. The second is a Variable Bit-Rate Mixed Excitation Linear Predictive (VBR-MELP) vocoder that uses a novel voice activity detector to detect silent regions in the speech. This voice activity detector is designed to be robust to non-stationary background noise and provides efficient coding of silent sections and unvoiced utterances to decrease the bit-rate. Simulation results are also presented. Ertan, Ali Erdem. M.S.