783 research outputs found
Low-delay nonuniform pseudo-QMF banks with application to speech enhancement
Journal ArticleAbstract-This paper presents a method for designing low-delay nonuniform pseudo quadrature mirror filter (QMF) banks. This method is motivated by the work of Li, Nguyen, and Tantaratana, in which the nonuniform filter bank is realized by combining an appropriate number of adjacent sub-bands of a uniform pseudo-QMF bank. In prior work, the prototype filter of the uniform pseudo-QMF bank was constrained to have linear phase and the overall delay associated with the filter bank was often unacceptably large for filter banks with a large number of sub-bands. This paper proposes a pseudo-QMF filter bank design technique that significantly reduces the delay by relaxing the linear phase constraints. An example in which an oversampled critical-band nonuniform filter bank is designed and applied to a two-state modeling speech enhancement system is presented in this paper. Comparison of the performance of this system to competing methods employing tree-structured, linear phase multiresolution analysis indicates that the approach described in this paper strikes a good balance between system performance and low delay
Scalable Speech Coding for IP Networks
The emergence of Voice over Internet Protocol (VoIP) has posed new challenges to the development of speech codecs. The key issue of transporting real-time voice packet over IP networks is the lack of guarantee for reasonable speech quality due to packet delay or loss.
Most of the widely used narrowband codecs depend on the Code Excited Linear Prediction (CELP) coding technique. The CELP technique utilizes the long-term prediction across the frame boundaries and therefore causes error propagation in the case of packet loss and need to transmit redundant information in order to mitigate the problem. The internet Low Bit-rate Codec (iLBC) employs the frame-independent coding and therefore inherently possesses high robustness to packet loss. However, the original iLBC lacks in some of the key features of speech codecs for IP networks: Rate flexibility, Scalability, and Wideband support.
This dissertation presents novel scalable narrowband and wideband speech codecs for IP networks using the frame independent coding scheme based on the iLBC. The rate flexibility is added to the iLBC by employing the discrete cosine transform (DCT) and iii the scalable algebraic vector quantization (AVQ) and by allocating different number of bits to the AVQ. The bit-rate scalability is obtained by adding the enhancement layer to the core layer of the multi-rate iLBC. The enhancement layer encodes the weighted iLBC coding error in the modified DCT (MDCT) domain. The proposed wideband codec employs the bandwidth extension technique to extend the capabilities of existing narrowband codecs to provide wideband coding functionality. The wavelet transform is also used to further enhance the performance of the proposed codec.
The performance evaluation results show that the proposed codec provides high robustness to packet loss and achieves equivalent or higher speech quality than state-of-the-art codecs under the clean channel condition
A target guided subband filter for acoustic event detection in noisy environments using wavelet packets
This paper deals with acoustic event detection (AED), such as screams, gunshots, and explosions, in noisy environments. The main aim is to improve the detection performance under adverse conditions with a very low signal-to-noise ratio (SNR). A novel filtering method combined with an energy detector is presented. The wavelet packet transform (WPT) is first used for time-frequency representation of the acoustic signals. The proposed filter in the wavelet packet domain then uses a priori knowledge of the target event and an estimate of noise features to selectively suppress the background noise. It is in fact a content-aware band-pass filter which can automatically pass the frequency bands that are more significant in the target than in the noise. Theoretical analysis shows that the proposed filtering method is capable of enhancing the target content while suppressing the background noise for signals with a low SNR. A condition to increase the probability of correct detection is also obtained. Experiments have been carried out on a large dataset of acoustic events that are contaminated by different types of environmental noise and white noise with varying SNRs. Results show that the proposed method is more robust and better adapted to noise than ordinary energy detectors, and it can work even with an SNR as low as -15 dB. A practical system for real time processing and multi-target detection is also proposed in this work
Adaptive Variable Degree-k Zero-Trees for Re-Encoding of Perceptually Quantized Wavelet-Packet Transformed Audio and High Quality Speech
A fast, efficient and scalable algorithm is proposed, in this paper, for
re-encoding of perceptually quantized wavelet-packet transform (WPT)
coefficients of audio and high quality speech and is called "adaptive variable
degree-k zero-trees" (AVDZ). The quantization process is carried out by taking
into account some basic perceptual considerations, and achieves good subjective
quality with low complexity. The performance of the proposed AVDZ algorithm is
compared with two other zero-tree-based schemes comprising: 1- Embedded
Zero-tree Wavelet (EZW) and 2- The set partitioning in hierarchical trees
(SPIHT). Since EZW and SPIHT are designed for image compression, some
modifications are incorporated in these schemes for their better matching to
audio signals. It is shown that the proposed modifications can improve their
performance by about 15-25%. Furthermore, it is concluded that the proposed
AVDZ algorithm outperforms these modified versions in terms of both output
average bit-rates and computation times.Comment: 30 pages (Double space), 15 figures, 5 tables, ISRN Signal Processing
(in Press
Offline and real time noise reduction in speech signals using the discrete wavelet packet decomposition
This thesis describes the development of an offline and real time wavelet based speech enhancement system to process speech corrupted with various amounts of white Gaussian noise and other different noise types
Speech enhancement by perceptual adaptive wavelet de-noising
This thesis work summarizes and compares the existing wavelet de-noising methods. Most popular methods of wavelet transform, adaptive thresholding, and musical noise suppression have been analyzed theoretically and evaluated through Matlab simulation. Based on the above work, a new speech enhancement system using adaptive wavelet de-noising is proposed. Each step of the standard wavelet thresholding is improved by optimized adaptive algorithms. The Quantile based adaptive noise estimate and the posteriori SNR based threshold adjuster are compensatory to each other. The combination of them integrates the advantages of these two approaches and balances the effects of noise removal and speech preservation. In order to improve the final perceptual quality, an innovative musical noise analysis and smoothing algorithm and a Teager Energy Operator based silent segment smoothing module are also introduced into the system. The experimental results have demonstrated the capability of the proposed system in both stationary and non-stationary noise environments
Wavelet-based techniques for speech recognition
In this thesis, new wavelet-based techniques have been developed for the
extraction of features from speech signals for the purpose of automatic speech
recognition (ASR). One of the advantages of the wavelet transform over the short
time Fourier transform (STFT) is its capability to process non-stationary signals.
Since speech signals are not strictly stationary the wavelet transform is a better
choice for time-frequency transformation of these signals. In addition it has
compactly supported basis functions, thereby reducing the amount of
computation as opposed to STFT where an overlapping window is needed. [Continues.
- …