38 research outputs found

    Identification of Transient Speech Using Wavelet Transforms

    Get PDF
    It is generally believed that abrupt stimulus changes, which in speech may be time-varying frequency edges associated with consonants, transitions between consonants and vowels and transitions within vowels are critical to the perception of speech by humans and for speech recognition by machines. Noise affects speech transitions more than it affects quasi-steady-state speech. I believe that identifying and selectively amplifying speech transitions may enhance the intelligibility of speech in noisy conditions. The purpose of this study is to evaluate the use of wavelet transforms to identify speech transitions. Using wavelet transforms may be computationally efficient and allow for real-time applications. The discrete wavelet transform (DWT), stationary wavelet transform (SWT) and wavelet packets (WP) are evaluated. Wavelet analysis is combined with variable frame rate processing to improve the identification process. Variable frame rate can identify time segments when speech feature vectors are changing rapidly and when they are relatively stationary. Energy profiles for words, which show the energy in each node of a speech signal decomposed using wavelets, are used to identify nodes that include predominately transient information and nodes that include predominately quasi-steady-state information, and these are used to synthesize transient and quasi-steady-state speech components. These speech components are estimates of the tonal and nontonal speech components, which Yoo et al identified using time-varying band-pass filters. Comparison of spectra, a listening test and mean-squared-errors between the transient components synthesized using wavelets and Yoo's nontonal components indicated that wavelet packets identified the best estimates of Yoo's components. An algorithm that incorporates variable frame rate analysis into wavelet packet analysis is proposed. The development of this algorithm involves the processes of choosing a wavelet function and a decomposition level to be used. The algorithm itself has 4 steps: wavelet packet decomposition; classification of terminal nodes; incorporation of variable frame rate processing; synthesis of speech components. Combining wavelet analysis with variable frame rate analysis provides the best estimates of Yoo's speech components

    Wavelet methods in speech recognition

    Get PDF
    In this thesis, novel wavelet techniques are developed to improve parametrization of speech signals prior to classification. It is shown that non-linear operations carried out in the wavelet domain improve the performance of a speech classifier and consistently outperform classical Fourier methods. This is because of the localised nature of the wavelet, which captures correspondingly well-localised time-frequency features within the speech signal. Furthermore, by taking advantage of the approximation ability of wavelets, efficient representation of the non-stationarity inherent in speech can be achieved in a relatively small number of expansion coefficients. This is an attractive option when faced with the so-called 'Curse of Dimensionality' problem of multivariate classifiers such as Linear Discriminant Analysis (LDA) or Artificial Neural Networks (ANNs). Conventional time-frequency analysis methods such as the Discrete Fourier Transform either miss irregular signal structures and transients due to spectral smearing or require a large number of coefficients to represent such characteristics efficiently. Wavelet theory offers an alternative insight in the representation of these types of signals. As an extension to the standard wavelet transform, adaptive libraries of wavelet and cosine packets are introduced which increase the flexibility of the transform. This approach is observed to be yet more suitable for the highly variable nature of speech signals in that it results in a time-frequency sampled grid that is well adapted to irregularities and transients. They result in a corresponding reduction in the misclassification rate of the recognition system. However, this is necessarily at the expense of added computing time. Finally, a framework based on adaptive time-frequency libraries is developed which invokes the final classifier to choose the nature of the resolution for a given classification problem. The classifier then performs dimensionaIity reduction on the transformed signal by choosing the top few features based on their discriminant power. This approach is compared and contrasted to an existing discriminant wavelet feature extractor. The overall conclusions of the thesis are that wavelets and their relatives are capable of extracting useful features for speech classification problems. The use of adaptive wavelet transforms provides the flexibility within which powerful feature extractors can be designed for these types of application

    Directional edge and texture representations for image processing

    Get PDF
    An efficient representation for natural images is of fundamental importance in image processing and analysis. The commonly used separable transforms such as wavelets axe not best suited for images due to their inability to exploit directional regularities such as edges and oriented textural patterns; while most of the recently proposed directional schemes cannot represent these two types of features in a unified transform. This thesis focuses on the development of directional representations for images which can capture both edges and textures in a multiresolution manner. The thesis first considers the problem of extracting linear features with the multiresolution Fourier transform (MFT). Based on a previous MFT-based linear feature model, the work extends the extraction method into the situation when the image is corrupted by noise. The problem is tackled by the combination of a "Signal+Noise" frequency model, a refinement stage and a robust classification scheme. As a result, the MFT is able to perform linear feature analysis on noisy images on which previous methods failed. A new set of transforms called the multiscale polar cosine transforms (MPCT) are also proposed in order to represent textures. The MPCT can be regarded as real-valued MFT with similar basis functions of oriented sinusoids. It is shown that the transform can represent textural patches more efficiently than the conventional Fourier basis. With a directional best cosine basis, the MPCT packet (MPCPT) is shown to be an efficient representation for edges and textures, despite its high computational burden. The problem of representing edges and textures in a fixed transform with less complexity is then considered. This is achieved by applying a Gaussian frequency filter, which matches the disperson of the magnitude spectrum, on the local MFT coefficients. This is particularly effective in denoising natural images, due to its ability to preserve both types of feature. Further improvements can be made by employing the information given by the linear feature extraction process in the filter's configuration. The denoising results compare favourably against other state-of-the-art directional representations

    Discrete Wavelet Transforms

    Get PDF
    The discrete wavelet transform (DWT) algorithms have a firm position in processing of signals in several areas of research and industry. As DWT provides both octave-scale frequency and spatial timing of the analyzed signal, it is constantly used to solve and treat more and more advanced problems. The present book: Discrete Wavelet Transforms: Algorithms and Applications reviews the recent progress in discrete wavelet transform algorithms and applications. The book covers a wide range of methods (e.g. lifting, shift invariance, multi-scale analysis) for constructing DWTs. The book chapters are organized into four major parts. Part I describes the progress in hardware implementations of the DWT algorithms. Applications include multitone modulation for ADSL and equalization techniques, a scalable architecture for FPGA-implementation, lifting based algorithm for VLSI implementation, comparison between DWT and FFT based OFDM and modified SPIHT codec. Part II addresses image processing algorithms such as multiresolution approach for edge detection, low bit rate image compression, low complexity implementation of CQF wavelets and compression of multi-component images. Part III focuses watermaking DWT algorithms. Finally, Part IV describes shift invariant DWTs, DC lossless property, DWT based analysis and estimation of colored noise and an application of the wavelet Galerkin method. The chapters of the present book consist of both tutorial and highly advanced material. Therefore, the book is intended to be a reference text for graduate students and researchers to obtain state-of-the-art knowledge on specific applications

    Digital image compression

    Get PDF

    A family of stereoscopic image compression algorithms using wavelet transforms

    Get PDF
    With the standardization of JPEG-2000, wavelet-based image and video compression technologies are gradually replacing the popular DCT-based methods. In parallel to this, recent developments in autostereoscopic display technology is now threatening to revolutionize the way in which consumers are used to enjoying the traditional 2-D display based electronic media such as television, computer and movies. However, due to the two-fold bandwidth/storage space requirement of stereoscopic imaging, an essential requirement of a stereo imaging system is efficient data compression. In this thesis, seven wavelet-based stereo image compression algorithms are proposed, to take advantage of the higher data compaction capability and better flexibility of wavelets. [Continues.

    A family of stereoscopic image compression algorithms using wavelet transforms

    Get PDF
    With the standardization of JPEG-2000, wavelet-based image and video compression technologies are gradually replacing the popular DCT-based methods. In parallel to this, recent developments in autostereoscopic display technology is now threatening to revolutionize the way in which consumers are used to enjoying the traditional 2D display based electronic media such as television, computer and movies. However, due to the two-fold bandwidth/storage space requirement of stereoscopic imaging, an essential requirement of a stereo imaging system is efficient data compression. In this thesis, seven wavelet-based stereo image compression algorithms are proposed, to take advantage of the higher data compaction capability and better flexibility of wavelets. In the proposed CODEC I, block-based disparity estimation/compensation (DE/DC) is performed in pixel domain. However, this results in an inefficiency when DWT is applied on the whole predictive error image that results from the DE process. This is because of the existence of artificial block boundaries between error blocks in the predictive error image. To overcome this problem, in the remaining proposed CODECs, DE/DC is performed in the wavelet domain. Due to the multiresolution nature of the wavelet domain, two methods of disparity estimation and compensation have been proposed. The first method is performing DEJDC in each subband of the lowest/coarsest resolution level and then propagating the disparity vectors obtained to the corresponding subbands of higher/finer resolution. Note that DE is not performed in every subband due to the high overhead bits that could be required for the coding of disparity vectors of all subbands. This method is being used in CODEC II. In the second method, DEJDC is performed m the wavelet-block domain. This enables disparity estimation to be performed m all subbands simultaneously without increasing the overhead bits required for the coding disparity vectors. This method is used by CODEC III. However, performing disparity estimation/compensation in all subbands would result in a significant improvement of CODEC III. To further improve the performance of CODEC ill, pioneering wavelet-block search technique is implemented in CODEC IV. The pioneering wavelet-block search technique enables the right/predicted image to be reconstructed at the decoder end without the need of transmitting the disparity vectors. In proposed CODEC V, pioneering block search is performed in all subbands of DWT decomposition which results in an improvement of its performance. Further, the CODEC IV and V are able to perform at very low bit rates(< 0.15 bpp). In CODEC VI and CODEC VII, Overlapped Block Disparity Compensation (OBDC) is used with & without the need of coding disparity vector. Our experiment results showed that no significant coding gains could be obtained for these CODECs over CODEC IV & V. All proposed CODECs m this thesis are wavelet-based stereo image coding algorithms that maximise the flexibility and benefits offered by wavelet transform technology when applied to stereo imaging. In addition the use of a baseline-JPEG coding architecture would enable the easy adaptation of the proposed algorithms within systems originally built for DCT-based coding. This is an important feature that would be useful during an era where DCT-based technology is only slowly being phased out to give way for DWT based compression technology. In addition, this thesis proposed a stereo image coding algorithm that uses JPEG-2000 technology as the basic compression engine. The proposed CODEC, named RASTER is a rate scalable stereo image CODEC that has a unique ability to preserve the image quality at binocular depth boundaries, which is an important requirement in the design of stereo image CODEC. The experimental results have shown that the proposed CODEC is able to achieve PSNR gains of up to 3.7 dB as compared to directly transmitting the right frame using JPEG-2000
    corecore