74 research outputs found

    Noise robust speaker verification using mel-frequency discrete wavelet coefficients and parallel model compensation

    Get PDF
    Interfering noise severely degrades the performance of a speaker verification system. The Parallel Model Combination (PMC) technique is one of the most efficient techniques for dealing with such noise. Another method is to use features local in the frequency domain. Recently, Mel-Frequency Discrete Wavelet Coefficients (MFDWCs) [1, 2] were proposed as speech features local in frequency domain. In this paper, we discuss using PMC along with MFDWCs features to take advantage of both noise compensation and local features (MFDWCs) to decrease the effect of noise on speaker verification performance. We evaluate the performance of MFDWCs using the NIST 1998 speaker recognition and NOISEX-92 databases for various noise types and noise levels. We also compare the performance of these versus MFCCs and both using PMC for dealing with additive noise. The experimental results show significant performance improvements for MFDWCs versus MFCCs after compensating the Gaussian Mixture Models (GMMs) using the PMC technique. The MFDWCs gave 5.24 and 3.23 points performance improvement on average over MFCCs for -6 dB and 0 dB SNR values, respectively. These correspond to 26.44% and 23.73% relative reductions in equal error rate (EER), respectively

    Wavelet methods in speech recognition

    Get PDF
    In this thesis, novel wavelet techniques are developed to improve parametrization of speech signals prior to classification. It is shown that non-linear operations carried out in the wavelet domain improve the performance of a speech classifier and consistently outperform classical Fourier methods. This is because of the localised nature of the wavelet, which captures correspondingly well-localised time-frequency features within the speech signal. Furthermore, by taking advantage of the approximation ability of wavelets, efficient representation of the non-stationarity inherent in speech can be achieved in a relatively small number of expansion coefficients. This is an attractive option when faced with the so-called 'Curse of Dimensionality' problem of multivariate classifiers such as Linear Discriminant Analysis (LDA) or Artificial Neural Networks (ANNs). Conventional time-frequency analysis methods such as the Discrete Fourier Transform either miss irregular signal structures and transients due to spectral smearing or require a large number of coefficients to represent such characteristics efficiently. Wavelet theory offers an alternative insight in the representation of these types of signals. As an extension to the standard wavelet transform, adaptive libraries of wavelet and cosine packets are introduced which increase the flexibility of the transform. This approach is observed to be yet more suitable for the highly variable nature of speech signals in that it results in a time-frequency sampled grid that is well adapted to irregularities and transients. They result in a corresponding reduction in the misclassification rate of the recognition system. However, this is necessarily at the expense of added computing time. Finally, a framework based on adaptive time-frequency libraries is developed which invokes the final classifier to choose the nature of the resolution for a given classification problem. The classifier then performs dimensionaIity reduction on the transformed signal by choosing the top few features based on their discriminant power. This approach is compared and contrasted to an existing discriminant wavelet feature extractor. The overall conclusions of the thesis are that wavelets and their relatives are capable of extracting useful features for speech classification problems. The use of adaptive wavelet transforms provides the flexibility within which powerful feature extractors can be designed for these types of application

    Directional edge and texture representations for image processing

    Get PDF
    An efficient representation for natural images is of fundamental importance in image processing and analysis. The commonly used separable transforms such as wavelets axe not best suited for images due to their inability to exploit directional regularities such as edges and oriented textural patterns; while most of the recently proposed directional schemes cannot represent these two types of features in a unified transform. This thesis focuses on the development of directional representations for images which can capture both edges and textures in a multiresolution manner. The thesis first considers the problem of extracting linear features with the multiresolution Fourier transform (MFT). Based on a previous MFT-based linear feature model, the work extends the extraction method into the situation when the image is corrupted by noise. The problem is tackled by the combination of a "Signal+Noise" frequency model, a refinement stage and a robust classification scheme. As a result, the MFT is able to perform linear feature analysis on noisy images on which previous methods failed. A new set of transforms called the multiscale polar cosine transforms (MPCT) are also proposed in order to represent textures. The MPCT can be regarded as real-valued MFT with similar basis functions of oriented sinusoids. It is shown that the transform can represent textural patches more efficiently than the conventional Fourier basis. With a directional best cosine basis, the MPCT packet (MPCPT) is shown to be an efficient representation for edges and textures, despite its high computational burden. The problem of representing edges and textures in a fixed transform with less complexity is then considered. This is achieved by applying a Gaussian frequency filter, which matches the disperson of the magnitude spectrum, on the local MFT coefficients. This is particularly effective in denoising natural images, due to its ability to preserve both types of feature. Further improvements can be made by employing the information given by the linear feature extraction process in the filter's configuration. The denoising results compare favourably against other state-of-the-art directional representations

    Wavelet theory and applications:a literature study

    Get PDF

    A Panorama on Multiscale Geometric Representations, Intertwining Spatial, Directional and Frequency Selectivity

    Full text link
    The richness of natural images makes the quest for optimal representations in image processing and computer vision challenging. The latter observation has not prevented the design of image representations, which trade off between efficiency and complexity, while achieving accurate rendering of smooth regions as well as reproducing faithful contours and textures. The most recent ones, proposed in the past decade, share an hybrid heritage highlighting the multiscale and oriented nature of edges and patterns in images. This paper presents a panorama of the aforementioned literature on decompositions in multiscale, multi-orientation bases or dictionaries. They typically exhibit redundancy to improve sparsity in the transformed domain and sometimes its invariance with respect to simple geometric deformations (translation, rotation). Oriented multiscale dictionaries extend traditional wavelet processing and may offer rotation invariance. Highly redundant dictionaries require specific algorithms to simplify the search for an efficient (sparse) representation. We also discuss the extension of multiscale geometric decompositions to non-Euclidean domains such as the sphere or arbitrary meshed surfaces. The etymology of panorama suggests an overview, based on a choice of partially overlapping "pictures". We hope that this paper will contribute to the appreciation and apprehension of a stream of current research directions in image understanding.Comment: 65 pages, 33 figures, 303 reference

    Contextual biometric watermarking of fingerprint images

    Get PDF
    This research presents contextual digital watermarking techniques using face and demographic text data as multiple watermarks for protecting the evidentiary integrity of fingerprint image. The proposed techniques embed the watermarks into selected regions of fingerprint image in MDCT and DWT domains. A general image watermarking algorithm is developed to investigate the application of MDCT in the elimination of blocking artifacts. The application of MDCT has improved the performance of the watermarking technique compared to DCT. Experimental results show that modifications to fingerprint image are visually imperceptible and maintain the minutiae detail. The integrity of the fingerprint image is verified through high matching score obtained from the AFIS system. There is also a high degree of correlation between the embedded and extracted watermarks. The degree of similarity is computed using pixel-based metrics and human visual system metrics. It is useful for personal identification and establishing digital chain of custody. The results also show that the proposed watermarking technique is resilient to common image modifications that occur during electronic fingerprint transmission
    corecore