74 research outputs found
Noise robust speaker verification using mel-frequency discrete wavelet coefficients and parallel model compensation
Interfering noise severely degrades the performance of a speaker verification system. The Parallel Model Combination (PMC) technique is one of the most efficient techniques for dealing with such noise. Another method is to use features local in the frequency domain. Recently, Mel-Frequency Discrete Wavelet Coefficients (MFDWCs) [1, 2] were proposed as speech features local in frequency domain. In this paper, we discuss using PMC along with MFDWCs features to take advantage of both noise compensation and local features (MFDWCs) to decrease the effect of noise on speaker verification performance. We evaluate the performance of MFDWCs using the NIST 1998 speaker recognition and NOISEX-92 databases for various noise types and noise levels. We also compare the performance of these versus MFCCs and both using PMC for dealing with additive noise. The experimental results show significant performance improvements for MFDWCs versus MFCCs after compensating the Gaussian Mixture Models (GMMs) using the PMC technique. The MFDWCs gave 5.24 and 3.23 points performance improvement on average over MFCCs for -6 dB and 0 dB SNR values, respectively. These correspond to 26.44% and 23.73% relative reductions in equal error rate (EER), respectively
Wavelet methods in speech recognition
In this thesis, novel wavelet techniques are developed to improve parametrization of
speech signals prior to classification. It is shown that non-linear operations carried out
in the wavelet domain improve the performance of a speech classifier and consistently
outperform classical Fourier methods. This is because of the localised nature of the
wavelet, which captures correspondingly well-localised time-frequency features
within the speech signal. Furthermore, by taking advantage of the approximation
ability of wavelets, efficient representation of the non-stationarity inherent in speech
can be achieved in a relatively small number of expansion coefficients. This is an
attractive option when faced with the so-called 'Curse of Dimensionality' problem of
multivariate classifiers such as Linear Discriminant Analysis (LDA) or Artificial
Neural Networks (ANNs). Conventional time-frequency analysis methods such as the
Discrete Fourier Transform either miss irregular signal structures and transients due to
spectral smearing or require a large number of coefficients to represent such
characteristics efficiently. Wavelet theory offers an alternative insight in the
representation of these types of signals.
As an extension to the standard wavelet transform, adaptive libraries of wavelet and
cosine packets are introduced which increase the flexibility of the transform. This
approach is observed to be yet more suitable for the highly variable nature of speech
signals in that it results in a time-frequency sampled grid that is well adapted to
irregularities and transients. They result in a corresponding reduction in the
misclassification rate of the recognition system. However, this is necessarily at the
expense of added computing time.
Finally, a framework based on adaptive time-frequency libraries is developed which
invokes the final classifier to choose the nature of the resolution for a given
classification problem. The classifier then performs dimensionaIity reduction on the
transformed signal by choosing the top few features based on their discriminant power. This approach is compared and contrasted to an existing discriminant wavelet
feature extractor.
The overall conclusions of the thesis are that wavelets and their relatives are capable
of extracting useful features for speech classification problems. The use of adaptive
wavelet transforms provides the flexibility within which powerful feature extractors
can be designed for these types of application
Directional edge and texture representations for image processing
An efficient representation for natural images is of fundamental importance in image processing and analysis. The commonly used separable transforms such as wavelets axe not best suited for images due to their inability to exploit directional regularities such as edges and oriented textural patterns; while most of the recently proposed directional schemes cannot represent these two types of features in a unified transform. This thesis focuses on the development of directional representations for images which can capture both edges and textures in a multiresolution manner. The thesis first considers the problem of extracting linear features with the multiresolution Fourier transform (MFT). Based on a previous MFT-based linear feature model, the work extends the extraction method into the situation when the image is corrupted by noise. The problem is tackled by the combination of a "Signal+Noise" frequency model, a refinement stage and a robust classification scheme. As a result, the MFT is able to perform linear feature analysis on noisy images on which previous methods failed. A new set of transforms called the multiscale polar cosine transforms (MPCT) are also proposed in order to represent textures. The MPCT can be regarded as real-valued MFT with similar basis functions of oriented sinusoids. It is shown that the transform can represent textural patches more efficiently than the conventional Fourier basis. With a directional best cosine basis, the MPCT packet (MPCPT) is shown to be an efficient representation for edges and textures, despite its high computational burden. The problem of representing edges and textures in a fixed transform with less complexity is then considered. This is achieved by applying a Gaussian frequency filter, which matches the disperson of the magnitude spectrum, on the local MFT coefficients. This is particularly effective in denoising natural images, due to its ability to preserve both types of feature. Further improvements can be made by employing the information given by the linear feature extraction process in the filter's configuration. The denoising results compare favourably against other state-of-the-art directional representations
A Panorama on Multiscale Geometric Representations, Intertwining Spatial, Directional and Frequency Selectivity
The richness of natural images makes the quest for optimal representations in
image processing and computer vision challenging. The latter observation has
not prevented the design of image representations, which trade off between
efficiency and complexity, while achieving accurate rendering of smooth regions
as well as reproducing faithful contours and textures. The most recent ones,
proposed in the past decade, share an hybrid heritage highlighting the
multiscale and oriented nature of edges and patterns in images. This paper
presents a panorama of the aforementioned literature on decompositions in
multiscale, multi-orientation bases or dictionaries. They typically exhibit
redundancy to improve sparsity in the transformed domain and sometimes its
invariance with respect to simple geometric deformations (translation,
rotation). Oriented multiscale dictionaries extend traditional wavelet
processing and may offer rotation invariance. Highly redundant dictionaries
require specific algorithms to simplify the search for an efficient (sparse)
representation. We also discuss the extension of multiscale geometric
decompositions to non-Euclidean domains such as the sphere or arbitrary meshed
surfaces. The etymology of panorama suggests an overview, based on a choice of
partially overlapping "pictures". We hope that this paper will contribute to
the appreciation and apprehension of a stream of current research directions in
image understanding.Comment: 65 pages, 33 figures, 303 reference
Contextual biometric watermarking of fingerprint images
This research presents contextual digital watermarking techniques using face and demographic text data as multiple watermarks for protecting the evidentiary integrity of fingerprint image. The proposed techniques embed the watermarks into selected regions of fingerprint image in MDCT and DWT domains. A general image watermarking algorithm is developed to investigate the application of MDCT in the elimination of blocking artifacts. The application of MDCT has improved the performance of the watermarking technique compared to DCT. Experimental results show that modifications to fingerprint image are visually imperceptible and maintain the minutiae detail. The integrity of the fingerprint image is verified through high matching score obtained from the AFIS system. There is also a high degree of correlation between the embedded and extracted watermarks. The degree of similarity is computed using pixel-based metrics and human visual system metrics. It is useful for personal identification and establishing digital chain of custody. The results also show that the proposed watermarking technique is resilient to common image modifications that occur during electronic fingerprint transmission
- …