Texture Analysis Using Probabilistic Models of the Unimodal and Multimodal Statistics of Adaptive Wavelet Packet Coefficients
Although subband histograms of the wavelet coefficients of natural images possess a characteristic leptokurtotic form, this is no longer true for wavelet packet bases adapted to a given texture. Instead, three types of subband statistics are observed: Gaussian, leptokurtotic, and interestingly, in some subbands, multimodal histograms. These subbands are closely linked to the structure of the texture, and guarantee that the most probable image is not flat. Motivated by these observations, we propose a probabilistic model that takes them into account. Adaptive wavelet packet subbands are modelled as Gaussian, generalized Gaussian, or a constrained Gaussian mixture. We use a Bayesian methodology, finding MAP estimates for the adaptive basis, for subband model selection, and for subband model parameters. Results confirm the effectiveness of the proposed approach, and highlight the importance of multimodal subbands for texture discrimination and modelling.
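The per-subband choice between Gaussian, heavy-tailed and multimodal models can be illustrated with a small sketch. The abstract uses MAP estimation; as a stand-in, this sketch scores three candidate models with the Bayesian Information Criterion (BIC): a zero-mean Gaussian, a Laplacian (a generalized Gaussian with its shape fixed at 1), and a symmetric two-component mixture 0.5·N(μ, σ²) + 0.5·N(−μ, σ²) fitted by EM. The function names, the symmetric shared-variance constraint and the BIC criterion are assumptions for illustration, not the authors' method.

```python
import numpy as np

def fit_gaussian(x):
    """Zero-mean Gaussian: ML variance and log-likelihood (1 free parameter)."""
    var = np.mean(x ** 2)
    ll = -0.5 * x.size * (np.log(2 * np.pi * var) + 1.0)
    return ll, 1

def fit_laplacian(x):
    """Zero-mean Laplacian (generalized Gaussian with shape 1): ML scale (1 parameter)."""
    b = np.mean(np.abs(x))
    ll = -x.size * (np.log(2 * b) + 1.0)
    return ll, 1

def fit_symmetric_mixture(x, n_iter=200):
    """0.5*N(mu, var) + 0.5*N(-mu, var) fitted by EM (2 parameters)."""
    m2 = np.mean(x ** 2)
    mu, var = 0.5 * np.sqrt(m2), 0.75 * m2
    for _ in range(n_iter):
        # E-step: tanh(mu*x/var) = P(+mu component | x) - P(-mu component | x)
        w = np.tanh(mu * x / var)
        # M-step: closed-form updates under the symmetric, shared-variance constraint
        mu = np.mean(w * x)
        var = m2 - mu ** 2
    a = mu * x / var
    # log(0.5*N(x; mu, var) + 0.5*N(x; -mu, var)), written via log-cosh for stability
    ll = np.sum(-0.5 * np.log(2 * np.pi * var) - (x ** 2 + mu ** 2) / (2 * var)
                + np.logaddexp(a, -a) - np.log(2.0))
    return ll, 2

def select_subband_model(coeffs):
    """Return the name of the model with the lowest BIC for one subband."""
    x = np.asarray(coeffs, dtype=float).ravel()
    fits = {"gaussian": fit_gaussian(x),
            "laplacian": fit_laplacian(x),
            "mixture": fit_symmetric_mixture(x)}
    bic = {name: k * np.log(x.size) - 2 * ll for name, (ll, k) in fits.items()}
    return min(bic, key=bic.get)

rng = np.random.default_rng(0)
bimodal = 3.0 * rng.choice([-1.0, 1.0], 5000) + rng.normal(0.0, 0.5, 5000)
print(select_subband_model(rng.normal(0.0, 1.0, 5000)))
print(select_subband_model(rng.laplace(0.0, 1.0, 5000)))
print(select_subband_model(bimodal))
```

The symmetric constraint keeps the mixture to two parameters and gives closed-form EM updates; a multimodal subband with no mode at zero is exactly the case where it beats both unimodal models.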
Models of the Unimodal and Multimodal Statistics of Adaptive Wavelet Packet Coefficients
In recent work, it was noted that although the subband histograms for standard wavelet coefficients take on a generalized Gaussian form, this is no longer true for wavelet packet bases adapted to a given texture. Instead, three types of subband statistics are observed: Gaussian, generalized Gaussian, and most interestingly, in some subbands, multimodal histograms with no mode at zero. As will be demonstrated in this report, these latter subbands are closely linked to the structure of the texture, and are thus likely to be important for many applications in which texture plays a role. Motivated by these observations, we extend the approach to texture modelling proposed in that earlier work to include these subbands. We relax the Gaussian assumption to include generalized Gaussians and constrained Gaussian mixtures. We use a Bayesian methodology, finding MAP estimates for the adaptive basis, for subband model selection, and for subband model parameters. Results confirm the effectiveness of the proposed approach, and highlight the importance of multimodal subbands for texture discrimination and modelling.
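Relaxing the Gaussian assumption to a generalized Gaussian p(x) = β/(2αΓ(1/β))·exp(−(|x|/α)^β) requires estimating the shape β and scale α of each subband. The report finds MAP estimates; the sketch below instead uses simple moment matching, an assumed substitute: for a zero-mean generalized Gaussian, (E|x|)²/E[x²] = Γ(2/β)²/(Γ(1/β)Γ(3/β)), which increases monotonically in β and can therefore be inverted by bisection. Function names and the search interval are illustrative.

```python
import math
import numpy as np

def gg_moment_ratio(beta):
    """(E|x|)^2 / E[x^2] for a zero-mean generalized Gaussian with shape beta."""
    return math.exp(2 * math.lgamma(2 / beta) - math.lgamma(1 / beta) - math.lgamma(3 / beta))

def estimate_gg(x, lo=0.1, hi=10.0, tol=1e-6):
    """Moment-matching estimates (alpha, beta) for one subband's coefficients."""
    x = np.asarray(x, dtype=float).ravel()
    m2 = np.mean(x ** 2)
    r = np.mean(np.abs(x)) ** 2 / m2
    # Clip the sample ratio to the range attainable on [lo, hi]
    r = min(max(r, gg_moment_ratio(lo)), gg_moment_ratio(hi))
    while hi - lo > tol:                      # bisection on the monotone ratio
        mid = 0.5 * (lo + hi)
        if gg_moment_ratio(mid) < r:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    # E[x^2] = alpha^2 * Gamma(3/beta) / Gamma(1/beta)  ->  solve for alpha
    alpha = math.sqrt(m2 * math.exp(math.lgamma(1 / beta) - math.lgamma(3 / beta)))
    return alpha, beta

rng = np.random.default_rng(0)
print(estimate_gg(rng.normal(0.0, 1.0, 100_000)))   # beta close to 2 (Gaussian)
print(estimate_gg(rng.laplace(0.0, 1.0, 100_000)))  # beta close to 1 (Laplacian)
```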
Multibiometric security in wireless communication systems
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University, 05/08/2010. This thesis has aimed to explore an application of multibiometrics to secure wireless communications. The media studied for this purpose were Wi-Fi, 3G, and WiMAX, over which simulations and experimental studies were carried out to assess performance. Specifically, access is restricted to authorized users only by a technique referred to hereafter as a multibiometric cryptosystem. In brief, the system is built upon a complete challenge/response methodology in order to obtain a high level of security, based on user identification by fingerprint and further verification of the user through text-dependent speaker recognition.
The first stage is enrolment, in which the database of fingerprints watermarked with memorable texts, along with voice features based on the same texts, is created by sending them to the server over the wireless channel.
Next is the verification stage, in which claimed users (those who claim to be genuine) are verified against the database; it consists of five steps. At the initial identification level, the user is asked to present their fingerprint and a memorable word, the latter being watermarked into the former, so that the system can authenticate the fingerprint, verify its validity, and retrieve the challenge for the accepted user. The following three steps involve speaker recognition: the user responds to the challenge with text-dependent speech, the server authenticates the response, and finally the server accepts or rejects the user.
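The two-stage challenge/response flow described above can be outlined in a short sketch. Everything here is hypothetical: the class and method names, the fixed-length feature vectors, the Euclidean-distance matching and the thresholds are illustrative assumptions, and the assumption that the issued challenge is the enrolled memorable text is inferred from the statement that the voice features are based on the same texts. Watermark extraction is abstracted away by passing the word directly.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class EnrolledUser:
    fingerprint: np.ndarray   # hypothetical fixed-length fingerprint feature vector
    memorable_word: str       # text watermarked into the fingerprint at enrolment
    voice: np.ndarray         # hypothetical voice template for that text

@dataclass
class ChallengeResponseServer:
    fp_threshold: float = 0.5      # hypothetical matching thresholds
    voice_threshold: float = 1.0
    users: dict = field(default_factory=dict)
    pending: dict = field(default_factory=dict)   # user_id -> outstanding challenge

    def enrol(self, user_id, fingerprint, word, voice):
        self.users[user_id] = EnrolledUser(np.asarray(fingerprint, float), word,
                                           np.asarray(voice, float))

    def identify(self, user_id, fingerprint, word):
        """Identification: check fingerprint and word; return a challenge or None."""
        user = self.users.get(user_id)
        if user is None or word != user.memorable_word:
            return None
        if np.linalg.norm(np.asarray(fingerprint, float) - user.fingerprint) > self.fp_threshold:
            return None
        # Assumption: the challenge is the enrolled memorable text
        self.pending[user_id] = user.memorable_word
        return self.pending[user_id]

    def respond(self, user_id, voice):
        """Speaker verification: compare the spoken response with the template."""
        if self.pending.pop(user_id, None) is None:
            return False                               # no outstanding challenge
        dist = np.linalg.norm(np.asarray(voice, float) - self.users[user_id].voice)
        return bool(dist <= self.voice_threshold)

server = ChallengeResponseServer()
server.enrol("alice", [0.1, 0.2], "harbour", [1.0, 2.0, 3.0])
challenge = server.identify("alice", [0.1, 0.25], "harbour")
print(challenge, server.respond("alice", [1.1, 2.0, 3.0]))
```

Popping the pending challenge on each response means a challenge can be answered only once, which is one simple way to resist replayed responses.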
To implement fingerprint watermarking, i.e. incorporating the memorable word as a watermark message into the fingerprint image, a five-step algorithm has been developed. The first three steps, which are novel, concern fingerprint image enhancement (CLAHE with a 'Clip Limit', standard deviation analysis and sliding neighborhood); they are followed by two further steps for embedding the watermark into, and extracting it from, the enhanced fingerprint image using the Discrete Wavelet Transform (DWT).
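The two DWT steps, embedding the memorable word and extracting it again, can be sketched as follows. The abstract does not give the embedding rule, so this sketch swaps in quantization index modulation (QIM) on the coefficients of a one-level orthonormal Haar DWT's approximation (LL) band; the step size `delta`, the use of the first LL coefficients, and all function names are assumptions. The CLAHE enhancement steps are not shown.

```python
import numpy as np

def haar_dwt2(img):
    """One-level orthonormal 2-D Haar DWT; image sides must be even."""
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    return ((a + b + c + d) / 2, (a - b + c - d) / 2,
            (a + b - c - d) / 2, (a - b - c + d) / 2)

def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2."""
    out = np.empty((2 * ll.shape[0], 2 * ll.shape[1]))
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2
    out[0::2, 1::2] = (ll - lh + hl - hh) / 2
    out[1::2, 0::2] = (ll + lh - hl - hh) / 2
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return out

def embed_watermark(img, word, delta=8.0):
    img = np.asarray(img, dtype=float)
    if img.ndim != 2 or img.shape[0] % 2 or img.shape[1] % 2:
        raise ValueError("expected a 2-D image with even sides")
    ll, lh, hl, hh = haar_dwt2(img)
    bits = np.unpackbits(np.frombuffer(word.encode("utf-8"), dtype=np.uint8)).astype(float)
    flat = ll.ravel().copy()
    if bits.size > flat.size:
        raise ValueError("image too small for this message")
    c = flat[:bits.size]
    # QIM: bit 0 -> nearest multiple of delta; bit 1 -> nearest multiple shifted by delta/2
    flat[:bits.size] = delta * np.round((c - bits * delta / 2) / delta) + bits * delta / 2
    return haar_idwt2(flat.reshape(ll.shape), lh, hl, hh)

def extract_watermark(img, n_bytes, delta=8.0):
    ll = haar_dwt2(np.asarray(img, dtype=float))[0]
    coeffs = ll.ravel()[: 8 * n_bytes]
    # Even multiples of delta/2 decode as 0, odd multiples as 1
    bits = (np.rint(2 * coeffs / delta).astype(int) % 2).astype(np.uint8)
    return np.packbits(bits).tobytes().decode("utf-8")

rng = np.random.default_rng(0)
fingerprint = rng.integers(0, 256, size=(64, 64)).astype(float)  # stand-in image
marked = embed_watermark(fingerprint, "harbour")
print(extract_watermark(np.round(marked), len("harbour")))
```

Because the Haar transform is orthonormal, rounding each pixel by at most 0.5 moves an LL coefficient by at most 1, which stays inside the decoding margin of `delta / 4 = 2`.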
In the speaker recognition stage, the limitations of this technique in wireless communication have been addressed by sending voice features (cepstral coefficients) instead of raw samples. This reduces transmission time and the dependence of the data on the communication channel, with no packet loss. Finally, the results obtained have verified these claims.
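The payload saving from sending cepstral coefficients rather than raw samples can be estimated with a short sketch. The abstract does not specify the cepstral variant or framing, so this assumes the plain real cepstrum (inverse FFT of the log magnitude spectrum), 25 ms Hamming-windowed frames with a 10 ms hop at 16 kHz, 13 coefficients per frame stored as float32, and 16-bit PCM for the raw audio; all of these are illustrative choices.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def real_cepstrum_features(x, frame_len=400, hop=160, n_coeffs=13):
    """Low-order real-cepstrum coefficients per frame, as float32."""
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    frames = frame_signal(x, frame_len, hop) * np.hamming(frame_len)
    log_mag = np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-10)
    ceps = np.fft.irfft(log_mag, n=frame_len, axis=1)
    return ceps[:, :n_coeffs].astype(np.float32)

fs = 16000
speech = np.random.default_rng(0).normal(size=fs)  # stand-in for 1 s of audio
feats = real_cepstrum_features(speech)
raw_bytes = len(speech) * 2                       # 16-bit PCM samples
print(feats.shape, feats.nbytes, raw_bytes)
```

Under these assumptions one second of audio shrinks from 32,000 bytes to about 5 KB of features; MFCCs, as used elsewhere in this listing, would give a payload of the same order.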
Evaluation and analysis of hybrid intelligent pattern recognition techniques for speaker identification
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. The rapid progress of technology in recent years has led to a tremendous rise in the use of biometric authentication systems. The objective of this research is to investigate the problem of identifying a speaker from their voice regardless of the content (i.e. text-independent), and to design efficient methods of combining face and voice to produce a robust authentication system.
A novel approach to speaker identification is developed using wavelet analysis and multiple neural networks, including a Probabilistic Neural Network (PNN), a General Regression Neural Network (GRNN) and a Radial Basis Function Neural Network (RBF NN), combined with an AND voting scheme. This approach is tested on the GRID and VidTIMIT corpora, and comprehensive test results have been validated against state-of-the-art approaches. The system was found to be competitive: it improved the recognition rate by 15% compared to classical Mel-frequency Cepstral Coefficients (MFCC), and reduced the recognition time by 40% compared to a Back Propagation Neural Network (BPNN), Gaussian Mixture Models (GMM) and Principal Component Analysis (PCA).
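The combination of several networks with an AND voting scheme can be sketched as below. This shows only one of the named networks, a PNN (a Parzen-window classifier with a Gaussian kernel), and interprets AND voting as "accept a label only if every network agrees, otherwise reject"; that interpretation, the reject code `-1`, and the use of two PNNs with different kernel widths in place of PNN/GRNN/RBF NN are assumptions. Wavelet feature extraction is not shown.

```python
import numpy as np

def pnn_predict(train_x, train_y, x, sigma=1.0):
    """Probabilistic Neural Network: class with the largest mean Gaussian-kernel response."""
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y)
    x = np.asarray(x, dtype=float)
    classes = np.unique(train_y)
    # Log kernel value between every query (rows) and every training pattern (columns)
    log_k = -((x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=-1) / (2 * sigma ** 2)
    scores = []
    for c in classes:
        lk = log_k[:, train_y == c]
        m = lk.max(axis=1, keepdims=True)
        # Log of the class-average kernel response, computed without underflow
        scores.append(m[:, 0] + np.log(np.exp(lk - m).mean(axis=1)))
    return classes[np.argmax(np.stack(scores, axis=1), axis=1)]

def and_vote(*predictions, reject=-1):
    """Keep a label only where all classifiers agree; otherwise return `reject`."""
    preds = np.stack([np.asarray(p) for p in predictions])
    agree = np.all(preds == preds[0], axis=0)
    return np.where(agree, preds[0], reject)

rng = np.random.default_rng(0)
train_x = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
train_y = np.array([0] * 20 + [1] * 20)
queries = np.array([[0.1, -0.1], [2.9, 3.1]])
p1 = pnn_predict(train_x, train_y, queries, sigma=0.5)
p2 = pnn_predict(train_x, train_y, queries, sigma=1.0)
print(and_vote(p1, p2))
```

AND voting trades coverage for reliability: a probe is rejected whenever the networks disagree, so fewer identifications are made but each accepted one is backed by every network.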
Another novel approach, based on vowel formant analysis, is implemented using Linear Discriminant Analysis (LDA). Vowel formant based speaker identification is well suited to real-time implementation and requires only a few bytes of information to be stored for each speaker, making it both storage and time efficient. Tested on GRID and VidTIMIT, the proposed scheme was found to be 85.05% accurate when Linear Predictive Coding (LPC) is used to extract the vowel formants, which is much higher than the accuracy of BPNN and GMM. Since the proposed scheme requires no training time beyond creating a small database of vowel formants, it is also faster. Furthermore, an increasing number of speakers makes it difficult for BPNN and GMM to sustain their accuracy, whereas the proposed score-based methodology stays almost linear.
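Extracting vowel formants with LPC, as described above, can be sketched with the autocorrelation method and the Levinson-Durbin recursion: the formants are read off the angles of the complex roots of the prediction polynomial. The LPC order, the synthetic test signal (white noise through an all-pole filter with resonances placed at 700 Hz and 1200 Hz) and the function names are assumptions; pre-emphasis, windowing, vowel segmentation and the LDA stage are omitted.

```python
import numpy as np

def lpc(x, order):
    """LPC coefficients a[0..order] (a[0] = 1) via autocorrelation + Levinson-Durbin."""
    x = np.asarray(x, dtype=float)
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for order i
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def formants(x, fs, order):
    """Formant estimates (Hz) from the angles of the complex LPC poles."""
    x = np.asarray(x, dtype=float)
    poles = np.roots(lpc(x - x.mean(), order))
    poles = poles[np.imag(poles) > 0]          # one pole per conjugate pair
    return np.sort(np.angle(poles) * fs / (2 * np.pi))

fs = 8000
true_poles = [0.97 * np.exp(2j * np.pi * f / fs) for f in (700.0, 1200.0)]
a_true = np.real(np.poly(true_poles + [np.conj(p) for p in true_poles]))
e = np.random.default_rng(0).normal(size=fs)
x = np.zeros(fs)
for n in range(fs):
    # All-pole synthesis: x[n] = e[n] - sum_j a_true[j] * x[n - j]
    x[n] = e[n] - sum(a_true[j] * x[n - j] for j in range(1, 5) if n - j >= 0)
print(formants(x, fs, order=4))
```

A handful of formant frequencies per vowel is all that needs storing per speaker, which is consistent with the few-bytes-per-speaker figure above.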
Finally, a novel audio-visual fusion based identification system is implemented using GMM and MFCC for speaker identification and PCA for face recognition. The results of speaker identification and face recognition are fused at different levels, namely the feature, score and decision levels. Both score-level and decision-level (with OR voting) fusion were shown to outperform feature-level fusion in terms of accuracy and error resilience. This result is in line with the distinct nature of the two modalities, which lose their individual characteristics when combined at the feature level. The GRID and VidTIMIT test results validate that the proposed scheme is one of the best candidates for the fusion of face and voice, owing to its low computational time and high recognition accuracy.
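Score-level and decision-level fusion of face and voice can be illustrated with a minimal sketch. It assumes each matcher returns one score per enrolled identity, normalizes each score vector with min-max scaling, combines them with an equal-weight sum, and implements decision-level OR voting as "accept if either modality accepts"; the normalization, the weight and the function names are assumptions rather than the thesis's exact scheme.

```python
import numpy as np

def minmax(scores):
    """Scale each score vector (last axis) to [0, 1]; constant vectors map to 0."""
    s = np.asarray(scores, dtype=float)
    lo = s.min(axis=-1, keepdims=True)
    span = s.max(axis=-1, keepdims=True) - lo
    return np.divide(s - lo, span, out=np.zeros_like(s), where=span > 0)

def score_level_fusion(face_scores, voice_scores, w_face=0.5):
    """Weighted sum of normalized scores; returns the index of the best identity."""
    fused = w_face * minmax(face_scores) + (1.0 - w_face) * minmax(voice_scores)
    return np.argmax(fused, axis=-1)

def or_decision_fusion(face_accept, voice_accept):
    """Decision-level OR voting: accept if either modality accepts."""
    return np.logical_or(face_accept, voice_accept)

# Face scores are similarities in [0, 1]; voice scores are on a different scale
print(score_level_fusion([0.2, 0.9, 0.1], [10.0, 5.0, 2.0]))
print(or_decision_fusion([True, False, False], [False, False, True]))
```

Normalizing before summing matters because the two matchers score on unrelated scales; without it the voice scores in the example would dominate the sum.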
Multimodal Sensing and Data Processing for Speaker and Emotion Recognition using Deep Learning Models with Audio, Video and Biomedical Sensors
The focus of the thesis is on Deep Learning methods and their applications on multimodal data, with a potential to explore the associations between modalities and replace missing and corrupt ones if necessary. We have chosen two important real-world applications that need to deal with multimodal data: 1) Speaker recognition and identification; 2) Facial expression recognition and emotion detection.
The first part of our work assesses the effectiveness of speech-related sensory data modalities and their combinations in speaker recognition using deep learning models. First, the role of electromyography (EMG) is highlighted as a unique biometric sensor in improving audio-visual speaker recognition or as a substitute in noisy or poorly-lit environments. Secondly, the effectiveness of deep learning is empirically confirmed through its higher robustness to all types of features in comparison to a number of commonly used baseline classifiers. Not only do deep models outperform the baseline methods, their power increases when they integrate multiple modalities, as different modalities contain information on different aspects of the data, especially between EMG and audio. Interestingly, our deep learning approach is word-independent. In addition, the EMG, audio, and visual parts of the samples from each speaker do not need to match. This increases the flexibility of our method in using multimodal data, particularly if one or more modalities are missing. With a dataset of 23 individuals speaking 22 words five times, we show that EMG can replace the audio/visual modalities, and when combined, significantly improve the accuracy of speaker recognition.
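One simple way to combine whichever modalities are available for a sample is late fusion: average the per-modality class probabilities and skip any modality that is missing. The sketch below assumes each modality's deep model outputs a logit vector over speakers; it is a generic illustration of late fusion, not the thesis's architecture, and the function names are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = np.asarray(logits, dtype=float)
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def late_fusion(logits_by_modality):
    """Average class probabilities over modalities whose logits are not None."""
    probs = [softmax(l) for l in logits_by_modality.values() if l is not None]
    if not probs:
        raise ValueError("no modality available")
    return np.mean(probs, axis=0)

# Video missing: audio and EMG still produce a fused prediction
fused = late_fusion({"audio": [2.0, 0.0], "emg": [0.0, 1.0], "video": None})
print(fused, int(np.argmax(fused)))
```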
The second part describes a study on automated emotion recognition using four different modalities: audio, video, electromyography (EMG), and electroencephalography (EEG). We collected a dataset by recording the four modalities as 12 human subjects expressed six different emotions or maintained a neutral expression. Three different aspects of emotion recognition were investigated: model selection, feature selection, and data selection. Both generative models (DBNs) and discriminative models (LSTMs) were applied to the four modalities, and from these analyses we conclude that LSTM is better for audio and video together with their corresponding sophisticated feature extractors (MFCC and CNN), whereas DBN is better for both EMG and EEG. By examining these signals at different stages (pre-speech, during-speech, and post-speech) of the current and following trials, we have found that the most effective stages for emotion recognition from EEG occur after the emotion has been expressed, suggesting that the neural signals conveying an emotion are long-lasting.
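Comparing pre-speech, during-speech and post-speech stages requires cutting each trial at the speech onset and offset. The sketch below assumes onset and offset times in seconds are known for each trial and uses per-channel log-variance, a common simple EEG feature, as an illustrative stand-in for the features actually used; the sampling rate and function names are assumptions.

```python
import numpy as np

def split_stages(signal, fs, onset_s, offset_s):
    """Split a (channels, samples) recording into pre-, during- and post-speech parts."""
    x = np.asarray(signal, dtype=float)
    on, off = int(round(onset_s * fs)), int(round(offset_s * fs))
    if not 0 <= on <= off <= x.shape[-1]:
        raise ValueError("speech onset/offset outside the recording")
    return {"pre_speech": x[..., :on],
            "during_speech": x[..., on:off],
            "post_speech": x[..., off:]}

def log_variance_features(stages):
    """Per-channel log-variance for each stage."""
    return {name: np.log(seg.var(axis=-1) + 1e-12) for name, seg in stages.items()}

fs = 256
eeg = np.random.default_rng(0).normal(size=(4, 10 * fs))   # 4 channels, 10 s
stages = split_stages(eeg, fs, onset_s=2.0, offset_s=5.0)
print({name: seg.shape for name, seg in stages.items()})
```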