A statistical multiresolution approach for face recognition using structural hidden Markov models
This paper introduces a novel methodology that combines the multiresolution feature of the discrete wavelet transform (DWT) with the local interactions of the facial structures expressed through the structural hidden Markov model (SHMM). A range of wavelet filters such as Haar, biorthogonal 9/7, and Coiflet, as well as Gabor, have been implemented in order to search for the best performance. SHMMs perform a thorough probabilistic analysis of any sequential pattern by revealing both its inner and outer structures simultaneously. Unlike traditional HMMs, SHMMs do not assume state-conditional independence of the visible observation sequence. This is achieved via the concept of local structures introduced by the SHMMs. As a result, the long-range dependency problem inherent to traditional HMMs is drastically reduced. SHMMs have not previously been applied to the problem of face identification. The results reported in this application show that the SHMM outperforms the traditional hidden Markov model with a 73% increase in accuracy
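To make the "multiresolution" idea concrete, the following is a minimal sketch of a multi-level Haar decomposition on a one-dimensional signal. It is not the paper's pipeline: the function names and the sample pixel row are hypothetical, and the paper works on 2D face images with several filter families, whereas this sketch only covers the Haar case in 1D.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar DWT on an even-length sequence."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)
    # Scaled pairwise sums give the coarse approximation band.
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    # Scaled pairwise differences give the detail band.
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_multiresolution(signal, levels):
    """Repeatedly split the approximation band.

    Returns the final approximation and a list of detail bands,
    ordered from the finest level to the coarsest.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details

# Hypothetical pixel row, used only for illustration.
row = [4.0, 6.0, 10.0, 12.0]
approx, details = haar_multiresolution(row, levels=2)
```

Each level halves the resolution of the approximation band, so the coefficients collected across levels describe the same signal at progressively coarser scales.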
Human Face Recognition
Face recognition, the main biometric used by human beings, has grown in popularity over the last twenty years. Automatic recognition of human faces has many commercial and security applications in identity validation and recognition, and it has been one of the most active topics in image processing and pattern recognition since 1990. The availability of feasible technologies, together with the increasing demand for reliable security systems, has motivated many researchers to develop new methods for face recognition. In automatic face recognition the aim is to either identify or verify one or more persons in still or video images of a scene by means of a stored database of faces. An important feature of face recognition is that it is non-intrusive and contact-free, which distinguishes it from other biometrics such as iris or fingerprint recognition that require the subject's participation. During the last two decades several face recognition algorithms and systems have been proposed and some major advances have been achieved. As a result, the performance of face recognition systems under controlled conditions has now reached a satisfactory level. These systems, however, still face challenges in environments with variations in illumination, pose, expression, etc. The objective of this research is to design a reliable automated face recognition system that is robust under varying conditions of noise level, illumination and occlusion. A new method for illumination-invariant feature extraction based on the illumination-reflectance model is proposed; it is computationally efficient and does not require any prior information about the face model or illumination. A weighted voting scheme is also proposed to enhance performance under illumination variations and to cancel occlusions. 
The proposed method uses mutual information and entropy of the images to generate different weights for a group of ensemble classifiers based on the input image quality. The method yields outstanding results by reducing the effect of both illumination and occlusion variations in the input face images
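As a rough illustration of quality-weighted voting, the sketch below weights each classifier's vote by the Shannon entropy of the image region it saw. This is an assumption made for the example only: the abstract does not say how mutual information and entropy are turned into weights, and the function names, regions and labels here are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def shannon_entropy(pixels):
    """Shannon entropy in bits of the grey-level histogram of a pixel list."""
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def weighted_vote(predictions, weights):
    """Return the label with the largest total weight."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)

# Hypothetical regions: a flat (e.g. occluded or saturated) patch,
# a two-level patch, and a patch with four distinct grey levels.
regions = [[0, 0, 0, 0], [0, 255, 0, 255], [10, 20, 30, 40]]

# ASSUMPTION: entropy is used directly as the vote weight.
weights = [shannon_entropy(r) for r in regions]

# Hypothetical per-region classifier outputs.
predictions = ["alice", "bob", "alice"]
winner = weighted_vote(predictions, weights)
```

Under this assumed weighting, a flat region carries no information and contributes nothing to the vote, which is one simple way a scheme of this kind could discount occluded areas.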
Audio-Visual Automatic Speech Recognition Using PZM, MFCC and Statistical Analysis
Audio-Visual Automatic Speech Recognition (AV-ASR) has become a highly promising research area for cases where the audio signal is corrupted by noise. The main objective of this paper is to select the important and discriminative audio and visual speech features for recognizing audio-visual speech. This paper proposes Pseudo Zernike Moments (PZM) and a feature selection method for audio-visual speech recognition. Visual information is captured from the lip contour, and its moments are computed for lip reading. We extract 19th-order Mel Frequency Cepstral Coefficients (MFCC) as speech features from the audio. Since not all 19 speech features are equally important, feature selection algorithms are used to select the most efficient features. Several statistical tests, namely Analysis of Variance (ANOVA), Kruskal-Wallis, and the Friedman test, are employed to analyze the significance of the features, along with the Incremental Feature Selection (IFS) technique. Statistical analysis is first used to assess the statistical significance of the speech features, and IFS is then used to select the speech feature subset. Furthermore, multiclass Support Vector Machine (SVM), Artificial Neural Network (ANN) and Naive Bayes (NB) machine learning techniques are used to recognize the speech for both the audio and visual modalities. A combined decision is taken from the two individual recognition systems based on their recognition rates. This paper compares the results achieved by the proposed model and the existing model for both audio and visual speech recognition. Zernike Moments (ZM) are compared with PZM, and the comparison shows that our proposed model using PZM extracts better discriminative features for visual speech recognition. This study also proves that audio feature selection using statistical analysis outperforms methods without any feature selection technique
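The ranking stage of such a feature selection scheme can be sketched with a one-way ANOVA F-statistic computed per feature. This is only an illustration of the idea under stated assumptions: the function names and toy data are hypothetical, the Kruskal-Wallis and Friedman tests are not shown, and the IFS stage (which needs a classifier to evaluate growing feature subsets) is omitted.

```python
def anova_f(groups):
    """One-way ANOVA F-statistic for a list of groups of numbers.

    Assumes at least two groups and non-zero within-group variance.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def rank_features(samples, labels):
    """Rank feature indices by decreasing F-statistic across classes."""
    classes = sorted(set(labels))
    scores = []
    for j in range(len(samples[0])):
        groups = [[x[j] for x, y in zip(samples, labels) if y == c] for c in classes]
        scores.append(anova_f(groups))
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order, scores

# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
samples = [[1, 5], [2, 3], [3, 4],
           [2, 4], [3, 5], [4, 3],
           [5, 3], [6, 4], [7, 5]]
labels = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
order, scores = rank_features(samples, labels)
```

An incremental selection step would then add features in this order and keep the subset that gives the best classifier accuracy.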
Various Approaches of Support vector Machines and combined Classifiers in Face Recognition
In this paper we present the various approaches used in face recognition from 2001 to 2012, because in the last decade face recognition has been used in many fields, such as the security sector and identity authentication. Today we need accurate and fast performance in face recognition. Face recognition technology is now at a mature stage because research in this field is conducted continuously. Some extensions of the support vector machine (SVM) that give strong performance in face recognition are reviewed. We also review some papers on combined-classifier approaches, which are likewise an active research area in pattern recognition
A Survey of Iris Recognition System
The uniqueness of iris texture makes it one of the more reliable physiological biometric traits compared to other biometric traits. In this paper, we investigate different levels of fusion for iris images. Although a number of iris recognition methods have been proposed in recent years, most of them focus on the feature extraction and classification method. Fewer methods focus on the information fusion of iris images. Fusion is believed to produce better discrimination power in the feature space, so we conduct an analysis to investigate which fusion level produces the best result for an iris recognition system. Experimental analysis using the CASIA dataset shows that feature-level fusion produces 99% recognition accuracy. The verification analysis shows that the best result is GAR = 95% at FRR = 0.1
Face Processing & Frontal Face Verification
In this report we first review important publications in the field of face recognition; geometric features, templates, Principal Component Analysis (PCA), pseudo-2D Hidden Markov Models, Elastic Graph Matching, as well as other points are covered; important issues, such as the effects of an illumination direction change and the use of different face areas, are also covered. A new feature set (termed DCT-mod2) is then proposed; the feature set utilizes polynomial coefficients derived from 2D Discrete Cosine Transform (DCT) coefficients obtained from horizontally & vertically neighbouring blocks. Face authentication results on the VidTIMIT database suggest that the proposed feature set is superior (in terms of robustness to illumination changes and discrimination ability) to features extracted using four popular methods: PCA, PCA with histogram equalization pre-processing, 2D DCT and 2D Gabor wavelets; the results also suggest that histogram equalization pre-processing increases the error rate and offers no help against illumination changes. Moreover, the proposed feature set is over 80 times faster to compute than features based on 2D Gabor wavelets. Further experiments on the Weizmann Database also show that the proposed approach is more robust than 2D Gabor wavelets and 2D DCT coefficients
Evaluation and analysis of hybrid intelligent pattern recognition techniques for speaker identification
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. The rapid momentum of the technology progress in the recent years has led to a tremendous rise in the use of biometric authentication systems. The objective of this research is to investigate the problem of identifying a speaker from its voice regardless of the content (i.e. text-independent), and to design efficient methods of combining face and voice in producing a robust authentication system.
A novel approach towards speaker identification is developed using wavelet analysis, and multiple neural networks including Probabilistic Neural Network (PNN), General Regressive Neural Network (GRNN) and Radial Basis Function-Neural Network (RBF NN) with the AND voting scheme. This approach is tested on GRID and VidTIMIT corpora and comprehensive test results have been validated with state-of-the-art approaches. The system was found to be competitive and it improved the recognition rate by 15% as compared to the classical Mel-frequency Cepstral Coefficients (MFCC), and reduced the recognition time by 40% compared to Back Propagation Neural Network (BPNN), Gaussian Mixture Models (GMM) and Principal Component Analysis (PCA).
Another novel approach using vowel formant analysis is implemented using Linear Discriminant Analysis (LDA). Vowel formant based speaker identification is best suitable for real-time implementation and requires only a few bytes of information to be stored for each speaker, making it both storage and time efficient. Tested on GRID and VidTIMIT, the proposed scheme was found to be 85.05% accurate when Linear Predictive Coding (LPC) is used to extract the vowel formants, which is much higher than the accuracy of BPNN and GMM. Since the proposed scheme does not require any training time other than creating a small database of vowel formants, it is faster as well. Furthermore, an increasing number of speakers makes it difficult for BPNN and GMM to sustain their accuracy, but the proposed score-based methodology stays almost linear.
Finally, a novel audio-visual fusion based identification system is implemented using GMM and MFCC for speaker identification and PCA for face recognition. The results of speaker identification and face recognition are fused at different levels, namely the feature, score and decision levels. Both the score-level and decision-level (with OR voting) fusions were shown to outperform the feature-level fusion in terms of accuracy and error resilience. The result is in line with the distinct nature of the two modalities which lose themselves when combined at the feature-level. The GRID and VidTIMIT test results validate that the proposed scheme is one of the best candidates for the fusion of face and voice due to its low computational time and high recognition accuracy
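The difference between decision-level and score-level fusion can be sketched as follows. This is a simplified illustration, not the thesis's implementation: it assumes each modality produces a match score in [0, 1] for a claimed identity, and the function names, the equal weighting and the 0.5 threshold are all hypothetical.

```python
def and_vote(decisions):
    """Decision-level fusion: accept only if every modality accepts."""
    return all(decisions)

def or_vote(decisions):
    """Decision-level fusion: accept if at least one modality accepts."""
    return any(decisions)

def score_fusion(face_score, voice_score, face_weight=0.5, threshold=0.5):
    """Score-level fusion: threshold a weighted sum of the two scores.

    ASSUMPTION: both scores are already normalised to [0, 1].
    """
    fused = face_weight * face_score + (1.0 - face_weight) * voice_score
    return fused >= threshold

# Hypothetical scores for one claimed identity.
face_score, voice_score = 0.8, 0.3
threshold = 0.5
decisions = [face_score >= threshold, voice_score >= threshold]

accept_and = and_vote(decisions)
accept_or = or_vote(decisions)
accept_score = score_fusion(face_score, voice_score)
```

In this example the noisy voice score blocks acceptance under AND voting, while OR voting and score-level fusion both let the strong face score carry the decision.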