37 research outputs found

    Polar Fusion Technique Analysis for Evaluating the Performances of Image Fusion of Thermal and Visual Images for Human Face Recognition

    This paper presents a comparative study of two methods based on fusion and polar transformation of visual and thermal images. The investigation addresses the challenges of face recognition, which include pose variations, changes in facial expression, partial occlusions, variations in illumination, rotation through different angles, and changes in scale. To overcome these obstacles, we implemented two fusion techniques and examined them through rigorous experimentation. In the first method, log-polar transformation is applied to the fused images obtained after fusion of visual and thermal images, whereas in the second method fusion is applied to the individually log-polar transformed visual and thermal images. After this step, whichever way the fused images were obtained, Principal Component Analysis (PCA) is applied to reduce their dimension. Log-polar transformed images are capable of handling the complications introduced by scaling and rotation. The main objective of employing fusion is to produce a fused image that provides more detailed and reliable information and can overcome the drawbacks present in the individual visual and thermal face images. Finally, the reduced fused images are classified using a multilayer perceptron neural network. The experiments use the benchmark thermal and visual face images of the Object Tracking and Classification Beyond Visible Spectrum (OTCBVS) database. The second method performed better, with a maximum correct recognition rate of 95.71% and an average of 93.81%. Comment: Proceedings of IEEE Workshop on Computational Intelligence in Biometrics and Identity Management (IEEE CIBIM 2011), Paris, France, April 11 - 15, 201
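
    The claim that log-polar coordinates handle scaling and rotation can be illustrated on a single point: a minimal sketch, assuming the transformation is centred on the origin. The function names `to_log_polar` and `scale_and_rotate` are hypothetical and do not come from the paper. Scaling by a factor s should add log(s) to the log-radius, and rotating by an angle should add that angle to the angular coordinate, so both distortions become plain shifts.

```python
import math


def to_log_polar(x, y, cx=0.0, cy=0.0):
    """Map a Cartesian point to (log-radius, angle) about the centre (cx, cy)."""
    dx, dy = x - cx, y - cy
    r = math.hypot(dx, dy)
    if r == 0.0:
        raise ValueError("the centre itself has no log-polar coordinate")
    return math.log(r), math.atan2(dy, dx)


def scale_and_rotate(x, y, scale, angle):
    """Scale a point by `scale` and rotate it by `angle` radians about the origin."""
    c, s = math.cos(angle), math.sin(angle)
    return scale * (c * x - s * y), scale * (s * x + c * y)


# Illustrative values only (not taken from the paper).
point = (3.0, 1.0)
scale, angle = 2.0, math.pi / 6

rho0, theta0 = to_log_polar(*point)
rho1, theta1 = to_log_polar(*scale_and_rotate(*point, scale, angle))

# In log-polar space the distortion shows up as a pure translation.
shift_rho = rho1 - rho0        # expected to equal log(scale)
shift_theta = theta1 - theta0  # expected to equal angle (no wrap-around here)
```

    For a full image the same mapping is applied to every pixel position, and angles near ±π wrap around, which this single-point sketch does not handle.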


    Human face recognition is an important area in the field of biometrics. It has been an active area of research for several decades, but still remains a challenging problem because of the complexity of the human face. In this thesis we describe fully automatic solutions that can locate faces and then perform identification and verification. We present a solution for face localisation using eye locations. We derive an efficient representation for the decision hyperplane of linear and nonlinear Support Vector Machines (SVMs). For this we introduce the novel concept of ρ and η prototypes. The standard formulation for the decision hyperplane is reformulated and expressed in terms of the two prototypes. Different kernels are treated separately to achieve further classification efficiency and to facilitate its adaptation to operate with the fast Fourier transform to achieve fast eye detection. Using the eye locations, we extract and normalise the face for size and in-plane rotations. Our method produces a more efficient representation of the SVM decision hyperplane than the well-known reduced set methods. As a result, our eye detection subsystem is faster and more accurate. The use of fractals and fractal image coding for object recognition has been proposed and used by others. Fractal codes have been used as features for recognition, but we need to take into account the distance between codes, and to ensure the continuity of the parameters of the code. We use a method based on fractal image coding for recognition, which we call the Fractal Neighbour Distance (FND). The FND relies on the Euclidean metric and the uniqueness of the attractor of a fractal code. An advantage of using the FND over fractal codes as features is that we do not have to worry about the uniqueness of, and distance between, codes. We only require the uniqueness of the attractor, which is already an implied property of a properly generated fractal code. 
Similar methods to the FND have been proposed by others, but what distinguishes our work from the rest is that we investigate the FND in greater detail and use our findings to improve the recognition rate. Our investigations reveal that the FND has some inherent invariance to translation, scale, rotation and changes to illumination. These invariances are image dependent and are affected by fractal encoding parameters. The parameters that have the greatest effect on recognition accuracy are the contrast scaling factor, luminance shift factor and the type of range block partitioning. The contrast scaling factor affects the convergence and eventual convergence rate of a fractal decoding process. We propose a novel method of controlling the convergence rate by altering the contrast scaling factor in a controlled manner, which has not been possible before. This helped us improve the recognition rate because under certain conditions better results are achievable from using a slower rate of convergence. We also investigate the effects of varying the luminance shift factor, and examine three different types of range block partitioning schemes: quad-tree, HV and uniform partitioning. We performed experiments using various face datasets, and the results show that our method indeed performs better than many accepted methods such as eigenfaces. The experiments also show that the FND-based classifier increases the separation between classes. The standard FND is further improved by incorporating the use of localised weights. A local search algorithm is introduced to find a best matching local feature using this locally weighted FND. The scores from a set of these locally weighted FND operations are then combined to obtain a global score, which is used as a measure of the similarity between two face images. Each local FND operation possesses the distortion invariant properties described above. 
Combined with the search procedure, the method has the potential to be invariant to a larger class of non-linear distortions. We also present a set of locally weighted FNDs that concentrate around the upper part of the face encompassing the eyes and nose. This design was motivated by the fact that the region around the eyes has more information for discrimination. Better performance is achieved by using different sets of weights for identification and verification. For facial verification, performance is further improved by using normalised scores and client-specific thresholding. In this case, our results are competitive with current state-of-the-art methods, and in some cases outperform all those to which they were compared. For facial identification, under some conditions the weighted FND performs better than the standard FND. However, the weighted FND still has its shortcomings on some datasets, where its performance is not much better than that of the standard FND. To alleviate this problem we introduce a voting scheme that operates with normalised versions of the weighted FND. Although there are no improvements at lower matching ranks using this method, there are significant improvements for larger matching ranks. Our methods offer advantages over some well-accepted approaches such as eigenfaces, neural networks and those that use statistical learning theory. Some of the advantages are: new faces can be enrolled without re-training involving the whole database; faces can be removed from the database without the need for re-training; there are inherent invariances to face distortions; it is relatively simple to implement; and it is not model-based, so there are no model parameters that need to be tweaked.

    An Analysis of the Inner Workings of Variational Autoencoders

    Representation learning, the task of extracting meaningful representations of high-dimensional data, lies at the very core of artificial intelligence research. This holds whether representations arise via implicit training of features in a variety of computer vision tasks, via more old-school, hand-crafted feature extraction mechanisms for, e.g., eye-tracking or other applications, or via explicit learning of semantically meaningful data representations. Strictly speaking, any activation of a layer within a neural network can be considered a representation of the input data. This makes research on achieving explicit control over properties of such representations a fundamentally attractive task. An often desired property of learned representations is called disentanglement. The idea of a disentangled representation stems from the goal of separating sources of variance in the data and consolidates itself in the concept of recovering generative factors. Assuming that every data point has its origin in a generative process that produces high-dimensional data given a low-dimensional representation (e.g., rendering images of people given visual attributes, such as hairstyle, camera angle, age, ...), the goal of finding a disentangled representation is to recover those attributes. The Variational Autoencoder (VAE) is a famous architecture commonly used for disentangled representation learning, and this work summarizes an analysis of its inner workings. VAEs attracted a lot of attention due to their, at the time, unparalleled performance as both generative models and inference models for learning disentangled representations. However, note that the disentanglement property of a representation is not invariant to rotations of the learned representation, i.e., rotating a learned representation can change and destroy its disentanglement quality. Given a rotationally symmetric prior over the representation space, the idealized objective function of VAEs is rotationally symmetric. 
Their success at producing disentangled representations consequently comes as a particular surprise. This thesis discusses why VAEs pursue a particular alignment for their representations and how the chosen alignment is correlated with the generative factors of existing representation learning datasets.
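
    The rotational symmetry mentioned above can be checked numerically for one part of the objective: a minimal sketch, assuming a standard normal prior N(0, I) on a 2-D latent space and a Gaussian posterior with full covariance. It covers only the KL regularisation term, not the reconstruction term, and the helper names `kl_to_standard_normal` and `rotate` are hypothetical rather than taken from the thesis.

```python
import math


def kl_to_standard_normal(mu, cov):
    """KL( N(mu, cov) || N(0, I) ) for a 2-D Gaussian with full covariance."""
    (a, b), (c, d) = cov
    trace = a + d
    det = a * d - b * c
    sq_norm = mu[0] ** 2 + mu[1] ** 2
    return 0.5 * (trace + sq_norm - 2.0 - math.log(det))


def rotate(mu, cov, angle):
    """Rotate the latent space: mu -> R mu, cov -> R cov R^T."""
    c, s = math.cos(angle), math.sin(angle)
    R = ((c, -s), (s, c))
    new_mu = (R[0][0] * mu[0] + R[0][1] * mu[1],
              R[1][0] * mu[0] + R[1][1] * mu[1])
    RC = [[sum(R[i][k] * cov[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    new_cov = [[sum(RC[i][k] * R[j][k] for k in range(2)) for j in range(2)]
               for i in range(2)]
    return new_mu, new_cov


# Illustrative posterior: axis-aligned before the rotation.
mu = (1.5, -0.5)
cov = ((0.5, 0.0), (0.0, 0.1))

rot_mu, rot_cov = rotate(mu, cov, 0.7)

kl_before = kl_to_standard_normal(mu, cov)
kl_after = kl_to_standard_normal(rot_mu, rot_cov)
```

    The two KL values agree even though the rotated covariance is no longer diagonal, which is one way to see that this term alone does not prefer any particular alignment of the latent axes.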

    Human Recognition and Identification: identification of Persons in the social context based on image processing

    The aim of this Bachelor’s Thesis is to provide the therapeutic robot TUK with identification software, so that it is able to recognize people in its environment. Moreover, it is intended that TUK can identify who it is interacting with, and thus adapt its behaviour depending on the situation. In the first part, the theoretical principles that have been used during the development of the software are presented. Several kinds of image processing techniques as well as classification algorithms are explained. In the second part, the implementation is shown step by step in order to give an overview of the whole system. Finally, the results obtained during several tests are presented and discussed. In conclusion, several guidelines for tackling some of the challenges are proposed, setting a possible way for further work.

    Block-level discrete cosine transform coefficients for autonomic face recognition

    This dissertation presents a novel method of autonomic face recognition based on the recently proposed biologically plausible network of networks (NoN) model of information processing. The NoN model is based on locally parallel and globally coordinated transformations. In the NoN architecture, the neurons or computational units form distributed networks, which themselves link to form larger networks. In the general case, an n-level hierarchy of nested distributed networks is constructed. This models the structures in the cerebral cortex described by Mountcastle and the architecture based on that proposed for information processing by Sutton. In the implementation proposed in the dissertation, the image is processed by a nested family of locally operating networks along with a hierarchically superior network that classifies the information from each of the local networks. The implementation of this approach helps obtain sensitivity to the contrast sensitivity function (CSF) in the middle of the spectrum, as is true for the human vision system. The input images are divided into blocks to define the local regions of processing. The two-dimensional Discrete Cosine Transform (DCT), a spatial frequency transform, is used to transform the data into the frequency domain. Thereafter, statistical operators that calculate various functions of spatial frequency in the block are used to produce a block-level DCT coefficient. The image is now transformed into a variable length vector that is trained with respect to the data set. The classification was done by the use of a backpropagation neural network. The proposed method yields excellent results on a benchmark database. The results of the experiments yielded a maximum of 98.5% recognition accuracy and an average of 97.4% recognition accuracy. An advanced version of the method where the local processing is done on offset blocks has also been developed. 
This has validated the NoN approach, and further research using local processing as well as more advanced global operators is likely to yield even better results.
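
    The block-wise DCT step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's method: the abstract does not say which statistical operators are used, so the per-block statistic here (energy of the non-DC coefficients) is a stand-in, and the names `dct_2d` and `block_features` are hypothetical.

```python
import math


def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out


def block_features(image, block_size):
    """Return one AC-energy value per non-overlapping block, in row-major order.

    Assumption: AC energy stands in for the unspecified statistical operators.
    Rows or columns that do not fill a whole block are ignored.
    """
    rows, cols = len(image), len(image[0])
    features = []
    for top in range(0, rows - block_size + 1, block_size):
        for left in range(0, cols - block_size + 1, block_size):
            block = [row[left:left + block_size]
                     for row in image[top:top + block_size]]
            coeffs = dct_2d(block)
            total_energy = sum(c * c for row in coeffs for c in row)
            features.append(total_energy - coeffs[0][0] ** 2)
    return features


# Tiny illustrative image: flat blocks give zero AC energy, edges do not.
image = [
    [0, 1, 5, 5],
    [0, 1, 5, 5],
    [2, 2, 0, 0],
    [2, 2, 3, 3],
]
features = block_features(image, 2)
```

    The resulting feature vector has one entry per block, so its length depends on the image and block sizes; in the pipeline described above, such a vector would then be passed to the classifier.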

    QUEST Hierarchy for Hyperspectral Face Recognition

    Face recognition is an attractive biometric due to the ease with which photographs of the human face can be acquired and processed. The non-intrusive nature of many surveillance systems permits face recognition applications to be used in a myriad of environments. Despite decades of impressive research in this area, face recognition still struggles with variations in illumination, pose and expression, not to mention the larger challenge of willful circumvention. The integration of supporting contextual information in a fusion hierarchy known as QUalia Exploitation of Sensor Technology (QUEST) is a novel approach for hyperspectral face recognition that results in performance advantages and a robustness not seen in leading face recognition methodologies. This research demonstrates a method for the exploitation of hyperspectral imagery and the intelligent processing of contextual layers of spatial, spectral, and temporal information. This approach illustrates the benefit of integrating spatial and spectral domains of imagery for the automatic extraction and integration of novel soft (biometric) features. The establishment of the QUEST methodology for face recognition results in an engineering advantage in both performance and efficiency compared to leading and classical face recognition techniques. An interactive environment for the testing and expansion of this recognition framework is also provided.