9 research outputs found

    Human face detection techniques: A comprehensive review and future research directions

    Face detection, an effortless task for humans, is complex to perform on machines. The recent proliferation of computational resources is paving the way for rapid advances in face detection technology, and many astutely designed algorithms have been proposed to detect faces. However, little attention has been paid to surveying the available algorithms comprehensively. This paper provides a fourfold discussion of face detection algorithms. First, we explore a wide variety of available face detection algorithms in five steps: history, working procedure, advantages, limitations, and use in fields other than face detection. Second, we include a comparative evaluation of the algorithms within each method. Third, we provide detailed comparisons across all the algorithms surveyed to give an all-inclusive outlook. Lastly, we conclude with several promising research directions to pursue. Earlier survey papers on face detection algorithms are limited to technical details of the most popular algorithms. Our study, in contrast, covers detailed technical explanations of face detection algorithms and various recent sub-branches of neural networks, presents detailed comparisons among the algorithms both overall and within sub-branches, and discusses their strengths, limitations, and uses beyond face detection.

    A Robust Face Recognition Algorithm for Real-World Applications

    The proposed face recognition algorithm represents local facial regions with the discrete cosine transform (DCT). The local representation provides robustness against appearance variations in local regions caused by partial face occlusion or facial expression, while the frequency-domain information provides robustness against changes in illumination. The algorithm also bypasses the facial feature localization step, formulating face alignment as an optimization problem solved in the classification stage.
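    As a rough sketch of the kind of local DCT representation described above (block size, coefficient count, and the DC-dropping heuristic are illustrative assumptions, not the paper's exact settings):

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix (rows index frequency, columns index samples)."""
    k = np.arange(n)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0] /= np.sqrt(2)
    return C

def block_dct_features(face, block=8, n_coeffs=10):
    """Describe a face by low-frequency 2-D DCT coefficients of local blocks."""
    C = dct_matrix(block)
    h, w = face.shape
    feats = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            patch = face[y:y + block, x:x + block].astype(float)
            coeffs = C @ patch @ C.T            # 2-D DCT of the patch
            low = coeffs[:4, :4].ravel()        # keep low frequencies only
            feats.append(low[1:n_coeffs + 1])   # drop DC term for illumination robustness
    return np.concatenate(feats)

face = np.random.rand(32, 32)
v = block_dct_features(face)
print(v.shape)  # (160,): 16 blocks x 10 coefficients
```

    Dropping each block's DC coefficient is a common way to discount additive local brightness changes; the remaining low-frequency coefficients summarize the block's appearance compactly.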

    Face Recognition and Facial Attribute Analysis from Unconstrained Visual Data

    Analyzing human faces from visual data has been one of the most active research areas in the computer vision community. However, it is a very challenging problem in unconstrained environments due to variations in pose, illumination, expression, occlusion and blur between training and testing images. The task becomes even more difficult when only a limited number of images per subject are available for modeling these variations. In this dissertation, different techniques for classifying human faces as well as other facial attributes such as expression, age, gender, and head pose in uncontrolled settings are investigated. In the first part of the dissertation, a method for reconstructing the virtual frontal view from a given non-frontal face image using Markov Random Fields (MRFs) and an efficient variant of the Belief Propagation (BP) algorithm is introduced. In the proposed approach, the input face image is divided into a grid of overlapping patches and a globally optimal set of local warps is estimated to synthesize the patches at the frontal view. A set of possible warps for each patch is obtained by aligning it with images from a training database of frontal faces. The alignments are performed efficiently in the Fourier domain using an extension of the Lucas-Kanade (LK) algorithm that can handle illumination variations. The problem of finding the optimal warps is then formulated as a discrete labeling problem using an MRF. The reconstructed frontal face image can then be used with any face recognition technique. The two main advantages of our method are that it requires neither manually selected facial landmarks nor head pose estimation. In the second part, the task of face recognition in unconstrained settings is formulated as a domain adaptation problem.
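    The grid-of-overlapping-patches decomposition used in the frontal-view synthesis above can be sketched as follows (patch size and stride are illustrative toy values, not the dissertation's settings):

```python
import numpy as np

def overlapping_patches(img, patch=16, stride=8):
    """Divide an image into a regular grid of overlapping square patches.
    Returns an array of shape (rows, cols, patch, patch)."""
    h, w = img.shape
    grid = []
    for y in range(0, h - patch + 1, stride):
        row = [img[y:y + patch, x:x + patch]
               for x in range(0, w - patch + 1, stride)]
        grid.append(row)
    return np.array(grid)

img = np.random.rand(64, 64)
g = overlapping_patches(img)
print(g.shape)  # (7, 7, 16, 16)
```

    Each grid cell would then be aligned against frontal training patches, and the MRF selects one warp label per cell so that neighboring patches agree.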
The domain shift is accounted for by deriving a latent subspace or domain, which jointly characterizes the multifactor variations using appropriate image formation models for each factor. The latent domain is defined as a product of Grassmann manifolds based on the underlying geometry of the tensor space, and recognition is performed across domain shift using statistics consistent with the tensor geometry. More specifically, given a face image from the source or target domain, multiple images of that subject are first synthesized under different illuminations, blur conditions, and 2D perturbations to form a tensor representation of the face. The orthogonal matrices obtained from the decomposition of this tensor, where each matrix corresponds to a factor variation, are used to characterize the subject as a point on a product of Grassmann manifolds. For cases with only one image per subject in the source domain, the identity of target domain faces is estimated using the geodesic distance on product manifolds. When multiple images per subject are available, an extension of kernel discriminant analysis is developed using a novel kernel based on the projection metric on product spaces. Furthermore, a probabilistic approach to the problem of classifying image sets on product manifolds is introduced. Understanding attributes such as expression, age class, and gender from face images has many applications in multimedia processing including content personalization, human-computer interaction, and facial identification. To achieve good performance in these tasks, it is important to be able to extract pertinent visual structures from the input data. In the third part of the dissertation, a fully automatic approach for performing classification of facial attributes based on hierarchical feature learning using sparse coding is presented. The proposed approach is generative in the sense that it does not use label information in the process of feature learning. 
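    A minimal sketch of the geodesic distance on a product of Grassmann manifolds used for the one-image-per-subject case, computed from principal angles (subspace and factor sizes below are arbitrary toy values):

```python
import numpy as np

def grassmann_geodesic(A, B):
    """Geodesic distance between subspaces span(A) and span(B), given bases
    with orthonormal columns, via principal angles from the SVD of A^T B."""
    s = np.linalg.svd(A.T @ B, compute_uv=False)
    theta = np.arccos(np.clip(s, -1.0, 1.0))  # principal angles
    return np.linalg.norm(theta)

def product_manifold_distance(fa, fb):
    """Combine per-factor geodesics on a product of Grassmann manifolds."""
    return np.sqrt(sum(grassmann_geodesic(A, B) ** 2 for A, B in zip(fa, fb)))

def rand_subspace(rng, n, k):
    Q, _ = np.linalg.qr(rng.standard_normal((n, k)))
    return Q[:, :k]

rng = np.random.default_rng(0)
# e.g. one orthogonal factor matrix per variation (illumination, blur, ...)
fa = [rand_subspace(rng, 20, 3) for _ in range(2)]
fb = [rand_subspace(rng, 20, 3) for _ in range(2)]
print(product_manifold_distance(fa, fa))  # ~0: a point is at distance 0 from itself
```

    In the dissertation, the orthogonal matrices come from the decomposition of each subject's synthesized image tensor; here random subspaces stand in for them.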
    As a result, the same feature representation can be applied to different tasks such as expression, age, and gender classification. Final classification is performed by a linear SVM trained with the corresponding labels for each task. The last part of the dissertation presents an automatic algorithm for determining the head pose from a given face image. The face image is divided into a regular grid and represented by dense SIFT descriptors extracted at the grid points. Random Projection (RP) is then applied to reduce the dimension of the concatenated SIFT descriptor vector. Classification and regression using Support Vector Machines (SVMs) are combined to obtain an accurate estimate of the head pose. The advantage of the proposed approach is that, unlike many other methods, it does not require facial landmarks such as the eye and mouth corners or the nose tip to be extracted from the input face image.
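    The dimensionality-reduction step in the head-pose pipeline above can be sketched with a plain Gaussian random projection (the descriptor grid and output dimension are made-up toy values, and the SVM stage is omitted):

```python
import numpy as np

def random_projection(X, out_dim, seed=0):
    """Project row vectors to a lower dimension with a random Gaussian matrix,
    approximately preserving pairwise distances (Johnson-Lindenstrauss)."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((X.shape[1], out_dim)) / np.sqrt(out_dim)
    return X @ R

# e.g. a 7x7 grid of 128-D dense SIFT descriptors, concatenated per image
X = np.random.rand(10, 7 * 7 * 128)
Z = random_projection(X, 256)
print(Z.shape)  # (10, 256)
```

    The projected vectors would then feed an SVM classifier for coarse pose classes and SVM regression for the fine angle estimate.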

    Reconnaissance Biométrique par Fusion Multimodale de Visages

    Biometric systems are considered one of the most effective means of protecting and securing private and public life against all types of theft. Facial recognition is among the most widely used biometrics, not because it is the most efficient and reliable, but because it is natural, non-intrusive, and relatively well accepted compared with alternatives such as fingerprint and iris. Developing biometric applications such as facial recognition has recently become important in smart cities. Over the past decades, many techniques have been proposed to recognize a face in a 2D or 3D image, with applications including videoconferencing systems, facial reconstruction, and security. Generally, changes in lighting and variations in pose and facial expression make 2D facial recognition less than reliable. 3D models may be able to overcome these constraints, except that most 3D facial recognition methods still treat the human face as a rigid object, which means they cannot handle facial expressions. In this thesis, we propose a new approach for automatic face verification that encodes the local information of 2D and 3D facial images as a high-order tensor. First, histograms of two local multiscale descriptors (LPQ and BSIF) are used to characterize both 2D and 3D facial images. Next, a tensor-based facial representation is designed to combine all the features extracted from the 2D and 3D faces. To improve the discrimination of this tensor representation, we use two multilinear subspace methods (MWPCA, and MDA combined with WCCN). The WCCN technique is applied to the face tensors to reduce the effect of intra-class directions through a normalizing transform and to improve the discriminating power of MDA. Our experiments were carried out on three of the largest databases, FRGC v2.0, Bosphorus and CASIA 3D, under varying facial expressions, poses and occlusions.
    The experimental results show the superiority of the proposed approach in terms of verification rate compared to recent state-of-the-art methods.
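    The WCCN step mentioned above can be sketched as follows (toy data; the descriptor and tensor machinery are omitted): a linear map is learned that whitens the average within-class covariance, so intra-class directions are suppressed.

```python
import numpy as np

def within_class_cov(X, labels):
    """Average covariance of samples around their own class mean."""
    d = X.shape[1]
    W = np.zeros((d, d))
    classes = np.unique(labels)
    for c in classes:
        Xc = X[labels == c] - X[labels == c].mean(axis=0)
        W += Xc.T @ Xc / len(Xc)
    return W / len(classes)

def wccn(X, labels, eps=1e-6):
    """WCCN: find B with B @ B.T = inv(W); the map x -> B.T @ x whitens the
    within-class scatter, shrinking nuisance intra-class directions."""
    W = within_class_cov(X, labels) + eps * np.eye(X.shape[1])  # regularize
    return np.linalg.cholesky(np.linalg.inv(W))

rng = np.random.default_rng(1)
X = rng.standard_normal((40, 5))
y = np.repeat(np.arange(4), 10)
B = wccn(X, y)
# after the transform, within-class covariance is (near) identity
print(np.allclose(within_class_cov(X @ B, y), np.eye(5), atol=1e-3))  # True
```

    Applied after MDA, this normalization keeps between-class structure while equalizing the directions along which samples of the same subject vary.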

    Weakly-Labeled Data and Identity-Normalization for Facial Image Analysis

    This thesis deals with improving facial recognition and facial expression analysis using weak sources of information.
    Labeled data is often scarce, but unlabeled data often contains information which is helpful for learning a model. This thesis describes two examples of using this insight. The first is a novel method for face recognition based on leveraging weakly or noisily labeled data. Unlabeled data can be acquired in a way which provides additional features. These features, while not available for the labeled data, may still be useful with some foresight. This thesis discusses combining a labeled facial recognition dataset with face images extracted from videos on YouTube and face images returned by a search engine. The web search engine and the video search engine can be viewed as very weak alternative classifiers which provide “weak labels.” Using the results from these two different types of search queries as forms of weak labels, a robust method for classification can be developed. This method is based on graphical models, but also incorporates a probabilistic margin. More specifically, using a model inspired by the variational relevance vector machine (RVM), a probabilistic alternative to transductive support vector machines (TSVMs) is developed. In contrast to previous formulations of RVMs, an Exponential hyperprior is introduced to produce an approximation to the L1 penalty. Experimental results where noisy labels are simulated, and separate experiments with noisy labels from image and video search results using names as queries, both indicate that weak label information can be successfully leveraged. Since the model depends heavily on sparse kernel regression methods, these methods are reviewed and discussed in detail. Several algorithms using sparsity-inducing priors are described, and experiments illustrate the behavior of each of these priors. Used in conjunction with logistic regression, each sparsity-inducing prior is shown to have varying effects in terms of sparsity and model fit.
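    The L1-style sparsity described above (an Exponential hyperprior yielding a Laplace-like penalty) can be illustrated with a toy MAP logistic regression solved by proximal gradient descent; this is a generic sketch, not the thesis's variational RVM.

```python
import numpy as np

def sparse_logistic(X, y, lam=0.05, lr=0.1, iters=500):
    """MAP logistic regression with a Laplace (L1) prior via proximal gradient
    descent (ISTA): a gradient step on the log-loss followed by soft
    thresholding, which drives many weights exactly to zero."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))       # predicted probabilities
        w = w - lr * (X.T @ (p - y) / n)         # gradient step on log-loss
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # soft threshold
    return w

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 10))
true_w = np.zeros(10)
true_w[:2] = [2.0, -2.0]                         # only 2 informative features
y = (X @ true_w + 0.1 * rng.standard_normal(200) > 0).astype(float)
w = sparse_logistic(X, y)
print(w)  # informative weights dominate; nuisance weights shrink toward zero
```

    The soft-threshold step is exactly the proximal operator of the L1 penalty; smoother priors would shrink weights without zeroing them, which is the behavioral difference the thesis's experiments compare.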
    Extending this to other machine learning methods is straightforward, since the approach is grounded firmly in Bayesian probability. An experiment in structured prediction using conditional random fields on a medical imaging task illustrates how sparse priors can easily be incorporated into other tasks and can yield improved results. Labeled data may also contain weak sources of information that are not necessarily used to maximum effect. For example, facial image datasets for tasks such as performance-driven facial animation, emotion recognition, and facial key-point or landmark prediction often contain labels other than those of the task at hand. In emotion recognition data, for example, emotion labels are often scarce. This may be because the images are extracted from a video in which only a small segment depicts the labeled emotion. As a result, many images of the subject in the same setting, captured with the same camera, go unused. However, this data can be used to improve the ability of learning techniques to generalize to new and unseen individuals by explicitly modeling previously seen variations related to identity and expression. Once identity and expression variation are separated, simpler supervised approaches can generalize quite well to unseen subjects. More specifically, in this thesis, probabilistic modeling of these sources of variation is used to “identity-normalize” various facial image representations. A variety of experiments are described in which performance on emotion recognition, markerless performance-driven facial animation and facial key-point tracking is consistently improved. This includes an algorithm which shows how this kind of normalization can be used for facial key-point localization. In many cases, additional sources of information are available in facial images that can be used to improve the tasks of interest.
    This includes weak labels provided during data gathering, such as the search query used to acquire the data, as well as identity information in the case of many experimental image databases. This thesis argues, in the main, that this information should be used, and describes methods for doing so using the tools of probability.

    A Part-Based, Multiresolution, TensorFaces Approach to Image-Based Facial Verification

    In the field of computer vision, multilinear (tensor) algebraic approaches to image-based face recognition have attracted interest in recent years. Previously, these methods have operated uniformly over the entire facial image at uniform resolution. In this thesis, we present a multiresolution, region-based multilinear method. By computing multiple multilinear models of various facial features, such as the eyes, nose, and mouth, in appropriate spatially-localized regions, we achieve a representation that, using the same amount of training data, is more discriminative for the purpose of facial verification. Adding a multiresolution image pyramid as well as a weighted signature further improves performance. We report encouraging experimental results on two datasets, one consisting of synthetic images, the other of real-world images.
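    The multilinear (TensorFaces-style) decomposition underlying this approach can be sketched via mode-n unfoldings, each of which yields one orthogonal factor matrix (the tensor dimensions below are toy values):

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding of a tensor: the chosen axis becomes the rows."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd_factors(T):
    """One orthogonal factor matrix per mode, from the SVD of each unfolding
    (the higher-order SVD used by TensorFaces-style models)."""
    return [np.linalg.svd(unfold(T, m), full_matrices=False)[0]
            for m in range(T.ndim)]

# toy data tensor: (people, facial regions, pixels per region)
T = np.random.rand(5, 4, 64)
factors = hosvd_factors(T)
print([U.shape for U in factors])  # [(5, 5), (4, 4), (64, 20)]
```

    A region-based variant, as described above, would fit a separate such model per spatially-localized facial feature and combine the resulting signatures.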