54 research outputs found

    SfSNet: Learning Shape, Reflectance and Illuminance of Faces in the Wild

    Full text link
    We present SfSNet, an end-to-end learning framework for producing an accurate decomposition of an unconstrained human face image into shape, reflectance and illuminance. SfSNet is designed to reflect a physical lambertian rendering model. SfSNet learns from a mixture of labeled synthetic and unlabeled real world images. This allows the network to capture low frequency variations from synthetic and high frequency details from real images through the photometric reconstruction loss. SfSNet consists of a new decomposition architecture with residual blocks that learns a complete separation of albedo and normal. This is used along with the original image to predict lighting. SfSNet produces significantly better quantitative and qualitative results than state-of-the-art methods for inverse rendering and independent normal and illumination estimation.Comment: Accepted to CVPR 2018 (Spotlight

    Gender recognition from facial images: Two or three dimensions?

    Get PDF
    © 2016 Optical Society of America. This paper seeks to compare encoded features from both two-dimensional (2D) and three-dimensional (3D) face images in order to achieve automatic gender recognition with high accuracy and robustness. The Fisher vector encoding method is employed to produce 2D, 3D, and fused features with escalated discriminative power. For 3D face analysis, a two-source photometric stereo (PS) method is introduced that enables 3D surface reconstructions with accurate details as well as desirable efficiency. Moreover, a 2D + 3D imaging device, taking the two-source PS method as its core, has been developed that can simultaneously gather color images for 2D evaluations and PS images for 3D analysis. This system inherits the superior reconstruction accuracy from the standard (three or more light) PS method but simplifies the reconstruction algorithm as well as the hardware design by only requiring two light sources. It also offers great potential for facilitating human computer interaction by being accurate, cheap, efficient, and nonintrusive. Ten types of low-level 2D and 3D features have been experimented with and encoded for Fisher vector gender recognition. Evaluations of the Fisher vector encoding method have been performed on the FERET database, Color FERET database, LFW database, and FRGCv2 database, yielding 97.7%, 98.0%, 92.5%, and 96.7% accuracy, respectively. In addition, the comparison of 2D and 3D features has been drawn from a self-collected dataset, which is constructed with the aid of the 2D + 3D imaging device in a series of data capture experiments. With a variety of experiments and evaluations, it can be proved that the Fisher vector encoding method outperforms most state-of-the-art gender recognition methods. It has also been observed that 3D features reconstructed by the two-source PS method are able to further boost the Fisher vector gender recognition performance, i.e., up to a 6% increase on the self-collected database

    Disentangling the modes of variation in unlabelled data

    Get PDF
    Statistical methods are of paramount importance in discovering the modes of variation in visual data. The Principal Component Analysis (PCA) is probably the most prominent method for extracting a single mode of variation in the data. However, in practice, visual data exhibit several modes of variations. For instance, the appearance of faces varies in identity, expression, pose etc. To extract these modes of variations from visual data, several supervised methods, such as the TensorFaces relying on multilinear (tensor) decomposition (e.g., Higher Order SVD) have been developed. The main drawbacks of such methods is that they require both labels regarding the modes of variations and the same number of samples under all modes of variations (e.g., the same face under different expressions, poses etc.). Therefore, their applicability is limited to well-organised data, usually captured in well-controlled conditions. In this paper, we propose a novel general multilinear matrix decomposition method that discovers the multilinear structure of possibly incomplete sets of visual data in unsupervised setting (i.e., without the presence of labels). We also propose extensions of the method with sparsity and low-rank constraints in order to handle noisy data, captured in unconstrained conditions. Besides that, a graph-regularised variant of the method is also developed in order to exploit available geometric or label information for some modes of variations. We demonstrate the applicability of the proposed method in several computer vision tasks, including Shape from Shading (SfS) (in the wild and with occlusion removal), expression transfer, and estimation of surface normals from images captured in the wild

    Face recognition in 2D and 2.5D using ridgelets and photometric stereo

    Get PDF
    A new technique for face recognition - Ridgefaces - is presented. The method combines the well-known Fisherface method with the ridgelet transform and high-speed Photometric Stereo (PS). The paper first derives ridgelet projections for 2D/2.5D face images before the Fisherface approach is used to reduce the dimensionality and increase the spread of the resulting feature vectors. The ridgelet transform is attractive because it is efficient at extracting highly discriminating low-frequency directional features. Best recognition is obtained when Ridgefaces is performed on surface normals acquired from PS, although good results are also found using standard 2D images and PS-derived albedo maps. © 2012 Elsevier Ltd. All rights reserved

    Disentangling the modes of variation in unlabelled data

    Get PDF
    Statistical methods are of paramount importance in discovering the modes of variation in visual data. The Principal Component Analysis (PCA) is probably the most prominent method for extracting a single mode of variation in the data. However, in practice, visual data exhibit several modes of variations. For instance, the appearance of faces varies in identity, expression, pose etc. To extract these modes of variations from visual data, several supervised methods, such as the TensorFaces relying on multilinear (tensor) decomposition (e.g., Higher Order SVD) have been developed. The main drawbacks of such methods is that they require both labels regarding the modes of variations and the same number of samples under all modes of variations (e.g., the same face under different expressions, poses etc.). Therefore, their applicability is limited to well-organised data, usually captured in well-controlled conditions. In this paper, we propose a novel general multilinear matrix decomposition method that discovers the multilinear structure of possibly incomplete sets of visual data in unsupervised setting (i.e., without the presence of labels). We also propose extensions of the method with sparsity and low-rank constraints in order to handle noisy data, captured in unconstrained conditions. Besides that, a graph-regularised variant of the method is also developed in order to exploit available geometric or label information for some modes of variations. We demonstrate the applicability of the proposed method in several computer vision tasks, including Shape from Shading (SfS) (in the wild and with occlusion removal), expression transfer, and estimation of surface normals from images captured in the wild

    3D face recognition using photometric stereo

    Get PDF
    Automatic face recognition has been an active research area for the last four decades. This thesis explores innovative bio-inspired concepts aimed at improved face recognition using surface normals. New directions in salient data representation are explored using data captured via a photometric stereo method from the University of the West of England’s “Photoface” device. Accuracy assessments demonstrate the advantage of the capture format and the synergy offered by near infrared light sources in achieving more accurate results than under conventional visible light. Two 3D face databases have been created as part of the thesis – the publicly available Photoface database which contains 3187 images of 453 subjects and the 3DE-VISIR dataset which contains 363 images of 115 people with different expressions captured simultaneously under near infrared and visible light. The Photoface database is believed to be the ?rst to capture naturalistic 3D face models. Subsets of these databases are then used to show the results of experiments inspired by the human visual system. Experimental results show that optimal recognition rates are achieved using surprisingly low resolution of only 10x10 pixels on surface normal data, which corresponds to the spatial frequency range of optimal human performance. Motivated by the observed increase in recognition speed and accuracy that occurs in humans when faces are caricatured, novel interpretations of caricaturing using outlying data and pixel locations with high variance show that performance remains disproportionately high when up to 90% of the data has been discarded. These direct methods of dimensionality reduction have useful implications for the storage and processing requirements for commercial face recognition systems. The novel variance approach is extended to recognise positive expressions with 90% accuracy which has useful implications for human-computer interaction as well as ensuring that a subject has the correct expression prior to recognition. Furthermore, the subject recognition rate is improved by removing those pixels which encode expression. Finally, preliminary work into feature detection on surface normals by extending Haar-like features is presented which is also shown to be useful for correcting the pose of the head as part of a fully operational device. The system operates with an accuracy of 98.65% at a false acceptance rate of only 0.01 on front facing heads with neutral expressions. The work has shown how new avenues of enquiry inspired by our observation of the human visual system can offer useful advantages towards achieving more robust autonomous computer-based facial recognition

    Multilinear methods for disentangling variations with applications to facial analysis

    Get PDF
    Several factors contribute to the appearance of an object in a visual scene, including pose, illumination, and deformation, among others. Each factor accounts for a source of variability in the data. It is assumed that the multiplicative interactions of these factors emulate the entangled variability, giving rise to the rich structure of visual object appearance. Disentangling such unobserved factors from visual data is a challenging task, especially when the data have been captured in uncontrolled recording conditions (also referred to as “in-the-wild”) and label information is not available. The work presented in this thesis focuses on disentangling the variations contained in visual data, in particular applied to 2D and 3D faces. The motivation behind this work lies in recent developments in the field, such as (i) the creation of large, visual databases for face analysis, with (ii) the need of extracting information without the use of labels and (iii) the need to deploy systems under demanding, real-world conditions. In the first part of this thesis, we present a method to synthesise plausible 3D expressions that preserve the identity of a target subject. This method is supervised as the model uses labels, in this case 3D facial meshes of people performing a defined set of facial expressions, to learn. The ability to synthesise an entire facial rig from a single neutral expression has a large range of applications both in computer graphics and computer vision, ranging from the ecient and cost-e↵ective creation of CG characters to scalable data generation for machine learning purposes. Unlike previous methods based on multilinear models, the proposed approach is capable to extrapolate well outside the sample pool, which allows it to accurately reproduce the identity of the target subject and create artefact-free expression shapes while requiring only a small input dataset. We introduce global-local multilinear models that leverage the strengths of expression-specific and identity-specific local models combined with coarse motion estimations from a global model. The expression-specific and identity-specific local models are built from di↵erent slices of the patch-wise local multilinear model. Experimental results show that we achieve high-quality, identity-preserving facial expression synthesis results that outperform existing methods both quantitatively and qualitatively. In the second part of this thesis, we investigate how the modes of variations from visual data can be extracted. Our assumption is that visual data has an underlying structure consisting of factors of variation and their interactions. Finding this structure and the factors is important as it would not only help us to better understand visual data but once obtained we can edit the factors for use in various applications. Shape from Shading and expression transfer are just two of the potential applications. To extract the factors of variation, several supervised methods have been proposed but they require both labels regarding the modes of variations and the same number of samples under all modes of variations. Therefore, their applicability is limited to well-organised data, usually captured in well-controlled conditions. We propose a novel general multilinear matrix decomposition method that discovers the multilinear structure of possibly incomplete sets of visual data in unsupervised setting. We demonstrate the applicability of the proposed method in several computer vision tasks, including Shape from Shading (SfS) (in the wild and with occlusion removal), expression transfer, and estimation of surface normals from images captured in the wild. Finally, leveraging the unsupervised multilinear method proposed as well as recent advances in deep learning, we propose a weakly supervised deep learning method for disentangling multiple latent factors of variation in face images captured in-the-wild. To this end, we propose a deep latent variable model, where we model the multiplicative interactions of multiple latent factors of variation explicitly as a multilinear structure. We demonstrate that the proposed approach indeed learns disentangled representations of facial expressions and pose, which can be used in various applications, including face editing, as well as 3D face reconstruction and classification of facial expression, identity and pose.Open Acces

    Long-range concealed object detection through active covert illumination

    Get PDF
    © 2015 SPIE. When capturing a scene for surveillance, the addition of rich 3D data can dramatically improve the accuracy of object detection or face recognition. Traditional 3D techniques, such as geometric stereo, only provide a coarse grained reconstruction of the scene and are ill-suited to fine analysis. Photometric stereo is a well established technique providing dense, high-resolution, reconstructions, using active artificial illumination of an object from multiple directions to gather surface information. It is typically used indoors, at short range
    • …
    corecore