433,312 research outputs found

    Improving Identity-Robustness for Face Models

    Full text link
    Despite the success of deep-learning models in many tasks, there have been concerns about such models learning shortcuts and about their lack of robustness to irrelevant confounders. For models trained directly on human faces, a sensitive confounder is human identity. Many face-related tasks should ideally be identity-independent and perform uniformly across different individuals (i.e., be fair). One way to measure and enforce such robustness and performance uniformity is to enforce it during training, assuming identity-related information is available at scale. However, due to privacy concerns and the cost of collecting such information, this is often not the case, and most face datasets simply contain input images and their corresponding task-related labels. Thus, improving identity-related robustness without the need for such annotations is of great importance. Here, we explore using face-recognition embedding vectors as proxies for identities to enforce such robustness. We propose to use the structure of the face-recognition embedding space to implicitly emphasize rare samples within each class. We do so by weighting samples according to their conditional inverse density (CID) in the proxy embedding space. Our experiments suggest that such a simple sample-weighting scheme not only improves training robustness, but often improves overall performance as a result of that robustness. We also show that employing such constraints during training results in models that are significantly less sensitive to different levels of bias in the dataset.
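    The abstract does not spell out the CID computation, but the idea admits a compact sketch: within each task class, estimate the density of each sample's face-recognition embedding and weight the sample's loss inversely to it, so rare identities and appearances count more. Below is a minimal, hedged illustration in which density is approximated via k-nearest-neighbour distances in the proxy embedding space; the function name and the k-NN density proxy are assumptions, not the paper's implementation.

        import numpy as np
        from sklearn.neighbors import NearestNeighbors

        def cid_weights(embeddings, labels, k=10):
            """Sketch of conditional-inverse-density sample weighting.

            embeddings: (N, D) array of face-recognition embeddings
            labels:     (N,) array of task-class labels

            Within each task class, samples whose embeddings lie in sparse
            regions (rare identities/appearances) get larger weights. Density
            is approximated by the inverse of the mean distance to the k
            nearest same-class neighbours.
            """
            weights = np.ones(len(labels), dtype=np.float64)
            for c in np.unique(labels):
                idx = np.where(labels == c)[0]
                if len(idx) <= k:  # too few samples for a density estimate
                    continue
                nn = NearestNeighbors(n_neighbors=k + 1).fit(embeddings[idx])
                dists, _ = nn.kneighbors(embeddings[idx])  # column 0 is self
                density = 1.0 / (dists[:, 1:].mean(axis=1) + 1e-8)
                weights[idx] = density.max() / density  # rare samples -> heavy
            return weights * (len(weights) / weights.sum())  # mean weight = 1

    These weights would then multiply the per-sample task loss during training, leaving the optimizer and architecture unchanged.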

    Local feature extraction based facial emotion recognition: a survey

    Get PDF
    Notwithstanding recent technological advances, the identification of facial and emotional expressions remains a significant open challenge. The human face is generally described as a composition of textures arranged in micro-patterns. There has been a tremendous increase in the use of local binary pattern (LBP) based texture algorithms, which have proven valuable for a wide variety of tasks and for extracting salient attributes from an image. Over the years, many LBP variants have been reviewed in the literature; what has been missing is a thorough and comprehensive analysis of their individual performance. This work aims to fill that gap by performing a large-scale performance evaluation of 46 recent state-of-the-art LBP variants for facial expression recognition. Extensive experimental results on the well-known and challenging benchmark KDEF, JAFFE, CK and MUG databases, taken under different facial expression conditions, indicate that a number of the evaluated state-of-the-art LBP-like methods achieve promising results that are better than, or competitive with, several recent state-of-the-art facial recognition systems. Recognition rates of 100%, 98.57%, 95.92% and 100% are reached for the CK, JAFFE, KDEF and MUG databases, respectively.
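    For readers unfamiliar with the feature family being surveyed, the standard pipeline most LBP variants build on is short: compute a binary code per pixel by thresholding its circular neighbourhood against the centre, then describe the face by a histogram of those codes. A minimal sketch using scikit-image's implementation follows; the parameter choices are illustrative, not the survey's settings.

        import numpy as np
        from skimage.feature import local_binary_pattern

        def lbp_histogram(gray_face, P=8, R=1):
            """Describe a grayscale face image by its LBP texture histogram.

            P neighbours on a circle of radius R; the 'uniform' mapping keeps
            the P + 2 rotation-invariant uniform patterns commonly used for
            face analysis.
            """
            codes = local_binary_pattern(gray_face, P, R, method="uniform")
            n_bins = P + 2
            hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins),
                                   density=True)
            return hist  # feed to a classifier (e.g. an SVM) for recognition

    In practice the face is usually divided into a grid of cells, with one histogram per cell concatenated into the final descriptor so that spatial layout is retained.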

    LDMNet: Low Dimensional Manifold Regularized Neural Networks

    Full text link
    Deep neural networks have proved very successful on archetypal tasks for which large training sets are available, but when the training data are scarce, their performance suffers from overfitting. Many existing methods of reducing overfitting are data-independent, and their efficacy is often limited when the training set is very small. Data-dependent regularizations are mostly motivated by the observation that data of interest lie close to a manifold, which is typically hard to parametrize explicitly and often requires human input of tangent vectors. These methods typically focus only on the geometry of the input data and do not necessarily encourage the networks to produce geometrically meaningful features. To resolve this, we propose a new framework, the Low-Dimensional-Manifold-regularized neural Network (LDMNet), which incorporates a feature regularization method that focuses on the geometry of both the input data and the output features. In LDMNet, we regularize the network by encouraging the combination of the input data and the output features to sample a collection of low-dimensional manifolds, which are searched efficiently without explicit parametrization. To achieve this, we directly use the manifold dimension as a regularization term in a variational functional. The resulting Euler-Lagrange equation is a Laplace-Beltrami equation over a point cloud, which is solved by the point integral method without increasing the computational complexity. We demonstrate two benefits of LDMNet in the experiments. First, we show that LDMNet significantly outperforms widely used network regularizers such as weight decay and Dropout. Second, we show that LDMNet can be designed to extract common features of an object imaged via different modalities, which proves to be very useful in real-world applications such as cross-spectral face recognition.
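    The trick that makes manifold dimension usable as a penalty is a classical identity: at a point of a smooth manifold, the dimension equals the sum of squared manifold gradients of the embedding coordinate functions. A hedged LaTeX sketch of the kind of objective the abstract describes (the paper's exact normalization and notation may differ):

        % Points p_i = (x_i, f_theta(x_i)) are required to sample a manifold M.
        % For coordinate functions alpha_j, dim M(p) = sum_j ||grad_M alpha_j(p)||^2,
        % so the manifold dimension enters the objective as a Dirichlet-type energy:
        \min_{\theta,\,M}\;\; \mathcal{L}(\theta)
            \;+\; \lambda \sum_{j} \big\lVert \nabla_{M}\,\alpha_j \big\rVert_{L^{2}(M)}^{2}
        \qquad \text{s.t.}\;\; \{(x_i, f_\theta(x_i))\}_i \subset M.

    Minimizing over M is what yields the Laplace-Beltrami equation mentioned in the abstract; alternating between updating the network weights and updating the manifold keeps the cost comparable to ordinary training.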

    Effects of lighting on the perception of facial surfaces

    Get PDF
    The problem of variable illumination for object constancy has been largely neglected by "edge-based" theories of object recognition. However, there is evidence that edge-based schemes may not be sufficient for face processing and that shading information may be necessary (Bruce, 1988). Changes in lighting affect the pattern of shading on any three-dimensional object, and the aim of this thesis was to investigate the effects of lighting on tasks involving face perception.

    Effects of lighting are first reported on the perception of the hollow face illusion (Gregory, 1973). The impression of a convex face was found to be stronger when light appeared to be from above, consistent with the importance of shape-from-shading, which is thought to incorporate a light-from-above assumption. There was an independent main effect of orientation, with the illusion stronger when the face was upright. This confirmed that object knowledge was important in generating the illusion, a conclusion which was confirmed by comparison with a "hollow potato" illusion. There was an effect of light on the inverted face, suggesting that the direction of light may generally affect the interpretation of surfaces as convex or concave. It was also argued that there appears to be a general preference for convex interpretations of patterns of shading. The illusion was also found to be stronger when viewed monocularly, and this effect was also independent of orientation. This was consistent with the processing of shape information by independent modules, with object knowledge acting as a further constraint on the final interpretation.

    Effects of lighting were next reported on the recognition of shaded representations of facial surfaces, with top lighting facilitating processing. The adverse effects of bottom lighting on the interpretation of facial shape appear to affect within-category as well as between-category discriminations. Photographic negation was also found to affect recognition performance, and it was suggested that its effects may be complementary to those of bottom lighting in some respects. These effects were reported to be dependent on view.

    The last set of experiments investigated the effects of lighting and view on a simultaneous face matching task using the same surface representations, which required subjects to decide whether two images were of the same or different people. Subjects were found to be as much affected by a change in lighting as by a change in view, which seems inconsistent with edge-based accounts. Top lighting was also found to facilitate matches across changes in view. When the stimuli were inverted, matches across changes in both view and light were poorer, although the image differences were the same. In other experiments, subjects were found to match better across changes between two directions of top lighting than between directions of bottom lighting, although the extent of the changes was the same, suggesting the importance of top lighting for lighting as well as view invariance. Inverting the stimuli, which also inverts the lighting relative to the observer, disrupted matching across directions of top lighting but facilitated matching between levels of bottom lighting, consistent with the use of shading information. Changes in size were not found to affect matching, showing that the effect of lighting did not arise simply because lighting changes image properties. The effect of lighting was also found to transfer to digitised photographs, showing that it was not an artifact of the materials. Lastly, effects of lighting were reported when images were presented sequentially, showing that the effect was not an artifact of simultaneous presentation.

    In the final section, the effects reported were considered within the framework of theories of object recognition and argued to be inconsistent with invariant-features, edge-based, or alignment approaches. An alternative scheme employing surface-based primitives derived from shape-from-shading was developed to account for the pattern of effects and contrasted with an image-based account.

    New Tests to Measure Individual Differences in Matching and Labelling Facial Expressions of Emotion, and Their Association with Ability to Recognise Vocal Emotions and Facial Identity

    Get PDF
    Although good tests are available for diagnosing clinical impairments in face expression processing, there is a lack of strong tests for assessing "individual differences"--that is, differences in ability between individuals within the typical, nonclinical, range. Here, we develop two new tests, one for expression perception (an odd-man-out matching task in which participants select which one of three faces displays a different expression) and one additionally requiring explicit identification of the emotion (a labelling task in which participants select one of six verbal labels). We demonstrate validity (careful check of individual items, large inversion effects, independence from nonverbal IQ, convergent validity with a previous labelling task), reliability (Cronbach's alphas of .77 and .76, respectively), and wide individual differences across the typical population. We then demonstrate the usefulness of the tests by addressing theoretical questions regarding the structure of face processing, specifically the extent to which the following processes are common or distinct: (a) perceptual matching and explicit labelling of expression (a modest correlation between matching and labelling supported partial independence); (b) judgement of expressions from faces and voices (results argued that labelling tasks tap into a multi-modal system, while matching tasks tap distinct perceptual processes); and (c) expression and identity processing (results argued for a common first step of perceptual processing for expression and identity). This research was supported by the Australian Research Council (http://www.arc.gov.au/) grant DP110100850 to RP and EM and the Australian Research Council Centre of Excellence for Cognition and its Disorders (CE110001021), http://www.ccd.edu.au. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
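    For reference, the reliability statistic quoted above is Cronbach's alpha, which for a test of k items with item-score variances \sigma_i^2 and total-score variance \sigma_X^2 is

        \alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_i^{2}}{\sigma_X^{2}}\right)

    Values around .7 and above, such as the .77 and .76 reported here, are conventionally read as acceptable internal consistency for individual-differences research.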

    Spatio-Temporal Facial Expression Recognition Using Convolutional Neural Networks and Conditional Random Fields

    Full text link
    Automated Facial Expression Recognition (FER) has been a challenging task for decades. Many existing works use hand-crafted features such as LBP, HOG, LPQ, and Histogram of Optical Flow (HOF) combined with classifiers such as Support Vector Machines for expression recognition. These methods often require rigorous hyperparameter tuning to achieve good results. Recently, Deep Neural Networks (DNNs) have been shown to outperform traditional methods in visual object recognition. In this paper, we propose a two-part network consisting of a DNN-based architecture followed by a Conditional Random Field (CRF) module for facial expression recognition in videos. The first part captures the spatial relations within facial images using convolutional layers followed by three Inception-ResNet modules and two fully connected layers. To capture the temporal relations between the image frames, we use a linear-chain CRF in the second part of our network. We evaluate our proposed network on three publicly available databases, viz. CK+, MMI, and FERA. Experiments are performed in subject-independent and cross-database manners. Our experimental results show that cascading the deep network architecture with the CRF module considerably improves recognition of facial expressions in videos; in particular, it outperforms the state-of-the-art methods in the cross-database experiments and yields comparable results in the subject-independent experiments. Comment: To appear in 12th IEEE Conference on Automatic Face and Gesture Recognition Workshop.
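    The temporal half of such a pipeline is compact enough to sketch. Given per-frame class scores from the CNN and a learned label-transition matrix, the most likely expression sequence is recovered by Viterbi decoding of the linear-chain CRF. The code below is a minimal, hedged illustration assuming the scores are already computed; it is not the authors' implementation.

        import numpy as np

        def crf_viterbi(emissions, transitions):
            """Most likely label sequence under a linear-chain CRF.

            emissions:   (T, K) per-frame class scores from the CNN
            transitions: (K, K) learned scores for moving from label i to j
            """
            T, K = emissions.shape
            score = emissions[0].copy()          # best score ending in each label
            back = np.zeros((T, K), dtype=int)   # backpointers for the best path
            for t in range(1, T):
                cand = score[:, None] + transitions  # (K, K) prev -> next scores
                back[t] = cand.argmax(axis=0)
                score = cand.max(axis=0) + emissions[t]
            path = [int(score.argmax())]
            for t in range(T - 1, 0, -1):
                path.append(int(back[t][path[-1]]))
            return path[::-1]                    # one expression label per frame

    Training the CRF jointly with the CNN optimizes the sequence log-likelihood instead, but the decoding step above is what produces the per-video predictions at test time.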

    Exploring face perception in disorders of development: evidence from Williams syndrome and autism

    Get PDF
    Individuals with Williams syndrome (WS) and autism are characterized by different social phenotypes but have been said to show similar atypicalities of face-processing style. Although the structural encoding of faces may be similarly atypical in these two developmental disorders, there are clear differences in overall face skills. The inclusion of both populations in the same study can address how the profile of face skills varies across disorders. The current paper explored the processing of identity, eye-gaze, lip-reading, and expressions of emotion using the same participants across face domains. The tasks had previously been used to make claims of a modular structure to face perception in typical development. Participants with WS (N=15) and autism (N=20) could be dissociated from each other, and from individuals with general developmental delay, in the domains of eye-gaze and expression processing. Individuals with WS were stronger at these skills than individuals with autism. Even if the structural encoding of faces appears similarly atypical in these groups, the overall profile of face skills, as well as the underlying architecture of face perception, varies greatly. The research provides insights into typical and atypical models of face perception in WS and autism.

    MOON: A Mixed Objective Optimization Network for the Recognition of Facial Attributes

    Full text link
    Attribute recognition, particularly facial, extracts many labels for each image. While some multi-task vision problems can be decomposed into separate tasks and stages, e.g., training independent models for each task, for a growing set of problems joint optimization across all tasks has been shown to improve performance. We show that for deep convolutional neural network (DCNN) facial attribute extraction, multi-task optimization is better. Unfortunately, it can be difficult to apply joint optimization to DCNNs when training data are imbalanced, and re-balancing multi-label data directly is structurally infeasible, since adding or removing data to balance one label will change the sampling of the other labels. This paper addresses the multi-label imbalance problem by introducing a novel mixed objective optimization network (MOON) with a loss function that mixes multiple task objectives with domain-adaptive re-weighting of propagated loss. Experiments demonstrate that not only does MOON advance the state of the art in facial attribute recognition, but it also outperforms independently trained DCNNs using the same data. When using facial attributes for the LFW face recognition task, we show that our balanced (domain-adapted) network outperforms the unbalanced trained network. Comment: Post-print of manuscript accepted to the European Conference on Computer Vision (ECCV) 2016, http://link.springer.com/chapter/10.1007%2F978-3-319-46454-1_
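    The abstract's central fix, re-weighting the propagated loss per label rather than re-sampling the data, can be sketched briefly. In the hedged illustration below, each attribute's positive and negative examples are weighted so that the expected loss matches a target label distribution (balanced, or that of a deployment domain); MOON's exact mixed objective may differ, and the function names are illustrative.

        import numpy as np

        def label_weights(train_pos_rate, target_pos_rate):
            """Per-attribute weights for positive/negative examples so each
            label's loss matches a desired positive rate in expectation,
            without adding or removing any multi-label training data."""
            pos_w = target_pos_rate / np.maximum(train_pos_rate, 1e-8)
            neg_w = (1 - target_pos_rate) / np.maximum(1 - train_pos_rate, 1e-8)
            return pos_w, neg_w

        def weighted_multilabel_loss(pred, y, pos_w, neg_w):
            """Squared-error loss over attributes, re-weighted per label/sign.
            pred, y: (N, A) arrays with y in {0, 1}; pos_w, neg_w: (A,)."""
            w = np.where(y == 1, pos_w, neg_w)  # broadcasts to (N, A)
            return float(np.mean(w * (pred - y) ** 2))

    Setting target_pos_rate to 0.5 for every attribute balances each label; setting it to another domain's label statistics gives the kind of domain-adapted re-weighting the abstract describes.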