Improving Identity-Robustness for Face Models
Despite the success of deep-learning models in many tasks, there have been
concerns about such models learning shortcuts, and their lack of robustness to
irrelevant confounders. When it comes to models directly trained on human
faces, a sensitive confounder is that of human identities. Many face-related
tasks should ideally be identity-independent, and perform uniformly across
different individuals (i.e. be fair). One way to measure and enforce such
robustness and performance uniformity is to impose it as a constraint during
training, assuming identity-related information is available at scale. However, due to
privacy concerns and also the cost of collecting such information, this is
often not the case, and most face datasets simply contain input images and
their corresponding task-related labels. Thus, improving identity-related
robustness without the need for such annotations is of great importance. Here,
we explore using face-recognition embedding vectors, as proxies for identities,
to enforce such robustness. We propose to use the structure in the
face-recognition embedding space, to implicitly emphasize rare samples within
each class. We do so by weighting samples according to their conditional
inverse density (CID) in the proxy embedding space. Our experiments suggest
that such a simple sample weighting scheme not only improves training
robustness but often also improves overall performance as a result of that
robustness. We also show that employing such constraints during training
results in models that are significantly less sensitive to different levels of
bias in the dataset
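
The weighting idea in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how density is estimated, so the Gaussian kernel estimate, the `bandwidth` parameter, the per-class normalisation, and the function name `cid_weights` are all assumptions made here for illustration.

```python
import math
from collections import defaultdict


def cid_weights(embeddings, labels, bandwidth=1.0):
    """Weight each sample by the inverse of its within-class density.

    Density is a Gaussian kernel estimate computed only over samples that
    share the same task label (hence "conditional"). The kernel, the
    bandwidth, and the normalisation are illustrative assumptions.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    weights = [0.0] * len(labels)
    for indices in by_class.values():
        inv_density = {}
        for i in indices:
            density = 0.0
            for j in indices:
                sq_dist = sum((a - b) ** 2
                              for a, b in zip(embeddings[i], embeddings[j]))
                density += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
            density /= len(indices)
            # The self term guarantees density >= 1 / len(indices) > 0.
            inv_density[i] = 1.0 / density
        mean_inv = sum(inv_density.values()) / len(indices)
        for i in indices:
            # Normalise so the weights average to 1 within each class.
            weights[i] = inv_density[i] / mean_inv
    return weights


# Hypothetical proxy embeddings: class 0 has three near-duplicates and one
# isolated sample; class 1 has two nearby samples.
embeddings = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [3.0, 3.0],
              [5.0, 5.0], [5.1, 5.0]]
labels = [0, 0, 0, 0, 1, 1]
weights = cid_weights(embeddings, labels)
```

In this toy data the isolated class-0 sample receives the largest weight, which is the "emphasize rare samples within each class" behaviour the abstract describes. The resulting weights would typically multiply the per-sample training loss.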
Local feature extraction based facial emotion recognition: a survey
Despite recent technological advances, recognizing facial and emotional expressions is still one of the greatest challenges researchers face. The human face is generally regarded as a composition of textures arranged in micro-patterns. Local binary pattern (LBP) based texture algorithms have seen a sharp rise in use and have proved essential for a variety of tasks and for extracting key attributes from an image. Over the years, many LBP variants have been reviewed in the literature. What is still missing, however, is a thorough and comprehensive analysis of their individual performance. This work aims to fill that gap with a large-scale performance evaluation of 46 recent state-of-the-art LBP variants for facial expression recognition. Extensive experimental results on the well-known and challenging benchmark KDEF, JAFFE, CK and MUG databases, captured under different facial expression conditions, indicate that a number of the evaluated LBP-like methods achieve promising results that are better than or competitive with several recent state-of-the-art facial recognition systems. Recognition rates of 100%, 98.57%, 95.92% and 100% have been reached for the CK, JAFFE, KDEF and MUG databases, respectively
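
For readers unfamiliar with the operator that these variants build on, here is a minimal sketch of the basic 3x3 LBP code and its histogram. It shows only the original operator, not any of the 46 surveyed variants. The clockwise bit ordering starting at the top-left neighbour and the function names are conventions chosen here for illustration.

```python
def lbp_image(gray):
    """Basic 3x3 LBP codes for the interior pixels of a 2-D intensity grid."""
    # Neighbour offsets, clockwise from the top-left pixel (assumed ordering).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    rows, cols = len(gray), len(gray[0])
    codes = []
    for r in range(1, rows - 1):
        row_codes = []
        for c in range(1, cols - 1):
            center = gray[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbour is at least as bright
                # as the centre pixel.
                if gray[r + dr][c + dc] >= center:
                    code |= 1 << bit
            row_codes.append(code)
        codes.append(row_codes)
    return codes


def lbp_histogram(codes):
    """256-bin histogram of LBP codes, usable as a texture descriptor."""
    hist = [0] * 256
    for row in codes:
        for code in row:
            hist[code] += 1
    return hist


# Hypothetical 3x3 patch: only the centre pixel has a full neighbourhood.
patch = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]
codes = lbp_image(patch)  # [[120]]: bits 3-6 set (right, bottom-right, bottom, bottom-left)
hist = lbp_histogram(codes)
```

In a recognition pipeline the face is usually divided into blocks, and the per-block histograms are concatenated into one feature vector before classification.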
LDMNet: Low Dimensional Manifold Regularized Neural Networks
Deep neural networks have proved very successful on archetypal tasks for
which large training sets are available, but when the training data are scarce,
their performance suffers from overfitting. Many existing methods of reducing
overfitting are data-independent, and their efficacy is often limited when the
training set is very small. Data-dependent regularizations are mostly motivated
by the observation that data of interest lie close to a manifold, which is
typically hard to parametrize explicitly and often requires human input of
tangent vectors. These methods typically only focus on the geometry of the
input data, and do not necessarily encourage the networks to produce
geometrically meaningful features. To resolve this, we propose a new framework,
the Low-Dimensional-Manifold-regularized neural Network (LDMNet), which
incorporates a feature regularization method that focuses on the geometry of
both the input data and the output features. In LDMNet, we regularize the
network by encouraging the combination of the input data and the output
features to sample a collection of low dimensional manifolds, which are
searched efficiently without explicit parametrization. To achieve this, we
directly use the manifold dimension as a regularization term in a variational
functional. The resulting Euler-Lagrange equation is a Laplace-Beltrami
equation over a point cloud, which is solved by the point integral method
without increasing the computational complexity. We demonstrate two benefits of
LDMNet in the experiments. First, we show that LDMNet significantly outperforms
widely-used network regularizers such as weight decay and DropOut. Second, we
show that LDMNet can be designed to extract common features of an object imaged
via different modalities, which proves to be very useful in real-world
applications such as cross-spectral face recognition
Effects of lighting on the perception of facial surfaces
The problem of variable illumination for object constancy has been largely neglected by "edge-based" theories of object recognition. However, there is evidence that edge-based schemes may not be sufficient for face processing and that shading information may be necessary (Bruce, 1988). Changes in lighting affect the pattern of shading on any three-dimensional object and the aim of this thesis was to investigate the effects of lighting on tasks involving face perception.
Effects of lighting are first reported on the perception of the hollow face illusion (Gregory, 1973). The impression of a convex face was found to be stronger when light appeared to be from above, consistent with the importance of shape-from-shading which is thought to incorporate a light-from-above assumption. There was an independent main effect of orientation with the illusion stronger when the face was upright. This confirmed that object knowledge was important in generating the illusion, a conclusion which was confirmed by comparison with a "hollow potato" illusion. There was an effect of light on the inverted face suggesting that the direction of light may generally affect the interpretation of surfaces as convex or concave. It was also argued that there appears to be a general preference for convex interpretations of patterns of shading. The illusion was also found to be stronger when viewed monocularly and this effect was also independent of orientation. This was consistent with the processing of shape information by independent modules with object knowledge acting as a further constraint on the final interpretation.
Effects of lighting were next reported on the recognition of shaded representations of facial surfaces, with top lighting facilitating processing. The adverse effects of bottom lighting on the interpretation of facial shape appear to affect within category as well as between category discriminations. Photographic negation was also found to affect recognition performance and it was suggested that its effects may be complementary to those of bottom lighting in some respects. These effects were reported to be dependent on view.
The last set of experiments investigated the effects of lighting and view on a simultaneous face matching task using the same surface representations which required subjects to decide if two images were of the same or different people. Subjects were found to be as much affected by a change in lighting as a change in view, which seems inconsistent with edge-based accounts. Top lighting was also found to facilitate matches across changes in view. When the stimuli were inverted matches across changes in both view and light were poorer, although image differences were the same. In other experiments subjects were found to match better across changes between two directions of top lighting than between directions of bottom lighting, although the extent of the changes was the same, suggesting the importance of top lighting for lighting as well as view invariance. Inverting the stimuli, which also inverts the lighting relative to the observer, disrupted matching across directions of top lighting but facilitated matching between levels of bottom lighting, consistent with the use of shading information. Changes in size were not found to affect matching showing that the effect of lighting was not only because it changes image properties. The effect of lighting was also found to transfer to digitised photographs showing that it was not an artifact of the materials. Lastly, effects of lighting were reported when images were presented sequentially showing that the effect was not an artifact of simultaneous presentation.
In the final section the effects reported were considered within the framework of theories of object recognition and argued to be inconsistent with invariant features, edge-based or alignment approaches. An alternative scheme employing surface-based primitives derived from shape-from-shading was developed to account for the pattern of effects and contrasted with an image-based account
New Tests to Measure Individual Differences in Matching and Labelling Facial Expressions of Emotion, and Their Association with Ability to Recognise Vocal Emotions and Facial Identity
Although good tests are available for diagnosing clinical impairments in face expression processing, there is a lack of strong tests for assessing "individual differences"--that is, differences in ability between individuals within the typical, nonclinical, range. Here, we develop two new tests, one for expression perception (an odd-man-out matching task in which participants select which one of three faces displays a different expression) and one additionally requiring explicit identification of the emotion (a labelling task in which participants select one of six verbal labels). We demonstrate validity (careful check of individual items, large inversion effects, independence from nonverbal IQ, convergent validity with a previous labelling task), reliability (Cronbach's alphas of .77 and .76 respectively), and wide individual differences across the typical population. We then demonstrate the usefulness of the tests by addressing theoretical questions regarding the structure of face processing, specifically the extent to which the following processes are common or distinct: (a) perceptual matching and explicit labelling of expression (modest correlation between matching and labelling supported partial independence); (b) judgement of expressions from faces and voices (results argued labelling tasks tap into a multi-modal system, while matching tasks tap distinct perceptual processes); and (c) expression and identity processing (results argued for a common first step of perceptual processing for expression and identity). This research was supported by the Australian Research Council (http://www.arc.gov.au/) grant DP110100850 to RP and EM and the Australian
Research Council Centre of Excellence for Cognition and its Disorders (CE110001021) http://www.ccd.edu.au. The funders had no role in study design, data
collection and analysis, decision to publish, or preparation of the manuscript
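
The reliability figures quoted in this abstract are Cronbach's alphas. As a reminder of how that statistic is computed, here is a minimal sketch using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The response matrix is hypothetical toy data, not data from the study, and the function name is chosen for illustration.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a participants-by-items score matrix.

    scores: list of participants, each a list of k item scores.
    Uses sample variances (n - 1 denominator) throughout.
    """
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)


# Hypothetical data: 5 participants, 4 items scored correct (1) or incorrect (0).
toy_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(toy_scores)
```

For this toy matrix the item variances sum to 1.0 and the total-score variance is 2.5, so alpha works out to (4/3) * (1 - 0.4) = 0.8.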
Spatio-Temporal Facial Expression Recognition Using Convolutional Neural Networks and Conditional Random Fields
Automated Facial Expression Recognition (FER) has been a challenging task for
decades. Many of the existing works use hand-crafted features such as LBP, HOG,
LPQ, and Histogram of Optical Flow (HOF) combined with classifiers such as
Support Vector Machines for expression recognition. These methods often require
rigorous hyperparameter tuning to achieve good results. Recently Deep Neural
Networks (DNN) have been shown to outperform traditional methods in visual object
recognition. In this paper, we propose a two-part network consisting of a
DNN-based architecture followed by a Conditional Random Field (CRF) module for
facial expression recognition in videos. The first part captures the spatial
relation within facial images using convolutional layers followed by three
Inception-ResNet modules and two fully-connected layers. To capture the
temporal relation between the image frames, we use linear chain CRF in the
second part of our network. We evaluate our proposed network on three publicly
available databases, viz. CK+, MMI, and FERA. Experiments are performed in
subject-independent and cross-database manners. Our experimental results show
that cascading the deep network architecture with the CRF module considerably
increases the recognition of facial expressions in videos and in particular it
outperforms the state-of-the-art methods in the cross-database experiments and
yields comparable results in the subject-independent experiments. Comment: To appear in 12th IEEE Conference on Automatic Face and Gesture
Recognition Worksho
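
To illustrate what a linear-chain CRF adds on top of per-frame predictions, here is a minimal sketch of Viterbi decoding over a sequence of frames. It covers inference only, not the training described in the abstract. The emission scores stand in for the per-frame outputs of the network, and both they and the transition scores are hypothetical values chosen for illustration.

```python
def viterbi_decode(emissions, transitions):
    """Most likely label sequence under a linear-chain model.

    emissions[t][y]  : score of label y at frame t (e.g. network output)
    transitions[a][b]: score of moving from label a to label b
    """
    n_labels = len(emissions[0])
    best = list(emissions[0])  # best score of any path ending in each label
    backptr = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for y in range(n_labels):
            prev = max(range(n_labels),
                       key=lambda p: best[p] + transitions[p][y])
            pointers.append(prev)
            new_best.append(best[prev] + transitions[prev][y] + emissions[t][y])
        best = new_best
        backptr.append(pointers)

    # Trace the best path backwards from the best final label.
    label = max(range(n_labels), key=lambda y: best[y])
    path = [label]
    for pointers in reversed(backptr):
        label = pointers[label]
        path.append(label)
    path.reverse()
    return path


# Hypothetical scores: two expression labels, three frames. The middle frame
# weakly favours label 1, but the transitions reward staying in one label.
emissions = [[2.0, 0.0], [0.4, 0.6], [2.0, 0.0]]
transitions = [[1.0, -1.0], [-1.0, 1.0]]
path = viterbi_decode(emissions, transitions)
```

Taking the per-frame argmax here would give `[0, 1, 0]`, while the decoded path is `[0, 0, 0]`: the temporal term smooths out the weak middle frame.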
Exploring face perception in disorders of development: evidence from Williams syndrome and autism
Individuals with Williams syndrome (WS) and autism are characterized by different social phenotypes but have been said to show similar atypicalities of face-processing style. Although the structural encoding of faces may be similarly atypical in these two developmental disorders, there are clear differences in overall face skills. The inclusion of both populations in the same study can address how the profile of face skills varies across disorders. The current paper explored the processing of identity, eye-gaze, lip-reading, and expressions of emotion using the same participants across face domains. The tasks had previously been used to make claims of a modular structure to face perception in typical development. Participants with WS (N=15) and autism (N=20) could be dissociated from each other, and from individuals with general developmental delay, in the domains of eye-gaze and expression processing. Individuals with WS were stronger at these skills than individuals with autism. Even if the structural encoding of faces appears similarly atypical in these groups, the overall profile of face skills, as well as the underlying architecture of face perception, varies greatly. The research provides insights into typical and atypical models of face perception in WS and autism
MOON: A Mixed Objective Optimization Network for the Recognition of Facial Attributes
Attribute recognition, particularly facial, extracts many labels for each
image. While some multi-task vision problems can be decomposed into separate
tasks and stages, e.g., training independent models for each task, for a
growing set of problems joint optimization across all tasks has been shown to
improve performance. We show that for deep convolutional neural network (DCNN)
facial attribute extraction, multi-task optimization is better. Unfortunately,
it can be difficult to apply joint optimization to DCNNs when training data is
imbalanced, and re-balancing multi-label data directly is structurally
infeasible, since adding/removing data to balance one label will change the
sampling of the other labels. This paper addresses the multi-label imbalance
problem by introducing a novel mixed objective optimization network (MOON) with
a loss function that mixes multiple task objectives with domain adaptive
re-weighting of propagated loss. Experiments demonstrate that not only does
MOON advance the state of the art in facial attribute recognition, but it also
outperforms independently trained DCNNs using the same data. When using facial
attributes for the LFW face recognition task, we show that our balanced (domain
adapted) network outperforms the unbalanced trained network. Comment: Post-print of manuscript accepted to the European Conference on
Computer Vision (ECCV) 2016
http://link.springer.com/chapter/10.1007%2F978-3-319-46454-1_
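
The multi-label imbalance problem described in this abstract can be illustrated with a generic per-attribute importance re-weighting of the loss. This is a sketch of the general idea only and is not MOON's actual loss function. The weighting rule (target fraction divided by source fraction), the squared-error loss, the `target_pos` parameter, and the function names are assumptions made for illustration.

```python
def balance_weights(labels, target_pos=0.5):
    """Per-attribute (w_pos, w_neg) importance weights.

    labels: list of samples, each a list of +1/-1 values, one per attribute.
    Each polarity is weighted by target fraction / observed fraction, so no
    samples need to be added or removed to rebalance any single attribute.
    """
    n = len(labels)
    weights = []
    for a in range(len(labels[0])):
        source_pos = sum(1 for row in labels if row[a] > 0) / n
        # Guard against attributes that have only one polarity in the data.
        w_pos = target_pos / source_pos if source_pos > 0 else 0.0
        w_neg = (1 - target_pos) / (1 - source_pos) if source_pos < 1 else 0.0
        weights.append((w_pos, w_neg))
    return weights


def reweighted_loss(predictions, labels, weights):
    """Mean over samples of the weighted squared error summed over attributes."""
    total = 0.0
    for pred_row, label_row in zip(predictions, labels):
        for a, (pred, label) in enumerate(zip(pred_row, label_row)):
            w_pos, w_neg = weights[a]
            w = w_pos if label > 0 else w_neg
            total += w * (pred - label) ** 2
    return total / len(labels)


# Hypothetical data: 4 samples, 2 attributes with opposite imbalance.
labels = [[1, -1], [1, -1], [1, 1], [-1, -1]]
predictions = [[0.0, 0.0]] * 4
weights = balance_weights(labels)
loss = reweighted_loss(predictions, labels, weights)
```

Because the weights act on the loss per attribute, each label can be balanced independently, which sidesteps the structural problem the abstract raises with resampling multi-label data.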