4 research outputs found

    Feature fusions for 2.5D face recognition in Random Maxout Extreme Learning Machine

    Contemporary face recognition systems are often based on either the 2D (texture) or the 3D (texture + shape) face modality. An alternative modality that utilises range (depth) facial images, namely 2.5D face recognition, has emerged. In this paper, we propose a 2.5D face descriptor based on the Regional Covariance Matrix (RCM), a powerful feature fusion technique, and a novel classifier dubbed the Random Maxout Extreme Learning Machine (RMELM). The RCM of interest is constructed from the Principal Component Analysis (PCA) filter responses of the facial texture and/or range image, where the PCA filters are learned from a two-layer PCA network. The RMELM is an ELM variant whose activation function is based on the locally linear maxout function, in place of the typical global non-linear activation functions in ELM. Since the RCM is a special case of a symmetric positive definite matrix that resides on the tensor manifold, a gap exists between the RCM and the RMELM, which is a vector-based classifier. To bridge the gap, we flatten the manifold by transforming the RCM into a feature vector via the matrix logarithm operator. Experimental results on two public 3D face databases, the FRGC v2.0 database and the Gavab database, validate that the proposed method is promising for 2.5D face recognition.
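Two steps of this pipeline lend themselves to a short sketch: forming a regional covariance matrix over per-pixel filter responses, and flattening the resulting symmetric positive definite matrix with the matrix logarithm so that a vector-based classifier such as the RMELM can consume it. The following is a minimal sketch, not the authors' implementation; the dimensions, the regularisation constant, and the use of scipy.linalg.logm are assumptions made for illustration.

import numpy as np
from scipy.linalg import logm

def regional_covariance(features):
    # features: (n_pixels, d) array of per-pixel filter responses for one facial region.
    mu = features.mean(axis=0)
    centred = features - mu
    cov = centred.T @ centred / (features.shape[0] - 1)
    # A small ridge keeps the matrix strictly positive definite before taking the log.
    return cov + 1e-6 * np.eye(cov.shape[0])

def log_euclidean_vector(cov):
    # Map the SPD matrix to Euclidean space via the matrix logarithm, then keep
    # only the upper triangle (the log of a symmetric matrix is symmetric).
    log_cov = logm(cov).real
    iu = np.triu_indices(cov.shape[0])
    return log_cov[iu]

# Example: 100 pixels in a region, 8 filter responses per pixel -> a 36-d vector
# that any vector-based classifier can take as input.
rng = np.random.default_rng(0)
responses = rng.normal(size=(100, 8))
x = log_euclidean_vector(regional_covariance(responses))
print(x.shape)  # (36,)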

    Exploring the representation of caricatures, facial motion, and view-invariance in face space.

    Faces present a vast array of information, from invariable features such as identity to variable features such as expression, speech and pose. Humans have an incredible capability for recognising faces (familiar faces at least) and interpreting facial actions, even across changes in view. While there has been an explosion of research into developing artificial neural networks for many aspects of face processing, some of which seem to predict neural responses quite well, the current work focuses on face processing through simpler linear projection spaces. These linear projection spaces are formal instantiations of ‘face space’, built using principal component analysis (PCA). The concept of ‘face space’ (Valentine, 1991) has been a highly influential account of how faces might be represented in the brain. In particular, recent research supports the presence of a face space in the macaque brain in the form of a linear projection space, referred to as ‘axis coding’, in which individual faces can be coded as a linear sum of orthogonal features. Here, these linear projection spaces are used for two streams of investigation.

Firstly, we assessed the neurovascular response to hyper-caricatured faces in an fMRI study. On the assumption that faces further from the average should project more strongly onto components in the linear space, we hypothesised that they should elicit a stronger response. Contrary to our expectations, we found little evidence for this in the fusiform face area (FFA) and face-selective cortex more generally, although the response pattern did become more consistent for caricatured faces in the FFA. We then explored the response to these caricatured faces in cortex typically associated with object processing. Interestingly, both the average response magnitude and the response pattern consistency increased as caricaturing increased. At present it is unclear whether this response confers some functional benefit for processing caricatured faces, or whether it simply reflects similarities between the stimuli's low- and mid-level properties and those of certain objects. If the response is functional, then hyper-caricaturing could pave a route to improving face processing in individuals with prosopagnosia, provided technologies can be developed to automatically caricature faces in real time.

The second line of work addressed these linear projection spaces in the context of achieving view-invariance, specifically in the domain of facial motion and expression. How humans create view-invariant representations remains of interest despite much research, yet little work has focused on creating view-invariant representations outside of identity recognition. Likewise, there has been much research into face space and view-invariance separately, but there is little evidence for how different views may be represented within a face space framework, or how motion might also be incorporated. Automatic face analysis systems mostly deal with pose either by aligning to a canonical frontal view or by using separate view-specific models. Evidence that the brain possesses an internal 3D model for ‘frontalising’ faces is inconclusive, so here we investigate how changes in view might be processed in a unified multi-view face space based on a few prototypical 2D views. We investigate the functionality and biological plausibility of five identity-specific face spaces, created using PCA, that allow different views to be reconstructed from single-view video inputs of actors speaking.
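Both strands rest on the same linear-projection machinery: a face (or video frame) is coded as its coefficients on a set of orthogonal PCA components, and caricaturing, i.e. scaling the deviation from the average face, scales those coefficients by the same factor, which is the basis of the stronger-response hypothesis above. The snippet below is a minimal sketch with synthetic data, not the stimuli or analyses used in the thesis.

import numpy as np

rng = np.random.default_rng(1)
faces = rng.normal(size=(200, 4096))        # 200 faces, 64x64 pixels, flattened
mean_face = faces.mean(axis=0)
# Principal components via SVD of the centred data; rows of vt are orthonormal axes.
_, _, vt = np.linalg.svd(faces - mean_face, full_matrices=False)
components = vt[:50]                        # 50 orthonormal axes of the face space

face = faces[0]
coeffs = components @ (face - mean_face)    # 'axis coding': the face as a sum of axes
reconstruction = mean_face + coeffs @ components

for alpha in (1.0, 1.5, 2.0):               # 1.0 = veridical, >1 = caricatured
    caricature = mean_face + alpha * (face - mean_face)
    c = components @ (caricature - mean_face)
    print(alpha, np.linalg.norm(c))         # projection strength grows linearly with alpha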
The most promising of these models first builds a separate orthogonal space for each viewpoint. The relationships between the components in neighbouring views are learned, and reconstructions across views are then made using a cascade of projection, transformation, and reconstruction. These reconstructions are collated and used to build a multi-view space, which can reconstruct motion well across all learned views. This provides initial insight into how a biologically plausible, view-invariant system for facial motion processing might be represented in the brain. It also has the capacity to improve view transformations in automatic lip-reading software.
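The cascade described above can be sketched as: build a PCA space per view, learn a linear map between the coefficient spaces of neighbouring views from paired frames, then project a new single-view frame, transform its coefficients, and reconstruct it in the other view. This is a minimal sketch under the assumption of paired training frames and a least-squares mapping; the thesis models may differ in detail.

import numpy as np

rng = np.random.default_rng(2)
frames_a = rng.normal(size=(300, 500))   # training frames, view A (e.g. frontal)
frames_b = rng.normal(size=(300, 500))   # the same frames captured in view B

def pca_space(frames, k=20):
    mean = frames.mean(axis=0)
    _, _, vt = np.linalg.svd(frames - mean, full_matrices=False)
    return mean, vt[:k]                  # mean frame and k orthonormal components

mean_a, comp_a = pca_space(frames_a)
mean_b, comp_b = pca_space(frames_b)

# Learn a linear map between the two coefficient spaces from the paired frames.
coeff_a = (frames_a - mean_a) @ comp_a.T
coeff_b = (frames_b - mean_b) @ comp_b.T
mapping, *_ = np.linalg.lstsq(coeff_a, coeff_b, rcond=None)

# Cascade for a new view-A frame: project, transform, reconstruct in view B.
new_frame_a = rng.normal(size=500)
c_a = comp_a @ (new_frame_a - mean_a)
c_b = c_a @ mapping
reconstruction_b = mean_b + c_b @ comp_b
print(reconstruction_b.shape)            # (500,) frame re-rendered in view B

Reconstructions of this kind, collected over all views, would then form the training set for the final multi-view space described above.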
