    A Brain-Inspired Multi-Modal Perceptual System for Social Robots: An Experimental Realization

    We propose a multi-modal perceptual system inspired by the inner workings of the human brain, in particular the hierarchical structure of the sensory cortex and its spatial-temporal binding criteria. The system is context independent and can be applied to many ongoing problems in social robotics, including, but not limited to, person recognition, emotion recognition, and a multi-modal robot doctor. The system encapsulates the parallel, distributed processing of real-world stimuli across different sensor modalities, encoding them into feature vectors that are in turn processed by a number of dedicated processing units (DPUs) along hierarchical paths. DPUs are algorithmic realizations of the cell assemblies of neuroscience. A plausible and realistic perceptual system is presented by integrating the outputs of these units with spiking neural networks. We also discuss other components of the system, including top-down influences and the integration of information through temporal binding with fading memory, and suggest two alternatives for realizing these criteria. Finally, we demonstrate the implementation of this architecture on a hardware platform as a social robot and report experimental studies on the system.
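
    As an illustration of the kind of processing the abstract describes, the sketch below shows a toy pair of dedicated processing units feeding a temporal-binding step with an exponentially fading memory. It is a minimal sketch under assumed details: the tanh transform, the leaky-integration rule, and all names and sizes are placeholders, not the authors' implementation.

```python
# Toy DPU pipeline with temporal binding via a fading memory (illustrative only).
import numpy as np

def dpu(features, weights):
    """One dedicated processing unit: a simple nonlinear transform of a modality's feature vector."""
    return np.tanh(weights @ features)

def bind_with_fading_memory(dpu_outputs, memory, decay=0.9):
    """Temporal binding: leaky integration of the current DPU outputs into a memory trace."""
    return decay * memory + (1.0 - decay) * np.concatenate(dpu_outputs)

rng = np.random.default_rng(0)
audio_w, vision_w = rng.normal(size=(8, 16)), rng.normal(size=(8, 32))
memory = np.zeros(16)
for t in range(10):                          # a short stream of multi-modal stimuli
    audio, vision = rng.normal(size=16), rng.normal(size=32)
    outs = [dpu(audio, audio_w), dpu(vision, vision_w)]
    memory = bind_with_fading_memory(outs, memory)
print(memory.shape)                          # bound multi-modal representation: (16,)
```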

    Facial form as a subclinical phenotype of nonsyndromic orofacial clefting: an anthropometric analysis

    Orofacial clefting (OFC) is the most common craniofacial anomaly, occurring in approximately 1 in 500 to 2,500 births worldwide. An estimated 60 to 70% of OFC cases are non-syndromic (NS) and are not associated with any single genetic marker. However, high recurrence rates of NSOFC have been identified in families. Recurrence risk is predicted from rather empirical data, owing to poor gene mapping and a weak correlation between the genotype and phenotype of this anomaly. Because OFC presents with significant etiologic heterogeneity and phenotypic diversity, subclinical manifestations need to be identified to complete the OFC phenotypic spectrum. This will improve the correlation between genotype and phenotype and thus improve recurrence risk estimation. A large body of evidence suggests that subtle changes in craniofacial morphology may be a subclinical marker for cleft susceptibility; the vast majority of this evidence is based on cephalometric data, with far fewer studies examining soft tissue features of the face. The purpose of the present study is to compare craniofacial characteristics of unaffected biological parents of NSOFC offspring with controls drawn from the same population using direct anthropometry. The study sample consisted of 67 male and 76 female unaffected parents of children with NS cleft lip or cleft lip/palate. The control sample comprised 37 males and 59 females of the same race and ethnicity. Craniofacial measurements of both the study and control populations were collected using direct anthropometry as described by Farkas (1994) and Kolar & Salter (1997), and were subjected to stepwise discriminant function analysis (DFA). DFA is similar to logistic regression; it is used to classify a population into groups based on covariate variables. Discriminant models with high statistical significance (P < 0.001) were derived for males and females that could clearly distinguish unaffected parents from controls based on directly measured anthropometric craniofacial characteristics. The study showed that the salient discriminating features are localized to specific regions of the face in a partly gender-specific manner, and that a model derived from a small subset of directly measured anthropometric craniofacial features can be used to discriminate unaffected parents from controls.
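
    The analysis style described above can be approximated with off-the-shelf tools; the sketch below uses forward feature selection wrapped around a linear discriminant model as a stand-in for stepwise DFA. The measurements are synthetic placeholders, not the study's anthropometric data, and scikit-learn's forward selection is only an analogue of the stepwise procedure the abstract names.

```python
# Stepwise-DFA-style analysis sketch: forward feature selection + linear discriminant model.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SequentialFeatureSelector

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 10))            # 10 hypothetical craniofacial measurements per person
y = rng.integers(0, 2, size=120)          # 1 = unaffected parent, 0 = control (synthetic labels)

lda = LinearDiscriminantAnalysis()
selector = SequentialFeatureSelector(lda, n_features_to_select=4, direction="forward")
selector.fit(X, y)                        # stepwise-like forward selection of measurements
X_sub = selector.transform(X)             # the small discriminating subset of features
print(lda.fit(X_sub, y).score(X_sub, y))  # classification accuracy of the resulting model
```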

    Geometric Expression Invariant 3D Face Recognition using Statistical Discriminant Models

    Currently there is no complete face recognition system that is invariant to all facial expressions. Although humans find it easy to identify and recognise faces regardless of changes in illumination, pose and expression, producing a computer system with a similar capability has proved particularly difficult. Three-dimensional face models are geometric in nature and therefore have the advantage of being invariant to head pose and lighting; however, they are still susceptible to facial expressions. This can be seen in the decrease in recognition results using principal component analysis (PCA) when expressions are added to a data set. In order to achieve expression-invariant face recognition, we have employed a tensor algebra framework to represent 3D face data with facial expressions in a parsimonious space. Face variation factors are organised into separate subject and facial expression modes. We manipulate this representation using singular value decomposition on sub-tensors representing one variation mode. This framework addresses the shortcomings of PCA in less constrained environments while still preserving the integrity of the 3D data. The results show improved recognition rates for faces and facial expressions, even recognising high-intensity expressions that are not in the training datasets. We have determined, experimentally, a set of anatomical landmarks that describe facial expression most effectively. We found that the best placement of landmarks for distinguishing different facial expressions is in the areas around the prominent features, such as the cheeks and eyebrows, and that recognition results using landmark-based face recognition could be improved with better placement. We also looked into the possibility of achieving expression-invariant face recognition by reconstructing and manipulating realistic facial expressions. We proposed a tensor-based statistical discriminant analysis method to reconstruct facial expressions and, in particular, to neutralise them. The synthesised facial expressions are visually more realistic than facial expressions generated using conventional active shape modelling (ASM). We then used the reconstructed neutral faces in the sub-tensor framework for recognition purposes, with slightly improved recognition results. Besides biometric recognition, this novel tensor-based synthesis approach could be used in computer games and real-time animation applications.
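
    A compact sketch of the tensor idea follows: 3D face data are arranged as a subjects x expressions x features tensor, and a singular value decomposition is taken of its unfolding along each variation mode. The shapes and random data are assumptions for illustration; this is not the thesis's actual decomposition pipeline.

```python
# Mode-wise unfolding and SVD of a (subjects x expressions x features) face tensor.
import numpy as np

faces = np.random.default_rng(2).normal(size=(20, 6, 300))   # subjects x expressions x 3D features

def unfold(tensor, mode):
    """Mode-n unfolding: move `mode` to the front and flatten the remaining axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

U_subj, _, _ = np.linalg.svd(unfold(faces, 0), full_matrices=False)  # subject variation mode
U_expr, _, _ = np.linalg.svd(unfold(faces, 1), full_matrices=False)  # expression variation mode
print(U_subj.shape, U_expr.shape)          # per-mode bases: (20, 20) and (6, 6)
```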

    Robust arbitrary view gait recognition based on parametric 3D human body reconstruction and virtual posture synthesis

    This paper proposes an arbitrary-view gait recognition method in which recognition is performed in three dimensions (3D) so that it is robust to variations in speed, inclined planes and clothing, and to the presence of a carried item. 3D parametric gait models over a gait period are reconstructed by an optimized estimation method for 3D human pose, shape and simulated clothing, using multi-view gait silhouettes. The gait estimation involves morphing a new subject under constant semantic constraints, using a silhouette cost function as the observation. Using this clothes-independent 3D parametric gait model reconstruction method, gait models of different subjects with various postures in a cycle are obtained and used as galleries to construct a 3D gait dictionary. Using a carried-item posture synthesis model, virtual gait models with different carried-item postures are synthesized to further construct an over-complete 3D gait dictionary. A self-occlusion-optimized simultaneous sparse representation model is also introduced to achieve high robustness with limited gait frames. Experimental analyses on the CASIA B and CMU MoBo datasets show a significant performance gain in terms of accuracy and robustness.
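
    The sparse-representation step can be sketched as follows: a probe gait feature is coded over a dictionary of gallery gait models and assigned to the class with the smallest reconstruction residual. Dictionary contents, sizes, and the use of plain orthogonal matching pursuit (rather than the self-occlusion-optimized simultaneous model) are illustrative assumptions.

```python
# Sparse-representation-classification sketch over a toy gait dictionary.
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(3)
n_feat, n_subjects, atoms_per_subject = 200, 5, 8
D = rng.normal(size=(n_feat, n_subjects * atoms_per_subject))   # gallery gait dictionary
labels = np.repeat(np.arange(n_subjects), atoms_per_subject)    # atom -> subject id
probe = D[:, 3] + 0.05 * rng.normal(size=n_feat)                # noisy view of subject 0

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=5, fit_intercept=False).fit(D, probe)
coef = omp.coef_
residuals = [np.linalg.norm(probe - D[:, labels == c] @ coef[labels == c])
             for c in range(n_subjects)]                        # per-class reconstruction error
print(int(np.argmin(residuals)))           # predicted subject id (expected 0)
```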

    Description-based visualisation of ethnic facial types

    This study reports on the design and evaluation of a tool to assist in the description and visualisation of the human face and of the variations in facial shape and proportions characteristic of different ethnicities. A comprehensive set of local shape features (sulci, folds, prominences, slopes, fossae, etc.) constitutes a visually discernible ‘vocabulary’ for facial description. Each feature has one or more continuous-valued attributes, some of which are dimensional and correspond directly to conventional anthropometric distance measurements between facial landmarks, while other attributes capture the shape or topography of the given feature. These attributes, distributed over six facial regions (eyes, nose, etc.), control a morphable model of facial shape that can approximate individual faces as well as the averaged faces of various ethnotypes. Clues to ethnic origin are often more effectively conveyed by shape attributes than by differences in anthropometric measurements, owing to large individual differences in facial dimensions within each ethnicity. Individual faces of representative ethnicities (European, East Asian, etc.) can then be modelled to establish the range of variation of the attributes (each represented by a corresponding three-dimensional ‘basis shape’). These attributes are designed to be quasi-orthogonal, in that the model can assume attribute values in arbitrary combination with minimal undesired interaction; they can thus serve as a set of dimensions or degrees of freedom. The resulting space of variation in facial shape defines an ethnicity face space (EFS), suitable for the human appreciation of facial variation across ethnicities, in contrast to a conventional identity face space (IFS), which is intended for the automated detection of individual faces within a sample drawn from a single, homogeneous population. The dimensions comprising an IFS are based on holistic measurements and are usually not interpretable in terms of local facial dimensions or shape (i.e., they are not ‘semantic’). In contrast, for an EFS to facilitate our understanding of ethnic variation across faces (as opposed to ethnicity recognition), the underlying dimensions should correspond to visually discernible attributes. A shift from quantitative landmark-based anthropometric comparisons to local shape comparisons is demonstrated. Ethnic variation can be visually appreciated by observing changes in a model through animation, and these changes can be tracked at different levels of complexity: across the whole face, by selected facial region, by isolated feature, and by isolated attribute of a given feature. This study demonstrates that an intuitive feature set, derived by artistically informed visual observation, can provide a workable descriptive basis. While neither mathematically complete nor strictly orthogonal, the feature space permits close surface fits between the morphable model and face scan data. The study is intended for the human visual appreciation of facial shape, the characteristics of differing ethnicities, and the quantification of those differences; it presumes a basic understanding of standard practices in digital facial animation.
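
    The attribute-driven model can be pictured as a linear morph: each descriptive attribute scales a 3D basis shape added to a mean face. The sketch below is a minimal illustration; the attribute names and all geometry are placeholders, not the study's data.

```python
# Attribute-driven morphable face model: mean face plus attribute-weighted basis shapes.
import numpy as np

rng = np.random.default_rng(4)
n_vertices = 1000
mean_face = rng.normal(size=(n_vertices, 3))                # placeholder mean face mesh
basis_shapes = {                                            # one basis shape per quasi-orthogonal attribute
    "nasal_bridge_height": rng.normal(scale=0.01, size=(n_vertices, 3)),
    "malar_prominence":    rng.normal(scale=0.01, size=(n_vertices, 3)),
}

def morph(attributes):
    """Apply attribute values (roughly -1..1) to the mean face."""
    face = mean_face.copy()
    for name, value in attributes.items():
        face += value * basis_shapes[name]
    return face

face = morph({"nasal_bridge_height": 0.7, "malar_prominence": -0.3})
print(face.shape)                           # (1000, 3) morphed vertex positions
```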

    A PCA approach to the object constancy for faces using view-based models of the face

    The analysis of object and face recognition by humans attracts a great deal of interest, mainly because of its many applications in various fields, including psychology, security, computer technology, medicine and computer graphics. The aim of this work is to investigate whether a PCA-based mapping approach can offer a new perspective on models of object constancy for faces in human vision. An existing system for facial motion capture and animation, developed for performance-driven animation of avatars, is adapted, improved and repurposed to study face representation in the context of viewpoint and lighting invariance. The main goal of the thesis is to develop and evaluate a new approach to viewpoint invariance that is view-based and allows mapping of facial variation between different views to construct a multi-view representation of the face. The thesis describes a computer implementation of a model that uses PCA to generate example-based models of the face. The work explores the joint encoding of expression and viewpoint using PCA and the mapping between view-specific PCA spaces. Simultaneous, synchronised video recording of six views of the face was used to construct multi-view representations, which helped to investigate how well multiple views could be recovered from a single view via the content-addressable memory property of PCA. A similar approach was taken to lighting invariance. Finally, the possibility of constructing a multi-view representation from asynchronous view-based data was explored. The results of this thesis have implications for a continuing research problem in computer vision – the problem of recognising faces and objects from different perspectives and in different lighting. The work also provides a new approach to understanding viewpoint invariance and lighting invariance in human observers.
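
    The view-based mapping idea admits a short sketch: build a PCA space per view, then learn a linear map between the two coefficient spaces so that one view can be predicted from another. The data, view names, and sizes below are illustrative assumptions, not the thesis's recordings.

```python
# View-specific PCA spaces with a learned linear mapping between their coefficients.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
latent = rng.normal(size=(100, 5))                      # shared underlying facial variation
frontal = latent @ rng.normal(size=(5, 60))             # observations of view 1
profile = latent @ rng.normal(size=(5, 60))             # observations of view 2

pca_f, pca_p = PCA(n_components=5).fit(frontal), PCA(n_components=5).fit(profile)
cf, cp = pca_f.transform(frontal), pca_p.transform(profile)
M, *_ = np.linalg.lstsq(cf, cp, rcond=None)             # map frontal coefficients -> profile coefficients

profile_hat = pca_p.inverse_transform(cf @ M)           # recover the profile view from the frontal view
print(np.linalg.norm(profile_hat - profile) / np.linalg.norm(profile))  # ~0 on this linear toy data
```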

    Generating anatomical substructures for physically-based facial animation

    Physically-based facial animation techniques are capable of producing realistic facial deformations, but have failed to find meaningful use outside the academic community because they are notoriously difficult to create, reuse and art-direct in comparison with other methods of facial animation. This thesis addresses these shortcomings and presents a series of methods for automatically generating a skull, the superficial musculoaponeurotic system (SMAS – a layer of fascia investing and interlinking the mimic muscle system), and mimic muscles for any given 3D face model, with the goal of a production-viable framework or rig-builder for physically-based facial animation. The workflow consists of three major steps. First, a generic skull is fitted to a given head model using thin-plate splines computed from the correspondence between landmarks placed on both models. Second, the SMAS is constructed as a variational implicit (radial basis function) surface in the interface between the head model and the generic skull fitted to it. Lastly, muscle fibres are generated as boundary-value straightest geodesics connecting muscle attachment regions defined on the surface of the SMAS. Each step of this workflow is developed with speed, realism and reusability in mind.
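
    The first step of the workflow can be sketched with a standard thin-plate-spline interpolator: a deformation is computed from corresponding landmarks on the generic skull and the head model, then applied to the whole skull mesh. The landmark coordinates below are random placeholders, not real anatomy, and this is only an assumed minimal version of the fitting step.

```python
# Thin-plate-spline warp of a generic skull toward a head model via landmark correspondences.
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(6)
skull_landmarks = rng.normal(size=(10, 3))                               # landmarks on the generic skull
head_landmarks = skull_landmarks + rng.normal(scale=0.05, size=(10, 3))  # corresponding landmarks on the head

# Thin-plate-spline deformation mapping skull space to head space, landmark to landmark.
tps = RBFInterpolator(skull_landmarks, head_landmarks, kernel="thin_plate_spline")

skull_vertices = rng.normal(size=(5000, 3))             # all vertices of the generic skull mesh
fitted_vertices = tps(skull_vertices)                   # skull fitted to the given head model
print(fitted_vertices.shape)                            # (5000, 3)
```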

    Less than meets the eye: the diagnostic information for visual categorization

    Current theories of visual categorization are cast in terms of information-processing mechanisms that use mental representations. However, the actual information contents of these representations are rarely characterized, which in turn hinders knowledge of the mechanisms that use them. In this thesis, I identified these contents by extracting the information that supports behavior under given tasks – i.e., the task-specific diagnostic information. In the first study (Chapter 2), I modelled the diagnostic face information for familiar face identification, using a unique generative model of face identity information combined with perceptual judgments and reverse correlation. I then demonstrated the validity of this information using everyday perceptual tasks that generalize face identity and resemblance judgments to new viewpoints, age, and sex with a new group of participants. My results showed that human participants represent only a proportion of the objective identity information available, but what they do represent is both sufficiently detailed and versatile to generalize face identification successfully across diverse tasks. In the second study (Chapter 3), I modelled the diagnostic facial movements for recognizing facial expressions of emotion. I used models that characterize the mental representations of six facial expressions of emotion (Happy, Surprise, Fear, Anger, Disgust, and Sad) in individual observers and validated them on a new group of participants. With the validated models, I derived the main signal variants for each emotion and their probabilities of occurrence within each emotion. Using these variants and their probabilities, I trained a Bayesian classifier and showed that it closely mimics human observers’ categorization performance. My results demonstrated that such emotion variants and their probabilities of occurrence comprise observers’ mental representations of facial expressions of emotion. In the third study (Chapter 4), I investigated how the brain reduces high-dimensional visual input to low-dimensional diagnostic representations to support scene categorization. To do so, I used an information-theoretic framework called Contentful Brain and Behavior Imaging (CBBI) to tease apart stimulus information that supports behavior (i.e., diagnostic) from that which does not (i.e., nondiagnostic), and tracked the dynamic representations of both in magneto-encephalographic (MEG) activity. Using CBBI, I demonstrated that a rapid (~170 ms) reduction of nondiagnostic information occurs in the occipital cortex, while diagnostic information progresses into the right fusiform gyrus, where it is constructed to support distinct behaviors. My results highlight how CBBI can be used to investigate information processing from brain activity by considering interactions between three variables (stimulus information, brain activity, behavior), rather than just two, as is the current norm in neuroimaging studies. I discussed the task-specific diagnostic information as an individual’s dynamic, experience-based representation of the physical world, which provides the much-needed information to search and understand the black box of high-dimensional, deep and biological brain networks. I also discussed practical concerns about using the data-driven approach to uncover diagnostic information.
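
    The Bayesian-classifier step of the second study can be sketched with a simple Gaussian naive Bayes model trained on facial-movement feature vectors for the six emotions. The features and data below are synthetic stand-ins, and Gaussian naive Bayes is only an assumed, simplified form of the classifier described in the abstract.

```python
# Toy Bayesian classification of facial-movement variants into six emotion categories.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(7)
emotions = ["Happy", "Surprise", "Fear", "Anger", "Disgust", "Sad"]
centres = rng.normal(size=(6, 12))                      # one prototype movement pattern per emotion
X = np.vstack([c + 0.2 * rng.normal(size=(50, 12)) for c in centres])   # sampled signal variants
y = np.repeat(np.arange(6), 50)                         # emotion label per variant

clf = GaussianNB().fit(X, y)
probe = centres[2] + 0.2 * rng.normal(size=12)          # a "Fear"-like movement pattern
print(emotions[int(clf.predict(probe[None])[0])])       # most probable emotion label
```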