
    Toward a social psychophysics of face communication

    As a highly social species, humans are equipped with a powerful tool for social communication: the face, which can elicit multiple social perceptions in others due to the rich and complex variations of its movements, morphology, and complexion. Consequently, identifying precisely what face information elicits different social perceptions is a complex empirical challenge that has largely remained beyond the reach of traditional research methods. More recently, the emerging field of social psychophysics has developed new methods designed to address this challenge. Here, we introduce and review the foundational methodological developments of social psychophysics, present recent work that has advanced our understanding of the face as a tool for social communication, and discuss the main challenges that lie ahead.

    Computational Modeling of Facial Response for Detecting Differential Traits in Autism Spectrum Disorders

    This dissertation proposes novel computational modeling and computer vision methods for the analysis and discovery of differential traits in subjects with Autism Spectrum Disorders (ASD) using video and three-dimensional (3D) images of the face and facial expressions. ASD is a neurodevelopmental disorder that impairs an individual's nonverbal communication skills. This work studies ASD through the pathophysiology of facial expressions, which may manifest as atypical responses in the face. State-of-the-art psychophysical studies mostly employ naïve human raters to visually score atypical facial responses of individuals with ASD, which may be subjective, tedious, and error prone. A few quantitative studies use intrusive sensors on the faces of subjects with ASD, which in turn may inhibit or bias their natural facial responses. This dissertation proposes non-intrusive computer vision methods to alleviate these limitations in investigating differential traits in the spontaneous facial responses of individuals with ASD. Two IRB-approved psychophysical studies are performed involving two groups of age-matched subjects: one for subjects diagnosed with ASD and the other for subjects who are typically developing (TD). The facial responses of the subjects are computed from their facial images using the proposed computational models and then statistically analyzed to infer differential traits for the group with ASD. A novel computational model is proposed to represent the large volume of 3D facial data in a small pose-invariant Frenet frame-based feature space. The inherent pose-invariance of the proposed features alleviates the need for expensive 3D face registration in the preprocessing step. The proposed modeling framework is not only computationally efficient but also offers competitive performance in 3D face and facial expression recognition tasks when compared with state-of-the-art methods. This computational model is applied in the first experiment to quantify subtle facial muscle responses from the geometry of the 3D facial data. Results show a statistically significant asymmetry in the activation of a specific pair of facial muscles (p < 0.05) for the group with ASD, which suggests the presence of a psychophysical trait (also known as an 'oddity') in the facial expressions. For the first time in the ASD literature, the facial action coding system (FACS) is employed to classify the spontaneous facial responses based on facial action units (FAUs). Statistical analyses reveal a significantly (p < 0.01) higher prevalence of the smile expression (FAU 12) in the ASD group when compared with the TD group. The high prevalence of smiles co-occurred with significantly averted gaze (p < 0.05) in the group with ASD, which is indicative of impaired reciprocal communication. The metric associated with incongruent facial and visual responses suggests a behavioral biomarker for ASD. The second experiment shows a higher prevalence of mouth frown (FAU 15) and significantly lower correlations between the activation of several FAU pairs (p < 0.05) in the group with ASD when compared with the TD group. The proposed computational modeling offers promising biomarkers, which may aid in the early detection of subtle ASD-related traits and thus enable effective intervention strategies in the future.
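    The abstract does not spell out the dissertation's exact feature construction, but the pose-invariance argument behind Frenet frames is easy to illustrate. The following minimal sketch (the function name, the toy curve, and the final check are all illustrative, not the author's code) computes discrete tangent, normal, and binormal vectors along a sampled 3D curve and verifies that the frames co-rotate with the data, which is why frame-relative features sidestep 3D registration.

```python
# Hedged sketch: discrete Frenet frames along an (n, 3) curve of 3D points.
# Rotating the curve rotates the frames with it, so any quantity expressed
# relative to the frame is unchanged by pose.
import numpy as np

def frenet_frames(points: np.ndarray, eps: float = 1e-12):
    """Return unit tangent, normal, and binormal vectors per sample point."""
    d1 = np.gradient(points, axis=0)                      # first-derivative estimate
    tangent = d1 / (np.linalg.norm(d1, axis=1, keepdims=True) + eps)
    dT = np.gradient(tangent, axis=0)                     # derivative of the tangent
    normal = dT / (np.linalg.norm(dT, axis=1, keepdims=True) + eps)
    binormal = np.cross(tangent, normal)                  # completes the moving frame
    return tangent, normal, binormal

# Pose invariance check on a toy curve: tangents co-rotate with the data.
rng = np.random.default_rng(0)
curve = np.cumsum(rng.normal(size=(50, 3)), axis=0)       # toy 3D facial curve
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))              # random orthogonal matrix
t1, _, _ = frenet_frames(curve)
t2, _, _ = frenet_frames(curve @ Q.T)
assert np.allclose(t1 @ Q.T, t2, atol=1e-8)
```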

    Facial Paralysis Grading Based on Dynamic and Static Features

    Peripheral facial nerve palsy, also known as facial paralysis (FP), is a common clinical condition that conventionally requires subjective judgment and scoring against an FP scale. Automatic FP grading methods exist, but most consider only static or only dynamic features, resulting in low grading accuracy. This thesis proposes an automatic FP assessment method that combines static and dynamic characteristics. The method first preprocesses the collected facial-expression videos of the subjects, including rough video clipping, video stabilization, keyframe extraction, geometric normalization, and gray-scale normalization. It then selects as keyframes the neutral (no-expression) state and the maximum-expression state in the image data to build the research dataset. This preprocessing reduces noise, redundancy, and errors in the original data. The basis for extracting static and dynamic features is the Ensemble of Regression Trees algorithm, which locates 68 facial landmarks; regions of interest are formed from these landmarks. Using the Horn-Schunck optical flow method, optical-flow information is extracted for parts of the face, and dynamic features are computed from the optical-flow difference between the left and right halves. Finally, the dynamic and static classification results are weighted and analyzed to obtain the subjects' FP ratings. A 32-dimensional static feature vector is fed into a support vector machine for classification, and a 60-dimensional dynamic feature vector is fed into a long short-term memory (LSTM) network. Videos of 30 subjects, from which 1419 keyframes were extracted, are used to test the algorithm. The accuracy, precision, recall, and F1 score of the best classifier reach 93.33%, 94.29%, 91.33%, and 91.87%, respectively.
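    As a rough illustration of the dynamic left-right asymmetry feature, the sketch below compares mean optical-flow magnitude between the two halves of the face across a neutral and a peak-expression keyframe. The thesis uses the Horn-Schunck method; OpenCV's built-in Farneback estimator stands in for it here, and splitting at the image midline is a simplification of the landmark-derived regions.

```python
# Hedged stand-in for the thesis's dynamic feature: dense optical flow
# between two grayscale keyframes, summarized as a left/right magnitude gap.
# Farneback flow replaces Horn-Schunck, which core OpenCV does not ship.
import cv2
import numpy as np

def lr_flow_asymmetry(neutral_gray: np.ndarray, peak_gray: np.ndarray) -> float:
    """Absolute left/right difference in mean flow magnitude (0 = symmetric).

    Both inputs are single-channel 8-bit face crops of the same size.
    """
    flow = cv2.calcOpticalFlowFarneback(
        neutral_gray, peak_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    magnitude = np.linalg.norm(flow, axis=2)      # per-pixel motion strength
    mid = magnitude.shape[1] // 2                 # midline split of the face
    return float(abs(magnitude[:, :mid].mean() - magnitude[:, mid:].mean()))
```

A per-region version of this scalar, computed over landmark-defined facial zones, would yield the kind of 60-dimensional dynamic vector the thesis feeds to its LSTM.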

    A Brain-Inspired Multi-Modal Perceptual System for Social Robots: An Experimental Realization

    We propose a multi-modal perceptual system that is inspired by the inner workings of the human brain, in particular the hierarchical structure of the sensory cortex and its spatial-temporal binding criteria. The system is context independent and can be applied to many ongoing problems in social robotics, including person recognition, emotion recognition, and a multi-modal robot doctor, to name a few. The system encapsulates the parallel distributed processing of real-world stimuli through different sensor modalities, encoding them into feature vectors that are in turn processed via a number of dedicated processing units (DPUs) along hierarchical paths. DPUs are algorithmic realizations of the cell assemblies of neuroscience. A plausible and realistic perceptual system is presented via the integration of the outputs from these units by spiking neural networks. We also discuss other components of the system, including top-down influences and the integration of information through temporal binding with fading memory, and suggest two alternatives for realizing these criteria. Finally, we demonstrate the implementation of this architecture on a hardware platform as a social robot and report experimental studies on the system.
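    To make the data flow concrete, here is a purely illustrative miniature (no class name or update rule below comes from the paper): per-modality DPUs map raw stimuli to bounded evidence vectors, and an integrator binds their outputs over time with fading memory, standing in for the spiking-network integration stage.

```python
# Illustrative toy of the architecture's data flow, not the paper's code:
# modality-specific DPUs produce evidence; a fading-memory integrator
# accumulates it over time (temporal binding with exponential decay).
import numpy as np

class DPU:
    """One dedicated processing unit: maps a raw stimulus to evidence."""
    def __init__(self, weights: np.ndarray):
        self.weights = weights

    def __call__(self, stimulus: np.ndarray) -> np.ndarray:
        return np.tanh(self.weights @ stimulus)   # bounded feature response

class FadingIntegrator:
    """Temporal binding with fading memory: old evidence decays each step."""
    def __init__(self, dim: int, decay: float = 0.8):
        self.state = np.zeros(dim)
        self.decay = decay

    def step(self, *evidence: np.ndarray) -> np.ndarray:
        self.state = self.decay * self.state + np.mean(evidence, axis=0)
        return self.state

rng = np.random.default_rng(1)
vision = DPU(rng.normal(size=(4, 8)))             # toy visual pathway
audio = DPU(rng.normal(size=(4, 6)))              # toy auditory pathway
binder = FadingIntegrator(dim=4)
for _ in range(5):                                # stream of multi-modal stimuli
    percept = binder.step(vision(rng.normal(size=8)), audio(rng.normal(size=6)))
```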

    Hierarchical age estimation using enhanced facial features.

    Doctor of Philosophy in Computer Science, University of KwaZulu-Natal, Westville, 2018. Ageing is a stochastic, inevitable, and uncontrollable process that constantly affects the shape, texture, and general appearance of the human face. Humans can determine one's gender, identity, and ethnicity far more accurately than age. This makes the development of automatic age estimation techniques that surpass human performance an attractive yet challenging task. Automatic age estimation requires the extraction of robust and reliable age-discriminative features. The sensitivity of local binary patterns (LBP) to noise makes them insufficiently reliable for capturing age-discriminative features. Although local ternary patterns (LTP) are insensitive to noise, they use a single static threshold for all images regardless of varying image conditions. Local directional patterns (LDP) use the k strongest directional responses to encode the image gradient, disregarding not only the central pixel of the local neighborhood but also the remaining 8 − k directional responses. Every pixel in an image carries subtle information, so discarding those 8 − k responses loses discriminative texture features. This study proposes two variations of the LDP operator for texture extraction. Significant orientation response LDP (SOR-LDP) encodes the image gradient by grouping the eight directional responses into four pairs. Each pair represents the orientation of an edge with respect to the central reference pixel. The values in each pair are compared, and the bit corresponding to the maximum value in the pair is set to 1 while the other is set to 0. The resulting binary code is converted to decimal and assigned to the central pixel as its SOR-LDP code. Texture features are contained in the histogram of the SOR-LDP-encoded image. Local ternary directional patterns (LTDP) first compute the differences between the neighboring pixels and the central pixel in a 3 × 3 image region. These differential values are convolved with Kirsch edge detectors to obtain directional responses, which are normalized and used as the probability of an edge occurring in the respective direction. An adaptive threshold is applied to derive the LTDP code, which is split into its positive and negative parts. Histograms of the negative and positive LTDP-encoded images are concatenated to obtain the texture feature. Although there is evidence of spatial-frequency processing in the primary visual cortex, biologically inspired features (BIF) that model the visual cortex use only scale and orientation selectivity in feature extraction. Furthermore, these BIF are extracted using holistic (global) pooling across scales and orientations, leading to loss of substantive information. This study proposes multi-frequency BIF (MF-BIF), in which frequency selectivity is introduced into BIF modelling. Local statistical BIF (LS-BIF) uses local pooling within scale, orientation, and frequency in an n × n region for BIF extraction. Using the leave-one-person-out (LOPO) validation protocol, this study evaluates the proposed feature extractors in hierarchical age estimation: age-group classification with a multi-layer perceptron (MLP) followed by within-group exact-age regression with support vector regression (SVR). Mean absolute error (MAE) and cumulative score (CS) are used to evaluate the proposed face descriptors. Experimental results on the FG-NET ageing dataset show that SOR-LDP, LTDP, MF-BIF, and LS-BIF outperform state-of-the-art feature descriptors in age estimation.
    Experimental results also show that performing gender discrimination before age-group classification and age estimation further improves accuracy. In primates, shape, appearance, wrinkle, and texture features are extracted simultaneously by the visual system for the brain to process and understand an image or a scene. Age estimation systems in the literature, however, typically rely on a single feature, which is insufficient to capture subtle age-discriminative traits given the stochastic and personalized nature of ageing. This study therefore proposes fusing different facial features to enhance their discriminative power. Experimental results show that fusing shape, texture, wrinkle, and appearance yields robust age-discriminative features that achieve lower MAE than any single feature.
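    The SOR-LDP encoding is described closely enough to sketch. One assumption below is not stated in the abstract: that the four pairs are opposite Kirsch directions. Since each pair contributes exactly one winning bit, the winner of each pair is encoded directly, giving a 4-bit code and a 16-bin histogram feature.

```python
# Hedged sketch of SOR-LDP as the abstract describes it. The opposite-
# direction pairing is an assumption; the thesis may pair differently.
import numpy as np
from scipy.ndimage import convolve

def kirsch_masks():
    """Eight 3x3 Kirsch edge masks, generated by rotating the border values."""
    ring = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    vals = np.array([5, 5, 5, -3, -3, -3, -3, -3])
    masks = []
    for shift in range(8):
        m = np.zeros((3, 3))
        for (r, c), v in zip(ring, np.roll(vals, shift)):
            m[r, c] = v
        masks.append(m)
    return masks

def sor_ldp(image: np.ndarray) -> np.ndarray:
    """Per-pixel 4-bit code: winner of each opposite-direction response pair."""
    resp = np.stack([convolve(image.astype(float), m) for m in kirsch_masks()])
    code = np.zeros(image.shape, dtype=np.uint8)
    for bit, (i, j) in enumerate([(0, 4), (1, 5), (2, 6), (3, 7)]):
        code |= (resp[i] >= resp[j]).astype(np.uint8) << bit
    return code

def sor_ldp_histogram(image: np.ndarray) -> np.ndarray:
    """Texture feature: 16-bin histogram of the SOR-LDP-coded image."""
    return np.bincount(sor_ldp(image).ravel(), minlength=16)
```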

    Change blindness: eradication of gestalt strategies

    Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task in which there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval, and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation [Landman et al., 2003, Vision Research 43, 149–164]. Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this, we changed the spatial positions of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4) = 2.565, p = 0.185]. This may suggest two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) it gives further weight to the argument that objects may be stored in and retrieved from a pre-attentional store during this task.

    Joint optimization of manifold learning and sparse representations for face and gesture analysis

    Face and gesture understanding algorithms are powerful enablers in intelligent vision systems for surveillance, security, entertainment, and smart spaces. In the future, complex networks of sensors and cameras may dispense directions to lost tourists, perform directory lookups in the office lobby, or contact the proper authorities in case of an emergency. To be effective, these systems will need to embrace human subtleties while interacting with people in their natural conditions. Computer vision and machine learning techniques have recently become adept at solving face and gesture tasks using posed datasets in controlled conditions. However, spontaneous human behavior under unconstrained conditions, or in the wild, is more complex and is subject to considerable variability from one person to the next. Uncontrolled conditions such as lighting, resolution, noise, occlusions, pose, and temporal variations complicate the matter further. This thesis advances the field of face and gesture analysis by introducing a new machine learning framework, based upon dimensionality reduction and sparse representations, that is shown to be robust in posed as well as natural conditions. Dimensionality reduction methods take complex objects, such as facial images, and attempt to learn lower-dimensional representations embedded in the higher-dimensional data. These alternate feature spaces are computationally more efficient and often more discriminative. The performance of various dimensionality reduction methods on geometric and appearance-based facial attributes is studied, leading to robust facial pose and expression recognition models. The parsimonious nature of sparse representations (SR) has successfully been exploited in the development of highly accurate classifiers for various applications. Despite the successes of SR techniques, large dictionaries and high-dimensional data can make these classifiers computationally demanding. Further, sparse classifiers are subject to the adverse effects of a phenomenon known as coefficient contamination, where, for example, variations in pose may affect identity and expression recognition. This thesis analyzes the interaction between dimensionality reduction and sparse representations to present a unified sparse representation classification framework that addresses both computational complexity and coefficient contamination. Semi-supervised dimensionality reduction is shown to mitigate the coefficient contamination problems associated with SR classifiers. The combination of semi-supervised dimensionality reduction with SR systems forms the cornerstone of a new face and gesture framework called Manifold-based Sparse Representations (MSR). MSR is shown to deliver state-of-the-art facial understanding capabilities. To demonstrate the applicability of MSR to new domains, it is expanded to include temporal dynamics. The joint optimization of dimensionality reduction and SRs for classification purposes is a relatively new field. The combination of both concepts into a single objective function produces a relation that is neither convex nor directly solvable. This thesis studies this problem and introduces a new jointly optimized framework. This framework, termed LGE-KSVD, utilizes variants of Linear extension of Graph Embedding (LGE) along with modified K-SVD dictionary learning to jointly learn the dimensionality reduction matrix, sparse representation dictionary, sparse coefficients, and sparsity-based classifier.
    By injecting LGE concepts directly into the K-SVD learning procedure, this research removes the support constraints K-SVD imposes on dictionary element discovery. Results are shown for face recognition, facial expression recognition, and human activity analysis; with the addition of a concept called active difference signatures, the framework also delivers robust gesture recognition from Kinect or similar depth cameras.
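    For readers new to the baseline being jointly optimized, the sketch below shows plain sparse representation classification (SRC) after a fixed PCA projection: the reduced test vector is coded over the reduced training samples with orthogonal matching pursuit and assigned to the class whose atoms reconstruct it with the smallest residual. LGE-KSVD replaces the fixed projection and dictionary with jointly learned ones; none of this code is from the thesis.

```python
# Hedged SRC baseline: PCA projection, OMP sparse coding over training
# atoms, then classification by minimum class-wise reconstruction residual.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import OrthogonalMatchingPursuit

def src_predict(X_train, y_train, x_test, n_components=20, n_nonzero=5):
    """Predict the label of x_test via class-wise sparse reconstruction."""
    pca = PCA(n_components=n_components).fit(X_train)
    D = pca.transform(X_train).T                  # columns = dictionary atoms
    z = pca.transform(x_test[None, :])[0]         # reduced test vector
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_nonzero,
                                    fit_intercept=False).fit(D, z)
    coef = omp.coef_                              # one sparse weight per atom
    labels = np.unique(y_train)
    # Keep only each class's coefficients and measure how well they rebuild z.
    residuals = [np.linalg.norm(z - D @ (coef * (y_train == c))) for c in labels]
    return labels[int(np.argmin(residuals))]
```

Coefficient contamination shows up here as nonzero weights on atoms of the wrong class; the semi-supervised reduction step described above is aimed at shrinking exactly those.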

    A multimodal deep learning framework using local feature representations for face recognition

    The most recent face recognition systems are mainly dependent on feature representations obtained using either local handcrafted descriptors, such as local binary patterns (LBP), or a deep learning approach, such as a deep belief network (DBN). However, the former usually suffer from the wide variations in face images, while the latter usually discards the local facial features, which are proven to be important for face recognition. In this paper, a novel framework that merges the advantages of local handcrafted feature descriptors with the DBN is proposed to address the face recognition problem in unconstrained conditions. Firstly, a novel multimodal local feature extraction approach based on merging the advantages of the Curvelet transform with the Fractal dimension is proposed and termed the Curvelet–Fractal approach. The main motivation of this approach is that the Curvelet transform, a new anisotropic and multidirectional transform, can efficiently represent the main structure of the face (e.g., edges and curves), while the Fractal dimension is one of the most powerful texture descriptors for face images. Secondly, a novel framework, termed the multimodal deep face recognition (MDFR) framework, is proposed to add feature representations by training a DBN on top of the local feature representations instead of the pixel intensity representations. We demonstrate that representations acquired by the proposed MDFR framework are complementary to those acquired by the Curvelet–Fractal approach. Finally, the performance of the proposed approaches has been evaluated in a number of extensive experiments on four large-scale face datasets: the SDUMLA-HMT, FERET, CAS-PEAL-R1, and LFW databases. The results obtained with the proposed approaches outperform other state-of-the-art approaches (e.g., LBP, DBN, WPCA), achieving new state-of-the-art results on all the employed datasets.
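    As a hedged illustration of the Fractal half of the Curvelet–Fractal descriptor, the sketch below estimates the box-counting fractal dimension of a binary edge map. The paper's exact estimator is not specified in the abstract, so the details (e.g., a differential box-counting variant) may differ.

```python
# Hedged sketch: box-counting fractal dimension of a binary (edge) image,
# estimated as the slope of log(box count) vs. log(1/box size).
import numpy as np

def box_counting_dimension(binary: np.ndarray) -> float:
    """Estimate fractal dimension of a 2D boolean array by box counting."""
    n = 2 ** int(np.floor(np.log2(min(binary.shape))))
    img = binary[:n, :n]                          # crop to a power-of-two square
    sizes, counts = [], []
    size = n
    while size >= 2:
        blocks = img.reshape(n // size, size, n // size, size)
        occupied = blocks.any(axis=(1, 3)).sum()  # boxes containing any edge pixel
        sizes.append(size)
        counts.append(max(int(occupied), 1))
        size //= 2
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
    return slope

# Sanity check: a completely filled square has dimension ~2.
demo = np.ones((128, 128), dtype=bool)
print(round(box_counting_dimension(demo), 2))     # -> 2.0
```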