
    Spatio-temporal Representation and Analysis of Facial Expressions with Varying Intensities

PhD thesis. Facial expressions convey a wealth of information about our feelings, personality and mental state. In this thesis we seek efficient ways of representing and analysing facial expressions of varying intensities. Firstly, we analyse state-of-the-art systems by decomposing them into their fundamental components, in an effort to understand which practices are common to successful systems. Secondly, we address the problem of sequence registration, which emerged as an open issue in our analysis. The encoding of the (non-rigid) motions generated by facial expressions is facilitated when the rigid motions caused by irrelevant factors, such as camera movement, are eliminated. We propose a sequence registration framework based on pre-trained regressors of Gabor motion energy. Comprehensive experiments show that the proposed method achieves very high registration accuracy even under difficult illumination variations. Finally, we propose an unsupervised representation learning framework for encoding the spatio-temporal evolution of facial expressions. The proposed framework is inspired by the Facial Action Coding System (FACS), which predates computer-based analysis. FACS encodes an expression in terms of localised facial movements and assigns an intensity score to each movement. The framework we propose mimics these two properties of FACS. Specifically, we propose to learn from data a linear transformation that approximates the facial expression variation in a sequence as a weighted sum of localised basis functions, where the weight of each basis function relates to movement intensity. We show that the proposed framework provides a plausible description of facial expressions, and leads to state-of-the-art performance in recognising expressions across intensities, from full-blown expressions to micro-expressions.
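The final contribution lends itself to a short illustration. Below is a hedged sketch of the core idea described above (learning localised basis functions whose coefficients act as movement intensities) using sparse dictionary learning in scikit-learn; the data, dimensions and hyperparameters are placeholders, not the thesis's actual setup.

```python
# Hypothetical sketch: approximating expression variation in a sequence as a
# weighted sum of learnt localised basis functions (dictionary atoms), where
# the coefficient of each atom stands in for movement intensity.
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
# Stand-in data: each row is the vectorised variation of one face sequence
# (e.g. difference between apex and onset frames); real features would come
# from aligned facial videos.
X = rng.standard_normal((200, 64 * 64))

# Sparsity encourages atoms that cover localised facial regions.
dico = DictionaryLearning(n_components=40, alpha=1.0, max_iter=20,
                          transform_algorithm="lasso_lars", random_state=0)
codes = dico.fit_transform(X)      # per-sequence weights ("intensities")
bases = dico.components_           # learnt localised basis functions
print(codes.shape, bases.shape)    # (200, 40) (40, 4096)
```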

    Robust Registration of Dynamic Facial Sequences.

Accurate face registration is a key step for several image analysis applications. However, existing registration methods are prone to temporal drift errors or jitter among consecutive frames. In this paper, we propose an iterative rigid registration framework that estimates the misalignment with trained regressors. The input of the regressors is a robust motion representation that encodes the motion between a misaligned frame and the reference frame(s), and enables reliable performance under non-uniform illumination variations. Drift errors are reduced when the motion representation is computed from multiple reference frames. Furthermore, we use the L2 norm of the representation as a cue for performing coarse-to-fine registration efficiently. Importantly, the framework can identify registration failures and correct them. Experiments show that the proposed approach achieves significantly higher registration accuracy than state-of-the-art techniques in challenging sequences.

The research work of Evangelos Sariyanidi and Hatice Gunes has been partially supported by the EPSRC under its IDEAS Factory Sandpits call on Digital Personhood (Grant Ref.: EP/L00416X/1).
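The iterative loop described above can be made concrete with a minimal sketch. The regressor, the motion representation and the warping function are hypothetical stand-ins; only the overall control flow (regress an update, check the L2 norm of the representation, flag failures) follows the abstract.

```python
# Minimal sketch of the iterative idea: a trained regressor maps a motion
# representation between the current (misaligned) frame and a reference frame
# to a rigid-alignment update, applied until the residual motion is small.
# `encode_motion`, `regressor`, and `warp_rigid` are hypothetical stand-ins.
import numpy as np

def register_frame(frame, reference, regressor, encode_motion, warp_rigid,
                   max_iters=10, tol=1e-2, fail_thresh=10.0):
    params = np.zeros(3)                      # (dx, dy, rotation)
    for _ in range(max_iters):
        phi = encode_motion(warp_rigid(frame, params), reference)
        residual = np.linalg.norm(phi)        # L2 norm as a misalignment cue
        if residual < tol:                    # converged: motion eliminated
            return params, True
        if residual > fail_thresh:            # flag likely registration failure
            return params, False
        params += regressor.predict(phi[None, :])[0]
    return params, True
```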

    Biologically-Inspired Motion Encoding for Robust Global Motion Estimation.

The growing use of cameras embedded in autonomous robotic platforms and worn by people is increasing the importance of accurate global motion estimation (GME). However, existing GME methods may degrade considerably under illumination variations. In this paper, we address this problem by proposing a biologically-inspired GME method that achieves high estimation accuracy in the presence of illumination variations. We mimic the early layers of the human visual cortex with spatio-temporal Gabor motion energy by adopting the pioneering model of Adelson and Bergen, and we provide the closed-form expressions that enable the study and adaptation of this model to different application needs. Moreover, we propose a normalisation scheme for motion energy to tackle temporal illumination variations. Finally, we provide an overall GME scheme which, to the best of our knowledge, achieves the highest accuracy on the Pose, Illumination, and Expression (PIE) database.
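As a rough illustration of the Adelson-Bergen construction mentioned above: a quadrature pair of spatio-temporal Gabor filters is applied to a video volume and the two squared responses are summed, giving a phase-insensitive motion energy. The per-frame contrast normalisation at the end is one plausible way to damp temporal illumination changes, not necessarily the paper's exact scheme; filter parameters are illustrative.

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_pair(size=9, fx=0.15, ft=0.15, sigma=2.0):
    r = np.arange(size) - size // 2
    t, y, x = np.meshgrid(r, r, r, indexing="ij")
    env = np.exp(-(x**2 + y**2 + t**2) / (2 * sigma**2))
    phase = 2 * np.pi * (fx * x + ft * t)   # oriented in space-time => motion
    return env * np.cos(phase), env * np.sin(phase)

def motion_energy(video):                    # video: (T, H, W) float array
    even, odd = gabor_pair()
    e = (fftconvolve(video, even, mode="same") ** 2
         + fftconvolve(video, odd, mode="same") ** 2)
    # Per-frame normalisation as a stand-in for the paper's scheme.
    return e / (e.mean(axis=(1, 2), keepdims=True) + 1e-8)
```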

Local Zernike Moment Representation for Facial Affect Recognition

In this paper, we propose to use local Zernike Moments (ZMs) for facial affect recognition and introduce a representation scheme based on performing non-linear encoding on ZMs via quantization. Local ZMs provide a useful and compact description of image discontinuities and texture. We demonstrate the use of this ZM-based representation for posed and discrete as well as naturalistic and continuous affect recognition on standard datasets, and show that ZM-based representations outperform well-established alternative approaches for both tasks. To the best of our knowledge, the performance we achieved on the CK+ dataset is superior to all results reported to date.
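A hedged sketch of the representation idea: compute Zernike moments on local patches, then non-linearly encode them by quantising each patch's moment vector and pooling the quantisation labels into a histogram. The patch size, radius, degree and the k-means codebook below are illustrative choices; the paper's exact non-linear encoding may differ.

```python
import numpy as np
from mahotas.features import zernike_moments
from sklearn.cluster import KMeans

def local_zm(image, patch=16, radius=8, degree=8):
    feats = []
    for i in range(0, image.shape[0] - patch + 1, patch):
        for j in range(0, image.shape[1] - patch + 1, patch):
            feats.append(zernike_moments(image[i:i+patch, j:j+patch],
                                         radius, degree=degree))
    return np.array(feats)                 # one moment vector per patch

def encode(images, n_words=64):
    all_feats = np.vstack([local_zm(im) for im in images])
    codebook = KMeans(n_clusters=n_words, n_init=10).fit(all_feats)
    labels = [codebook.predict(local_zm(im)) for im in images]
    # Histogram of quantisation labels as the final image representation.
    return np.array([np.bincount(l, minlength=n_words) for l in labels])
```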

    Learning Bases of Activity for Facial Expression Recognition.

The extraction of descriptive features from sequences of faces is a fundamental problem in facial expression analysis. Facial expressions are represented by psychologists as a combination of elementary movements known as action units: each movement is localised and its intensity is specified with a score that is small when the movement is subtle and large when the movement is pronounced. Inspired by this approach, we propose a novel data-driven feature extraction framework that represents facial expression variations as a linear combination of localised basis functions, whose coefficients are proportional to movement intensity. We show that the linear basis functions required by this framework can be obtained by training a sparse linear model with Gabor phase shifts computed from facial videos. The proposed framework addresses generalisation issues that are not addressed by existing learnt representations, and achieves, with the same learning parameters, state-of-the-art results in recognising both posed expressions and spontaneous micro-expressions. This performance is confirmed even when the data used to train the model differ from the test data in terms of the intensity of facial movements and frame rate.

The work of E. Sariyanidi and H. Gunes is partially supported by the EPSRC under its IDEAS Factory Sandpits call on Digital Personhood under Grant EP/L00416X/1.
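A hedged sketch of the feature pipeline this abstract describes: Gabor phase shifts between two frames are estimated from a quadrature filter pair, and a previously learnt dictionary of localised basis functions sparse-codes them. The filter parameters are placeholders, and the dictionary is assumed to have been trained beforehand (e.g. as in the dictionary-learning sketch above).

```python
import numpy as np
from scipy.signal import fftconvolve
from sklearn.decomposition import SparseCoder

def gabor_phase(frame, fx=0.15, sigma=2.0, size=9):
    r = np.arange(size) - size // 2
    y, x = np.meshgrid(r, r, indexing="ij")
    env = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    even = fftconvolve(frame, env * np.cos(2 * np.pi * fx * x), mode="same")
    odd = fftconvolve(frame, env * np.sin(2 * np.pi * fx * x), mode="same")
    return np.arctan2(odd, even)

def activity_coefficients(frame_a, frame_b, dictionary):
    # Wrapped phase difference approximates local motion between the frames.
    dphi = np.angle(np.exp(1j * (gabor_phase(frame_b) - gabor_phase(frame_a))))
    coder = SparseCoder(dictionary=dictionary,                 # (n_atoms, n_feat)
                        transform_algorithm="lasso_lars", transform_alpha=1.0)
    return coder.transform(dphi.ravel()[None, :])[0]  # ~ movement intensities
```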

    Automatic analysis of facilitated taste-liking

This paper focuses on: (i) automatic recognition of taste-liking from facial videos by comparatively training and evaluating models with engineered features and state-of-the-art deep learning architectures, and (ii) analysing the classification results along the aspects of facilitator type, and the gender, ethnicity, and personality of the participants. To this end, a new beverage tasting dataset acquired under different conditions (human vs. robot facilitator and priming vs. non-priming facilitation) is utilised. The experimental results show that: (i) the deep spatiotemporal architectures provide better classification results than the engineered feature models; (ii) the classification results for all three classes of liking, neutral and disliking reach F1 scores in the range of 71%-91%; (iii) the personality-aware network that fuses participants' personality information with that of facial reaction features provides improved classification performance; and (iv) classification results vary across participant gender, but not across facilitator type and participant ethnicity.
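A speculative sketch of a personality-aware fusion model in the spirit of finding (iii): facial-reaction features and personality trait scores are embedded separately, concatenated, and classified into liking / neutral / disliking. The layer sizes, feature dimension and the use of five trait scores are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class PersonalityAwareNet(nn.Module):
    def __init__(self, face_dim=512, traits=5, n_classes=3):
        super().__init__()
        self.face = nn.Sequential(nn.Linear(face_dim, 128), nn.ReLU())
        self.pers = nn.Sequential(nn.Linear(traits, 16), nn.ReLU())
        self.head = nn.Linear(128 + 16, n_classes)  # liking/neutral/disliking

    def forward(self, face_feats, trait_scores):
        fused = torch.cat([self.face(face_feats),
                           self.pers(trait_scores)], dim=1)
        return self.head(fused)

# Toy forward pass with random stand-in features.
logits = PersonalityAwareNet()(torch.randn(4, 512), torch.randn(4, 5))
```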

    Visual Loop Closure Detection For Autonomous Mobile Robot Navigation Via Unsupervised Landmark Extraction

Thesis (M.Sc.) -- İstanbul Technical University, Institute of Science and Technology, 2012. Autonomous navigation is a very active research field in mobile robotics. Simultaneous localization and mapping (SLAM) is one of the major problems linked with autonomous navigation, and one of the essential issues in SLAM is the detection of loop closures. Within the context of SLAM, loop closing can be defined as the correct identification of a previously visited location. Loop closure detection is a significant ability for a mobile robot, since successful loop closure detection leads to substantial improvement in the overall SLAM performance. This thesis introduces a novel loop closure detection technique that relies on visual sensing. Images are sparsely represented via visual landmarks, which are extracted in an unsupervised manner. The sparsely represented images form an appearance space, and loop closure hypotheses are ultimately cast on this appearance space. The contributions of this thesis are twofold. The first is a novel saliency detection algorithm, used for unsupervised visual landmark extraction. The second is an overall loop closure detection technique that relies on measuring the similarity between an incoming image and the images of previously visited locations in the appearance space. Experimental results indicate that the proposed technique is promising and performs at least comparably to the state of the art.
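A minimal sketch of appearance-based loop closure as outlined above: each image is reduced to a descriptor (here a plain intensity histogram as a stand-in for the landmark-based sparse representation), and an incoming image proposes a loop closure with the most similar past location, skipping recent frames. The gap and threshold values are illustrative.

```python
import numpy as np

def descriptor(image, bins=64):
    # Stand-in for the thesis's landmark-based sparse representation.
    h, _ = np.histogram(image, bins=bins, range=(0.0, 1.0), density=True)
    return h

def detect_loop(history, incoming, min_gap=30, threshold=0.9):
    d = descriptor(incoming)
    best, best_sim = None, -1.0
    for idx, past in enumerate(history[:-min_gap]):   # ignore recent frames
        sim = np.dot(d, past) / (np.linalg.norm(d) * np.linalg.norm(past)
                                 + 1e-12)             # cosine similarity
        if sim > best_sim:
            best, best_sim = idx, sim
    return best if best_sim >= threshold else None    # index of matched place
```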