14 research outputs found
Spatio-temporal Representation and Analysis of Facial Expressions with Varying Intensities
PhD thesis.
Facial expressions convey a wealth of information about our feelings, personality and mental
state. In this thesis we seek efficient ways of representing and analysing facial expressions of
varying intensities. Firstly, we analyse state-of-the-art systems by decomposing them into their
fundamental components, in an effort to understand which practices are common to
successful systems. Secondly, we address the problem of sequence registration, which emerged
as an open issue in our analysis. The encoding of the (non-rigid) motions generated by facial expressions
is facilitated when the rigid motions caused by irrelevant factors, such as camera movement,
are eliminated. We propose a sequence registration framework that is based on pre-trained
regressors of Gabor motion energy. Comprehensive experiments show that the proposed method
achieves very high registration accuracy even under difficult illumination variations. Finally,
we propose an unsupervised representation learning framework for encoding the spatio-temporal
evolution of facial expressions. The proposed framework is inspired by the Facial Action Coding
System (FACS), which predates computer-based analysis. FACS encodes an expression in terms
of localised facial movements and assigns an intensity score for each movement. The framework
we propose mimics those two properties of FACS. Specifically, we propose to learn from
data a linear transformation that approximates the facial expression variation in a sequence as
a weighted sum of localised basis functions, where the weight of each basis function relates to
movement intensity. We show that the proposed framework provides a plausible description of
facial expressions, and leads to state-of-the-art performance in recognising expressions across
intensities, from full-blown expressions to micro-expressions.
Robust Registration of Dynamic Facial Sequences.
Accurate face registration is a key step for several image analysis applications. However, existing registration methods are prone to temporal drift errors or jitter among consecutive frames. In this paper, we propose an iterative rigid registration framework that estimates the misalignment with trained regressors. The input of the regressors is a robust motion representation that encodes the motion between a misaligned frame and the reference frame(s), and enables reliable performance under non-uniform illumination variations. Drift errors are reduced when the motion representation is computed from multiple reference frames. Furthermore, we use the L2 norm of the representation as a cue for performing coarse-to-fine registration efficiently. Importantly, the framework can identify registration failures and correct them. Experiments show that the proposed approach achieves significantly higher registration accuracy than the state-of-the-art techniques in challenging sequences. The research work of Evangelos Sariyanidi and Hatice Gunes has been partially supported by the EPSRC under its IDEAS Factory Sandpits call on Digital Personhood (Grant Ref.: EP/L00416X/1).
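The regression-based loop described above can be illustrated with a toy sketch. Everything here is an illustrative assumption rather than the paper's implementation: a plain pixel-difference representation stands in for the paper's robust Gabor-based motion encoding, and a least-squares linear regressor trained on synthetic integer translations predicts and undoes the misalignment iteratively.

```python
import numpy as np

rng = np.random.default_rng(0)
ref = rng.random((32, 32))                      # toy reference frame

def translate(img, dx, dy):
    # circular shift as a toy stand-in for rigid warping
    return np.roll(np.roll(img, dy, axis=0), dx, axis=1)

def representation(frame, reference):
    # toy motion representation: raw frame difference (the paper uses a
    # robust Gabor-based motion encoding instead)
    return (frame - reference).ravel()

# train a linear regressor mapping representation -> (dx, dy) misalignment
shifts = [(dx, dy) for dx in range(-3, 4) for dy in range(-3, 4)]
X = np.stack([representation(translate(ref, dx, dy), ref) for dx, dy in shifts])
Y = np.array(shifts, dtype=float)
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

def register(frame, reference, iters=3):
    # iteratively estimate the misalignment and undo it
    total = np.zeros(2)
    for _ in range(iters):
        est = np.round(representation(frame, reference) @ W).astype(int)
        frame = translate(frame, -est[0], -est[1])
        total += est
        if not est.any():                       # converged
            break
    return total

estimated = register(translate(ref, 2, -1), ref)
print(estimated)
```

In the real framework the representation itself is robust to illumination changes and is computed against multiple reference frames; the iterative predict-and-correct structure is the part this sketch tries to convey.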
Live human-robot interactive public demonstrations with automatic emotion and personality prediction.
Communication with humans is a multi-faceted phenomenon where the emotions, personality and non-verbal behaviours, as well as the verbal behaviours, play a significant role, and human-robot interaction (HRI) technologies should respect this complexity to achieve efficient and seamless communication. In this paper, we describe the design and execution of five public demonstrations made with two HRI systems that aimed at automatically sensing and analysing human participants' non-verbal behaviour and predicting their facial action units, facial expressions and personality in real time while they interacted with a small humanoid robot. We describe an overview of the challenges faced together with the lessons learned from those demonstrations in order to better inform the science and engineering fields to design and build better robots with more purposeful interaction capabilities. This article is part of the theme issue 'From social brains to social robots: applying neurocognitive insights to human-robot interaction'.
Biologically-Inspired Motion Encoding for Robust Global Motion Estimation.
The growing use of cameras embedded in autonomous robotic platforms and worn by people is increasing the importance of accurate global motion estimation (GME). However, existing GME methods may degrade considerably under illumination variations. In this paper, we address this problem by proposing a biologically-inspired GME method that achieves high estimation accuracy in the presence of illumination variations. We mimic the early layers of the human visual cortex with the spatio-temporal Gabor motion energy by adopting the pioneering model of Adelson and Bergen and we provide the closed-form expressions that enable the study and adaptation of this model to different application needs. Moreover, we propose a normalisation scheme for motion energy to tackle temporal illumination variations. Finally, we provide an overall GME scheme which, to the best of our knowledge, achieves the highest accuracy on the Pose, Illumination, and Expression (PIE) database.
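The Adelson-Bergen motion energy at the core of this method can be sketched in a minimal 1-D-space-plus-time setting: a quadrature pair of space-time Gabor filters is applied to a drifting pattern, and the squared responses are summed to give a phase-invariant, direction-selective energy. The grid size, frequencies and bandwidth below are arbitrary illustrative choices, not the paper's parameters.

```python
import numpy as np

def gabor_pair(k, w, size=16, sigma=4.0):
    # quadrature pair of space-time Gabor filters tuned to velocity w/k
    t, x = np.mgrid[0:size, 0:size] - size / 2
    env = np.exp(-(x**2 + t**2) / (2 * sigma**2))
    phase = k * x - w * t
    return env * np.cos(phase), env * np.sin(phase)

def motion_energy(stimulus, k, w):
    # Adelson-Bergen energy: sum of squared quadrature responses,
    # which makes the output invariant to the stimulus phase
    even, odd = gabor_pair(k, w)
    return np.sum(stimulus * even) ** 2 + np.sum(stimulus * odd) ** 2

size = 16
t, x = np.mgrid[0:size, 0:size] - size / 2
k, w = 0.8, 0.8
rightward = np.cos(k * x - w * t)   # pattern drifting in +x over time
leftward  = np.cos(k * x + w * t)   # same pattern drifting in -x

e_right = motion_energy(rightward, k, w)
e_left  = motion_energy(leftward, k, w)
print(e_right > e_left)             # a rightward-tuned unit prefers rightward motion
```

The paper additionally derives closed-form expressions for these filters and normalises the energies to cope with temporal illumination changes; neither refinement is reproduced here.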
Local Zernike Moment Representation for Facial Affect Recognition
In this paper, we propose to use local Zernike Moments (ZMs) for facial affect recognition and introduce a representation scheme based on performing non-linear encoding on ZMs via quantization. Local ZMs provide a useful and compact description of image discontinuities and texture. We demonstrate the use of this ZM-based representation for posed and discrete as well as naturalistic and continuous affect recognition on standard datasets, and show that ZM-based representations outperform well-established alternative approaches for both tasks. To the best of our knowledge, the performance we achieved on the CK+ dataset is superior to all results reported to date.
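A minimal sketch of the two ingredients, assuming nothing about the paper's exact encoding: the standard Zernike moment of an image patch mapped onto the unit disk, followed by a toy sign-based binarisation standing in for the quantisation scheme described above.

```python
import numpy as np
from math import factorial

def zernike_moment(patch, n, m):
    # Zernike moment Z_{n,m} of a square patch mapped onto the unit disk
    h, w = patch.shape
    y, x = np.mgrid[0:h, 0:w]
    x = (2 * x - w + 1) / (w - 1)      # map pixel centres to [-1, 1]
    y = (2 * y - h + 1) / (h - 1)
    rho = np.hypot(x, y)
    theta = np.arctan2(y, x)
    mask = rho <= 1.0
    # radial polynomial R_{n,|m|}
    R = np.zeros_like(rho)
    for s in range((n - abs(m)) // 2 + 1):
        c = ((-1) ** s * factorial(n - s)
             / (factorial(s)
                * factorial((n + abs(m)) // 2 - s)
                * factorial((n - abs(m)) // 2 - s)))
        R += c * rho ** (n - 2 * s)
    V = R * np.exp(-1j * m * theta)    # Zernike basis function
    return (n + 1) / np.pi * np.sum(patch[mask] * V[mask])

def quantised_descriptor(patch, orders=((1, 1), (2, 0), (2, 2), (3, 1))):
    # toy non-linear encoding: binarise real/imag parts of low-order ZMs
    # (an illustrative stand-in for the paper's quantisation scheme)
    bits = []
    for n, m in orders:
        z = zernike_moment(patch, n, m)
        bits += [z.real > 0, z.imag > 0]
    return np.array(bits, dtype=int)

patch = np.outer(np.linspace(0, 1, 9), np.ones(9))   # vertical intensity ramp
print(quantised_descriptor(patch))
```

In the representation scheme above, such descriptors would be computed over a dense grid of local patches and pooled into histograms; only the per-patch step is sketched here.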
Learning Bases of Activity for Facial Expression Recognition.
The extraction of descriptive features from sequences of faces is a fundamental problem in facial expression analysis. Facial expressions are represented by psychologists as a combination of elementary movements known as action units: each movement is localised and its intensity is specified with a score that is small when the movement is subtle and large when the movement is pronounced. Inspired by this approach, we propose a novel data-driven feature extraction framework that represents facial expression variations as a linear combination of localised basis functions, whose coefficients are proportional to movement intensity. We show that the linear basis functions required by this framework can be obtained by training a sparse linear model with Gabor phase shifts computed from facial videos. The proposed framework addresses generalisation issues that are not addressed by existing learnt representations, and achieves, with the same learning parameters, state-of-the-art results in recognising both posed expressions and spontaneous micro-expressions. This performance is confirmed even when the data used to train the model differ from test data in terms of the intensity of facial movements and frame rate. The work of E. Sariyanidi and H. Gunes was partially supported by the EPSRC under its IDEAS Factory Sandpits call on Digital Personhood under Grant EP/L00416X/1.
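The idea of approximating expression variation as a sparse weighted sum of localised basis functions can be sketched with a generic sparse-coding example. The hand-built dictionary of Gaussian bumps and the ISTA solver below are illustrative stand-ins: the paper learns its bases from Gabor phase shifts rather than fixing them by hand, and its signals are facial videos rather than 1-D vectors.

```python
import numpy as np

rng = np.random.default_rng(1)

# dictionary of localised basis functions (Gaussian bumps along a 1-D "face")
n_dim, n_atoms = 64, 16
centres = np.linspace(0, n_dim - 1, n_atoms)
grid = np.arange(n_dim)[:, None]
D = np.exp(-((grid - centres[None, :]) ** 2) / (2 * 2.0 ** 2))
D /= np.linalg.norm(D, axis=0)                   # unit-norm atoms

def ista(x, D, lam=0.05, iters=200):
    # iterative shrinkage-thresholding: min_w 0.5*||x - Dw||^2 + lam*||w||_1
    L = np.linalg.norm(D, 2) ** 2                # Lipschitz constant of gradient
    w = np.zeros(D.shape[1])
    for _ in range(iters):
        w = w + D.T @ (x - D @ w) / L            # gradient step
        w = np.sign(w) * np.maximum(np.abs(w) - lam / L, 0.0)  # soft threshold
    return w

# synthetic "expression variation": two localised movements of known intensity
truth = np.zeros(n_atoms)
truth[3], truth[11] = 2.0, 0.5                   # pronounced and subtle movements
x = D @ truth + 0.01 * rng.standard_normal(n_dim)

w = ista(x, D)
print(int(np.argmax(np.abs(w))))                 # index of the strongest movement
```

The recovered coefficients play the role of the per-movement intensity scores: sparsity keeps most weights at zero, and the surviving weights scale with how pronounced each localised movement is.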
Automatic analysis of facilitated taste-liking
This paper focuses on: (i) automatic recognition of taste-liking from facial videos by comparatively training and evaluating models with engineered features and state-of-the-art deep learning architectures, and (ii) analysing the classification results along the aspects of facilitator type, and the gender, ethnicity, and personality of the participants. To this end, a new beverage tasting dataset acquired under different conditions (human vs. robot facilitator and priming vs. non-priming facilitation) is utilised. The experimental results show that: (i) the deep spatiotemporal architectures provide better classification results than the engineered feature models; (ii) the classification results for all three classes of liking, neutral and disliking reach F1 scores in the range of 71%-91%; (iii) the personality-aware network that fuses participants' personality information with facial reaction features provides improved classification performance; and (iv) classification results vary across participant gender, but not across facilitator type and participant ethnicity.
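The personality-aware fusion in finding (iii) can be illustrated at the feature level only. The feature dimensions, the z-score normalisation and the plain concatenation below are hypothetical choices for the sketch; the paper's actual network architecture is not specified here.

```python
import numpy as np

rng = np.random.default_rng(0)

n_clips = 8
facial = rng.standard_normal((n_clips, 128))     # per-clip facial-reaction features
big5 = rng.uniform(1, 5, (n_clips, 5))           # Big-Five personality scores

def zscore(a):
    # normalise each modality so neither dominates the fused representation
    return (a - a.mean(0)) / (a.std(0) + 1e-8)

# feature-level fusion: concatenate the normalised modalities per clip
fused = np.hstack([zscore(facial), zscore(big5)])
print(fused.shape)
```

A downstream three-class classifier (liking / neutral / disliking) would then be trained on `fused` instead of on the facial features alone.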
Visual Loop Closure Detection For Autonomous Mobile Robot Navigation Via Unsupervised Landmark Extraction
Thesis (M.Sc.) -- İstanbul Technical University, Institute of Science and Technology, 2012.
Autonomous navigation is a very active research field in mobile robotics. Simultaneous localization and mapping (SLAM) is one of the major problems linked with autonomous navigation, and one of its essential issues is the detection of loop closures. Within the context of SLAM, loop closing can be defined as the correct identification of a previously visited location. Loop closure detection is a significant ability for a mobile robot, since successful loop closure detection leads to substantial improvement in the overall SLAM performance. This thesis introduces a novel loop closure detection technique that relies on visual sensing. Images are sparsely represented via visual landmarks, which are extracted in an unsupervised manner. The sparsely represented images form an appearance space, and loop closure hypotheses are ultimately cast on this appearance space. The major contributions of this thesis are twofold. The first is a novel saliency detection algorithm, which is used for unsupervised visual landmark extraction. The second is an overall loop closure detection technique that relies on measuring the similarity between an incoming image and the images of previously visited locations in the appearance space. Experimental results indicate that the proposed technique is quite promising and at least comparable to the state of the art.
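The appearance-space matching step can be sketched generically: each visited location keeps a descriptor, and an incoming image casts a loop-closure hypothesis when its similarity to a past (non-recent) location exceeds a threshold. The random descriptors, cosine similarity and threshold below are illustrative assumptions, not the thesis's unsupervised landmark-based representation.

```python
import numpy as np

rng = np.random.default_rng(2)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

# toy appearance space: one descriptor per visited location (standing in for
# the thesis's sparse, landmark-based image representations)
visited = [rng.standard_normal(32) for _ in range(20)]

def detect_loop_closure(query, visited, threshold=0.95, exclude_recent=3):
    # compare the incoming image against all but the most recent locations
    # (excluding recent frames avoids trivial self-matches); a sufficiently
    # high similarity casts a loop-closure hypothesis
    scores = [cosine(query, v) for v in visited[:-exclude_recent]]
    best = int(np.argmax(scores))
    return (best, scores[best]) if scores[best] >= threshold else None

# revisiting location 5 under a slight appearance change
query = visited[5] + 0.01 * rng.standard_normal(32)
print(detect_loop_closure(query, visited))
```

In a full SLAM system an accepted hypothesis would additionally be verified geometrically before the pose graph is corrected; only the appearance-space stage is sketched here.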