Search CORE

126 research outputs found

Dynamic probabilistic linear discriminant analysis for video classification

Author: Fabris A.
Fabris A.
Kotsia I.
Kotsia I.
Nicolaou M.
Nicolaou M.
Zafeiriou S.
Zafeiriou S.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2017
Field of study

Component Analysis (CA) comprises of statistical techniques that decompose signals into appropriate latent components, relevant to a task-at-hand (e.g., clustering, segmentation, classification). Recently, an explosion of research in CA has been witnessed, with several novel probabilistic models proposed (e.g., Probabilistic Principal CA, Probabilistic Linear Discriminant Analysis (PLDA), Probabilistic Canonical Correlation Analysis). PLDA is a popular generative probabilistic CA method, that incorporates knowledge regarding class-labels and furthermore introduces class-specific and sample-specific latent spaces. While PLDA has been shown to outperform several state-of-the-art methods, it is nevertheless a static model; any feature-level temporal dependencies that arise in the data are ignored. As has been repeatedly shown, appropriate modelling of temporal dynamics is crucial for the analysis of temporal data (e.g., videos). In this light, we propose the first, to the best of our knowledge, probabilistic LDA formulation that models dynamics, the so-called Dynamic-PLDA (DPLDA). DPLDA is a generative model suitable for video classification and is able to jointly model the label information (e.g., face identity, consistent over videos of the same subject), as well as dynamic variations of each individual video. Experiments on video classification tasks such as face and facial expression recognition show the efficacy of the proposed metho

Middlesex University Research Repository

GANFIT: Generative adversarial network fitting for high fidelity 3D face reconstruction

Author: Gecer B.
Gecer B.
Kotsia I.
Kotsia I.
Ploumpis S.
Ploumpis S.
Zafeiriou S.
Zafeiriou S.
Publication venue: IEEE
Publication date: 01/01/2019
Field of study

In the past few years, a lot of work has been done to- wards reconstructing the 3D facial structure from single images by capitalizing on the power of Deep Convolutional Neural Networks (DCNNs). In the most recent works, differentiable renderers were employed in order to learn the relationship between the facial identity features and the parameters of a 3D morphable model for shape and texture. The texture features either correspond to components of a linear texture space or are learned by auto-encoders directly from in-the-wild images. In all cases, the quality of the facial texture reconstruction of the state-of-the-art methods is still not capable of modeling textures in high fidelity. In this paper, we take a radically different approach and harness the power of Generative Adversarial Networks (GANs) and DCNNs in order to reconstruct the facial texture and shape from single images. That is, we utilize GANs to train a very powerful generator of facial texture in UV space. Then, we revisit the original 3D Morphable Models (3DMMs) fitting approaches making use of non-linear optimization to find the optimal latent parameters that best reconstruct the test image but under a new perspective. We optimize the parameters with the supervision of pretrained deep identity features through our end-to-end differentiable framework. We demonstrate excellent results in photorealistic and identity preserving 3D face reconstructions and achieve for the first time, to the best of our knowledge, facial texture reconstruction with high-frequency details

Middlesex University Research Repository

Dense 3D face decoding over 2500FPS: Joint texture and shape convolutional mesh decoders

Author: Deng J.
Deng J.
Kotsia I.
Kotsia I.
Zafeiriou S.
Zafeiriou S.
Zhou Y.
Zhou Y.
Publication venue: IEEE
Publication date: 01/01/2019
Field of study

3D Morphable Models (3DMMs) are statistical models that represent facial texture and shape variations using a set of linear bases and more particular Principal Component Analysis (PCA). 3DMMs were used as statistical priors for reconstructing 3D faces from images by solving non-linear least square optimization problems. Recently, 3DMMs were used as generative models for training non-linear mappings (i.e., regressors) from image to the parameters of the models via Deep Convolutional Neural Networks (DCNNs). Nev- ertheless, all of the above methods use either fully con- nected layers or 2D convolutions on parametric unwrapped UV spaces leading to large networks with many parame- ters. In this paper, we present the first, to the best of our knowledge, non-linear 3DMMs by learning joint texture and shape auto-encoders using direct mesh convolutions. We demonstrate how these auto-encoders can be used to train very light-weight models that perform Coloured Mesh Decoding (CMD) in-the-wild at a speed of over 2500 FPS

Middlesex University Research Repository

4DFAB: a large scale 4D facial expression database for biometric applications

Author: Cheng S.
Cheng S.
Kotsia I.
Kotsia I.
Pantic M.
Pantic M.
Zafeiriou S.
Zafeiriou S.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018
Field of study

The progress we are currently witnessing in many computer vision applications, including automatic face analysis, would not be made possible without tremendous efforts in collecting and annotating large scale visual databases. To this end, we propose 4DFAB, a new large scale database of dynamic high-resolution 3D faces (over 1,800,000 3D meshes). 4DFAB contains recordings of 180 subjects captured in four different sessions spanning over a five-year period. It contains 4D videos of subjects displaying both spontaneous and posed facial behaviours. The database can be used for both face and facial expression recognition, as well as behavioural biometrics. It can also be used to learn very powerful blendshapes for parametrising facial behaviour. In this paper, we conduct several experiments and demonstrate the usefulness of the database for various applications. The database will be made publicly available for research purposes

Middlesex University Research Repository

Recognition of affect in the wild using deep neural networks

Author: Kollias D.
Kollias D.
Kotsia I.
Kotsia I.
Nicolaou M.
Nicolaou M.
Zafeiriou S.
Zafeiriou S.
Zhao G.
Zhao G.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2017
Field of study

In this paper we utilize the first large-scale "in-the-wild" (Aff-Wild) database, which is annotated in terms of the valence-arousal dimensions, to train and test an end-to-end deep neural architecture for the estimation of continuous emotion dimensions based on visual cues. The proposed architecture is based on jointly training convolutional (CNN) and recurrent neural network (RNN) layers, thus exploiting both the invariant properties of convolutional features, while also modelling temporal dynamics that arise in human behaviour via the recurrent layers. Various pre-trained networks are used as starting structures which are subsequently appropriately fine-tuned to the Aff-Wild database. Obtained results show premise for the utilization of deep architectures for the visual analysis of human behaviour in terms of continuous emotion dimensions and analysis of different types of affect

Middlesex University Research Repository

Deep neural network augmentation: generating faces for affect analysis

Author: Cheng S.
Cheng S.
Kollias D.
Kollias D.
Kotsia I.
Kotsia I.
Ververas E.
Ververas E.
Zafeiriou S.
Zafeiriou S.
Publication venue: Springer
Publication date: 01/01/2020
Field of study

This paper presents a novel approach for synthesizing facial affect; either in terms of the six basic expressions (i.e., anger, disgust, fear, joy, sadness and surprise), or in terms of valence (i.e., how positive or negative is an emotion) and arousal (i.e., power of the emotion activation). The proposed approach accepts the following inputs:(i) a neutral 2D image of a person; (ii) a basic facial expression or a pair of valence-arousal (VA) emotional state descriptors to be generated, or a path of affect in the 2D VA space to be generated as an image sequence. In order to synthesize affect in terms of VA, for this person, 600,000 frames from the 4DFAB database were annotated. The affect synthesis is implemented by fitting a 3D Morphable Model on the neutral image, then deforming the reconstructed face and adding the inputted affect, and blending the new face with the given affect into the original image. Qualitative experiments illustrate the generation of realistic images, when the neutral image is sampled from fifteen well known lab-controlled or in-the-wild databases, including Aff-Wild, AffectNet, RAF-DB; comparisons with generative adversarial networks (GANs) show the higher quality achieved by the proposed approach. Then, quantitative experiments are conducted, in which the synthesized images are used for data augmentation in training deep neural networks to perform affect recognition over all databases; greatly improved performances are achieved when compared with state-of-the-art methods, as well as with GAN-based data augmentation, in all cases

Middlesex University Research Repository

Facial affect "in the wild": a survey and a new database

Author: Kotsia I.
Kotsia I.
Nicolaou M.
Nicolaou M.
Papaioannou A.
Papaioannou A.
Zafeiriou S.
Zafeiriou S.
Zhao G.
Zhao G.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2016
Field of study

Well-established databases and benchmarks have been developed in the past 20 years for automatic facial behaviour analysis. Nevertheless, for some important problems regarding analysis of facial behaviour, such as (a) estimation of affect in a continuous dimensional space (e.g., valence and arousal) in videos displaying spontaneous facial behaviour and (b) detection of the activated facial muscles (i.e., facial action unit detection), to the best of our knowledge, well-established in-the-wild databases and benchmarks do not exist. That is, the majority of the publicly available corpora for the above tasks contain samples that have been captured in controlled recording conditions and/or captured under a very specific milieu. Arguably, in order to make further progress in automatic understanding of facial behaviour, datasets that have been captured in in the-wild and in various milieus have to be developed. In this paper, we survey the progress that has been recently made on understanding facial behaviour in-the-wild, the datasets that have been developed so far and the methodologies that have been developed, paying particular attention to deep learning techniques for the task. Finally, we make a significant step further and propose a new comprehensive benchmark for training methodologies, as well as assessing the performance of facial affect/behaviour analysis/ understanding in-the-wild. To the best of our knowledge, this is the first time that such a benchmark for valence and arousal "in-the-wild" is presente

Middlesex University Research Repository

Aff-Wild: Valence and Arousal ‘in-the-wild’ Challenge

Author: Kollias D.
Kollias D.
Kotsia I.
Kotsia I.
Nicolaou M.
Nicolaou M.
Papaioannou A.
Papaioannou A.
Zafeiriou S.
Zafeiriou S.
Zhao G.
Zhao G.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2017
Field of study

The Affect-in-the-Wild (Aff-Wild) Challenge proposes a new comprehensive benchmark for assessing the performance of facial affect/behaviour analysis/understanding 'in-the-wild'. The Aff-wild benchmark contains about 300 videos (over 2,000 minutes of data) annotated with regards to valence and arousal, all captured 'in-the-wild' (the main source being Youtube videos). The paper presents the database description, the experimental set up, the baseline method used for the Challenge and finally the summary of the performance of the different methods submitted to the Affect-in-the-Wild Challenge for Valence and Arousal estimation. The challenge demonstrates that meticulously designed deep neural networks can achieve very good performance when trained with in-the-wild data

Middlesex University Research Repository

AgeDB: the first manually collected, in-the-wild age database

Author: Deng J.
Deng J.
Kotsia I.
Kotsia I.
Moschoglou S.
Moschoglou S.
Papaioannou A.
Papaioannou A.
Sagonas C.
Sagonas C.
Zafeiriou S.
Zafeiriou S.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2017
Field of study

Over the last few years, increased interest has arisen with respect to age-related tasks in the Computer Vision community. As a result, several "in-the-wild" databases annotated with respect to the age attribute became available in the literature. Nevertheless, one major drawback of these databases is that they are semi-automatically collected and annotated and thus they contain noisy labels. Therefore, the algorithms that are evaluated in such databases are prone to noisy estimates. In order to overcome such drawbacks, we present in this paper the first, to the best of knowledge, manually collected "in-the-wild" age database, dubbed AgeDB, containing images annotated with accurate to the year, noise-free labels. As demonstrated by a series of experiments utilizing state-of-the-art algorithms, this unique property renders AgeDB suitable when performing experiments on age-invariant face verification, age estimation and face age progression "in-the-wild"

Middlesex University Research Repository

The Menpo benchmark for multi-pose 2D and 3D facial landmark localisation and tracking

Author: Chrysos G.
Chrysos G.
Deng J.
Deng J.
Kotsia I.
Kotsia I.
Roussos A.
Roussos A.
Shen J.
Shen J.
Ververas E.
Ververas E.
Zafeiriou S.
Zafeiriou S.
Publication venue: Springer
Publication date: 01/01/2019
Field of study

In this article, we present the Menpo 2D and Menpo 3D benchmarks, two new datasets for multi-pose 2D and 3D facial landmark localisation and tracking. In contrast to the previous benchmarks such as 300W and 300VW, the proposed benchmarks contain facial images in both semi-frontal and profile pose. We introduce an elaborate semi-automatic methodology for providing high-quality annotations for both the Menpo 2D and Menpo 3D benchmarks. In Menpo 2D benchmark, different visible landmark configurations are designed for semi-frontal and profile faces, thus making the 2D face alignment full-pose. In Menpo 3D benchmark, a united landmark configuration is designed for both semi-frontal and profile faces based on the correspondence with a 3D face model, thus making face alignment not only full-pose but also corresponding to the real-world 3D space. Based on the considerable number of annotated images, we organised Menpo 2D Challenge and Menpo 3D Challenge for face alignment under large pose variations in conjunction with CVPR 2017 and ICCV 2017, respectively. The results of these challenges demonstrate that recent deep learning architectures, when trained with the abundant data, lead to excellent results. We also provide a very simple, yet effective solution, named Cascade Multi-view Hourglass Model, to 2D and 3D face alignment. In our method, we take advantage of all 2D and 3D facial landmark annotations in a joint way. We not only capitalise on the correspondences between the semi-frontal and profile 2D facial landmarks but also employ joint supervision from both 2D and 3D facial landmarks. Finally, we discuss future directions on the topic of face alignment

Middlesex University Research Repository