203 research outputs found

Recognition of affect in the wild using deep neural networks

Author: Kollias D.
Kotsia I.
Nicolaou M.
Zafeiriou S.
Zhao G.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2017

In this paper we utilize the first large-scale "in-the-wild" (Aff-Wild) database, which is annotated in terms of the valence-arousal dimensions, to train and test an end-to-end deep neural architecture for the estimation of continuous emotion dimensions based on visual cues. The proposed architecture is based on jointly training convolutional (CNN) and recurrent neural network (RNN) layers, thus exploiting the invariant properties of convolutional features while also modelling temporal dynamics that arise in human behaviour via the recurrent layers. Various pre-trained networks are used as starting structures, which are subsequently fine-tuned to the Aff-Wild database. The obtained results show promise for the utilization of deep architectures for the visual analysis of human behaviour in terms of continuous emotion dimensions and the analysis of different types of affect.

Middlesex University Research Repository

Deep neural network augmentation: generating faces for affect analysis

Author: Cheng S.
Kollias D.
Kotsia I.
Ververas E.
Zafeiriou S.
Publication venue: Springer
Publication date: 01/01/2020

This paper presents a novel approach for synthesizing facial affect, either in terms of the six basic expressions (i.e., anger, disgust, fear, joy, sadness and surprise), or in terms of valence (i.e., how positive or negative an emotion is) and arousal (i.e., the power of the emotion's activation). The proposed approach accepts the following inputs: (i) a neutral 2D image of a person; (ii) a basic facial expression or a pair of valence-arousal (VA) emotional state descriptors to be generated, or a path of affect in the 2D VA space to be generated as an image sequence. In order to synthesize affect in terms of VA for this person, 600,000 frames from the 4DFAB database were annotated. The affect synthesis is implemented by fitting a 3D Morphable Model on the neutral image, then deforming the reconstructed face and adding the inputted affect, and blending the new face with the given affect into the original image. Qualitative experiments illustrate the generation of realistic images when the neutral image is sampled from fifteen well-known lab-controlled or in-the-wild databases, including Aff-Wild, AffectNet and RAF-DB; comparisons with generative adversarial networks (GANs) show the higher quality achieved by the proposed approach. Then, quantitative experiments are conducted, in which the synthesized images are used for data augmentation in training deep neural networks to perform affect recognition over all databases; greatly improved performances are achieved when compared with state-of-the-art methods, as well as with GAN-based data augmentation, in all cases.
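
As a rough illustration of the "path of affect in the 2D VA space" input mentioned in this abstract, the sketch below turns a few valence-arousal waypoints into one target per frame of an image sequence. The function name `va_path`, the `frames_per_segment` parameter and the use of linear interpolation are assumptions made for illustration only; the abstract does not state how the path is sampled.

```python
def va_path(waypoints, frames_per_segment):
    """Expand (valence, arousal) waypoints into per-frame targets.

    Hypothetical helper: linear interpolation between consecutive
    waypoints is an assumption, not the method described in the paper.
    """
    path = []
    # Walk over consecutive waypoint pairs and interpolate between them.
    for (v0, a0), (v1, a1) in zip(waypoints, waypoints[1:]):
        for i in range(frames_per_segment):
            t = i / frames_per_segment
            path.append((v0 + t * (v1 - v0), a0 + t * (a1 - a0)))
    # Close the path on the final waypoint.
    path.append(waypoints[-1])
    return path


if __name__ == "__main__":
    # From a neutral state towards a positive, moderately aroused one.
    for valence, arousal in va_path([(0.0, 0.0), (1.0, 0.5)], 2):
        print(valence, arousal)
```

Each resulting pair could then serve as the target emotional state for one synthesized frame.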

Middlesex University Research Repository

A 3D morphable model learnt from 10,000 faces

Author: Booth J
Dunaway D
Ponniah A
Roussos A
Zafeiriou S
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 08/03/2018

This is the final version of the article. It is the open access version, provided by the Computer Vision Foundation. Except for the watermark, it is identical to the IEEE published version. Available from IEEE via the DOI in this record. We present Large Scale Facial Model (LSFM) - a 3D Morphable Model (3DMM) automatically constructed from 9,663 distinct facial identities. To the best of our knowledge LSFM is the largest-scale Morphable Model ever constructed, containing statistical information from a huge variety of the human population. To build such a large model we introduce a novel fully automated and robust Morphable Model construction pipeline. The dataset that LSFM is trained on includes rich demographic information about each subject, allowing for the construction of not only a global 3DMM but also models tailored for specific age, gender or ethnicity groups. As an application example, we utilise the proposed model to perform age classification from 3D shape alone. Furthermore, we perform a systematic analysis of the constructed 3DMMs that showcases their quality and descriptive power. The presented extensive qualitative and quantitative evaluations reveal that the proposed 3DMM achieves state-of-the-art results, outperforming existing models by a large margin. Finally, for the benefit of the research community, we make publicly available the source code of the proposed automatic 3DMM construction pipeline. In addition, the constructed global 3DMM and a variety of bespoke models tailored by age, gender and ethnicity are available on application to researchers involved in medically oriented research. J. Booth is funded by an EPSRC DTA from Imperial College London, and holds a Qualcomm Innovation Fellowship. A. Roussos is funded by the Great Ormond Street Hospital Children's Charity (Face Value: W1037). The work of S. Zafeiriou was partially funded by the EPSRC project EP/J017787/1 (4D-FAB).

Open Research Exeter

Weakly-supervised mesh-convolutional hand reconstruction in the wild

Author: Bronstein M
Güler RA
Kokkinos I
Kulon D
Zafeiriou S
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 04/04/2020

We introduce a simple and effective network architecture for monocular 3D hand pose estimation consisting of an image encoder followed by a mesh convolutional decoder that is trained through a direct 3D hand mesh reconstruction loss. We train our network by gathering a large-scale dataset of hand action in YouTube videos and use it as a source of weak supervision. Our weakly-supervised mesh convolutions-based system largely outperforms state-of-the-art methods, even halving the errors on the in-the-wild benchmark. The dataset and additional resources are available at https://arielai.com/mesh_hands

arXiv.org e-Print Archive

Crossref

UCL Discovery

A 3D Morphable Model learnt from 10,000 faces

Author: Booth J
Dunaway D
Ponniah A
Roussos A
Zafeiriou S
Publication venue: Computer Vision Foundation (CVF)
Publication date: 02/03/2016

We present Large Scale Facial Model (LSFM) — a 3D Morphable Model (3DMM) automatically constructed from 9,663 distinct facial identities. To the best of our knowledge LSFM is the largest-scale Morphable Model ever constructed, containing statistical information from a huge variety of the human population. To build such a large model we introduce a novel fully automated and robust Morphable Model construction pipeline. The dataset that LSFM is trained on includes rich demographic information about each subject, allowing for the construction of not only a global 3DMM but also models tailored for specific age, gender or ethnicity groups. As an application example, we utilise the proposed model to perform age classification from 3D shape alone. Furthermore, we perform a systematic analysis of the constructed 3DMMs that showcases their quality and descriptive power. The presented extensive qualitative and quantitative evaluations reveal that the proposed 3DMM achieves state-of-the-art results, outperforming existing models by a large margin. Finally, for the benefit of the research community, we make publicly available the source code of the proposed automatic 3DMM construction pipeline. In addition, the constructed global 3DMM and a variety of bespoke models tailored by age, gender and ethnicity are available on application to researchers involved in medically oriented research

Crossref

Spiral - Imperial College Digital Repository

Past tense in children with focal brain lesions

Author: C. Manouilidou
D. Zafeiriou
Marchman
P. Konstantinopoulou
S. Stavrakaki
Stavrakaki
Publication venue: Elsevier BV
Publication date: 01/10/2013

In this study, 22 children with early left hemisphere (LHD) or right hemisphere (RHD) focal brain lesions (FL, n = 14 LHD, n = 8 RHD) were administered an English past tense elicitation test (M = 6.5 years). Proportion correct and frequency of overregularization and zero-marking errors were compared to age-matched samples of children with specific language impairment (SLI, n = 27) and with typical language development (TD, n = 27). Similar rates of correct production and error patterns were observed for the children with TD and FL, whereas children with SLI produced more zero-marking errors than either their FL or TD peers. Performance was predicted by vocabulary level (PPVT-R) for children in all groups, and errors did not differ as a function of lesion side (LHD vs. RHD). Findings are discussed in terms of the nature of brain–language relations and how those relationships develop over the course of language learning.

Elsevier - Publisher Connector

Crossref

Birkbeck Institutional Research Online

Aff-Wild: Valence and Arousal ‘in-the-wild’ Challenge

Author: Kollias D.
Kotsia I.
Nicolaou M.
Papaioannou A.
Zafeiriou S.
Zhao G.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2017

The Affect-in-the-Wild (Aff-Wild) Challenge proposes a new comprehensive benchmark for assessing the performance of facial affect/behaviour analysis/understanding 'in-the-wild'. The Aff-Wild benchmark contains about 300 videos (over 2,000 minutes of data) annotated with regard to valence and arousal, all captured 'in-the-wild' (the main source being YouTube videos). The paper presents the database description, the experimental setup, the baseline method used for the Challenge and finally a summary of the performance of the different methods submitted to the Affect-in-the-Wild Challenge for valence and arousal estimation. The challenge demonstrates that meticulously designed deep neural networks can achieve very good performance when trained with in-the-wild data.

Middlesex University Research Repository

Grid Loss: Detecting Occluded Faces

Author: C Dubout
C Garcia
D Chen
H Rowley
J Yan
M Everingham
M Mathias
N Srivastava
P Dollár
P Viola
PF Felzenszwalb
R Vaillant
S Zafeiriou
Publication date: 01/09/2016

Detection of partially occluded objects is a challenging computer vision problem. Standard Convolutional Neural Network (CNN) detectors fail if parts of the detection window are occluded, since not every sub-part of the window is discriminative on its own. To address this issue, we propose a novel loss layer for CNNs, named grid loss, which minimizes the error rate on sub-blocks of a convolution layer independently rather than over the whole feature map. This results in parts being more discriminative on their own, enabling the detector to recover if the detection window is partially occluded. By mapping our loss layer back to a regular fully connected layer, no additional computational cost is incurred at runtime compared to standard CNNs. We demonstrate our method for face detection on several public face detection benchmarks and show that our method outperforms regular CNNs, is suitable for real-time applications and achieves state-of-the-art performance. Comment: accepted to ECCV 201
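
The idea of penalising sub-blocks of a feature map independently can be sketched as follows. This is a simplified illustration, not the authors' formulation: the single-channel feature map, the hinge loss, the even split of the bias across blocks, and the names `grid_loss`, `block` and `lam` are all assumptions made for the example.

```python
def hinge(label, score):
    """Hinge loss for a label in {-1, +1}."""
    return max(0.0, 1.0 - label * score)


def grid_loss(features, weights, bias, label, block, lam=1.0):
    """Whole-window hinge loss plus a hinge loss per square sub-block.

    features, weights: 2D lists of equal shape (one channel, for brevity).
    block: side length of each square sub-block (hypothetical parameter).
    lam: weight of the per-block terms (hypothetical parameter).
    """
    rows, cols = len(features), len(features[0])
    block_dots = []
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            # Dot product of weights and features restricted to one block.
            dot = sum(
                features[r][c] * weights[r][c]
                for r in range(r0, min(r0 + block, rows))
                for c in range(c0, min(c0 + block, cols))
            )
            block_dots.append(dot)
    n = len(block_dots)
    # The whole-window score is the sum of the block scores plus the bias.
    whole = hinge(label, sum(block_dots) + bias)
    # Each block is also asked to classify on its own, with a share of the bias.
    parts = sum(hinge(label, d + bias / n) for d in block_dots)
    return whole + lam * parts


if __name__ == "__main__":
    ones = [[1.0, 1.0], [1.0, 1.0]]
    half_occluded = [[1.0, 1.0], [0.0, 0.0]]
    print(grid_loss(ones, ones, 0.0, 1, block=1))
    print(grid_loss(half_occluded, ones, 0.0, 1, block=1))
```

In this toy setup the whole-window term is zero for the half-occluded input, so only the per-block terms register that the lower blocks carry no evidence.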

arXiv.org e-Print Archive

Crossref

Deep affect prediction in-the-wild: Aff-wild database and challenge, deep architectures, and beyond

Author: Kollias D.
Kotsia I.
Nicolaou M.
Papaioannou A.
Schuller B.
Tzirakis P.
Zafeiriou S.
Zhao G.
Publication venue: Springer
Publication date: 01/01/2019

Automatic understanding of human affect using visual signals is of great importance in everyday human–machine interactions. Appraising human emotional states, behaviors and reactions displayed in real-world settings can be accomplished using latent continuous dimensions (e.g., the circumplex model of affect). Valence (i.e., how positive or negative an emotion is) and arousal (i.e., the power of the activation of the emotion) constitute popular and effective representations for affect. Nevertheless, the majority of datasets collected thus far, although containing naturalistic emotional states, have been captured in highly controlled recording conditions. In this paper, we introduce the Aff-Wild benchmark for training and evaluating affect recognition algorithms. We also report on the results of the First Affect-in-the-wild Challenge (Aff-Wild Challenge) that was recently organized in conjunction with CVPR 2017 on the Aff-Wild database, and was the first ever challenge on the estimation of valence and arousal in-the-wild. Furthermore, we design and extensively train an end-to-end deep neural architecture which performs prediction of continuous emotion dimensions based on visual cues. The proposed deep learning architecture, AffWildNet, includes convolutional and recurrent neural network layers, exploiting the invariant properties of convolutional features, while also modeling temporal dynamics that arise in human behavior via the recurrent layers. The AffWildNet produced state-of-the-art results on the Aff-Wild Challenge. We then exploit the AffWild database for learning features, which can be used as priors for achieving best performances both for dimensional, as well as categorical emotion recognition, using the RECOLA, AFEW-VA and EmotiW 2017 datasets, compared to all other methods designed for the same goal. The database and emotion recognition models are available at http://ibug.doc.ic.ac.uk/resources/first-affect-wild-challenge

Middlesex University Research Repository

Distribution Matching for Multi-Task Learning of Classification Tasks: a Large-Scale Study on Faces & Beyond

Author: Kollias D
Sharmanska V
Zafeiriou S
Publication venue: 38th Annual AAAI Conference on Artificial Intelligence, Association for the Advancement of Artificial Intelligence (AAAI)
Publication date: 24/03/2024

Multi-Task Learning (MTL) is a framework where multiple related tasks are learned jointly and benefit from a shared representation space, or parameter transfer. To provide sufficient learning support, modern MTL uses annotated data with full, or sufficiently large, overlap across tasks, i.e., each input sample is annotated for all, or most, of the tasks. However, collecting such annotations is prohibitive in many real applications, and this setup cannot benefit from datasets available for individual tasks. In this work, we challenge this setup and show that MTL can be successful with classification tasks with little or non-overlapping annotations, or when there is a big discrepancy in the size of labeled data per task. We explore task-relatedness for co-annotation and co-training, and propose a novel approach where knowledge exchange is enabled between the tasks via distribution matching. To demonstrate the general applicability of our method, we conducted diverse case studies in the domains of affective computing, face recognition, species recognition, and shopping item classification using nine datasets. Our large-scale study of affective tasks for basic expression recognition and facial action unit detection illustrates that our approach is network agnostic and brings large performance improvements compared to the state-of-the-art in both tasks and across all studied databases. In all case studies, we show that co-training via task-relatedness is advantageous and prevents negative transfer (which occurs when the MT model's performance is worse than that of at least one single-task model).

Queen Mary Research Online