24 research outputs found
The 3D Menpo Facial Landmark Tracking Challenge
This is the final version of the article. It is the open access version, provided by the Computer Vision Foundation. Except for the watermark, it is identical to the IEEE published version. Available from IEEE via the DOI in this record.

Recently, deformable face alignment has become synonymous with the task of locating a set of sparse 2D landmarks in intensity images. Currently, discriminatively trained Deep Convolutional Neural Networks (DCNNs) are the state-of-the-art in the task of face alignment. DCNNs exploit the large amounts of high-quality annotations that have emerged over the last few years. Nevertheless, the provided 2D annotations rarely capture the 3D structure of the face (this is especially evident on the facial boundary). That is, the annotations neither provide an estimate of depth nor correspond to the 2D projections of the 3D facial structure. This paper summarises our efforts (a) to develop a very large database suitable for training 3D face alignment algorithms on images captured "in-the-wild", and (b) to train and evaluate new methods for 3D facial landmark tracking. Finally, we report the results of the first challenge in 3D face tracking "in-the-wild". The work of S. Zafeiriou and A. Roussos has been partially funded by the EPSRC Project EP/N007743/
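The distinction the abstract draws is that typical 2D annotations are not the image-plane projections of a consistent 3D facial shape. A minimal sketch of what such a projection would look like, using a weak-perspective camera model (the function name, parameters, and camera model are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def project_landmarks(X3d, R, t, f):
    """Weak-perspective projection of 3D landmarks to the image plane.

    Hypothetical helper for illustration only (not the paper's method).
    X3d: (N, 3) array of 3D landmark coordinates.
    R:   3x3 rotation matrix (head pose).
    t:   2-vector translation in image coordinates.
    f:   scalar scale factor (focal length / depth).
    Returns (N, 2) projected landmark coordinates.
    """
    Xc = X3d @ R.T            # rotate landmarks into the camera frame
    return f * Xc[:, :2] + t  # drop depth, then scale and translate
```

Under this model, "3D-aware" 2D annotations are exactly the outputs of such a projection for some shared 3D shape and per-image pose, which is what hand-placed 2D boundary landmarks generally fail to satisfy.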
Fiducial Focus Augmentation for Facial Landmark Detection
Deep learning methods have led to significant improvements in the performance
on the facial landmark detection (FLD) task. However, detecting landmarks in
challenging settings, such as large head pose changes, exaggerated
expressions, or uneven illumination, remains difficult due to high
variability and insufficient training samples. This inadequacy can be attributed to the model's
inability to effectively acquire appropriate facial structure information from
the input images. To address this, we propose a novel image augmentation
technique specifically designed for the FLD task to enhance the model's
understanding of facial structures. To effectively utilize the newly proposed
augmentation technique, we employ a Siamese architecture-based training
mechanism with a Deep Canonical Correlation Analysis (DCCA)-based loss to
achieve collective learning of high-level feature representations from two
different views of the input images. Furthermore, we employ a Transformer +
CNN-based network with a custom hourglass module as the robust backbone for the
Siamese framework. Extensive experiments show that our approach outperforms
multiple state-of-the-art methods across various benchmark datasets.

Comment: Accepted to BMVC'2
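The DCCA-based loss described above encourages the two Siamese branches to produce maximally correlated feature representations for the two views of an input. A minimal sketch of that objective, computing the (negated) sum of canonical correlations between two batches of features with NumPy; the function name, regularisation constant, and plain-NumPy formulation are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def dcca_loss(H1, H2, eps=1e-6):
    """Negative sum of canonical correlations between two feature views.

    Illustrative sketch of a DCCA-style objective (not the paper's code).
    H1, H2: (n_samples, dim) feature matrices from the two branches.
    eps: small ridge term to keep covariance matrices invertible.
    """
    n = H1.shape[0]
    H1c = H1 - H1.mean(axis=0)          # center each view
    H2c = H2 - H2.mean(axis=0)
    S11 = H1c.T @ H1c / (n - 1) + eps * np.eye(H1.shape[1])
    S22 = H2c.T @ H2c / (n - 1) + eps * np.eye(H2.shape[1])
    S12 = H1c.T @ H2c / (n - 1)

    def inv_sqrt(S):
        # Symmetric inverse square root via eigendecomposition
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    # Singular values of the whitened cross-covariance are the
    # canonical correlations between the two views.
    T = inv_sqrt(S11) @ S12 @ inv_sqrt(S22)
    corr = np.linalg.svd(T, compute_uv=False).sum()
    return -corr  # minimizing this maximizes total correlation
```

In a training loop this quantity would be computed on the two branches' high-level features and minimized alongside the landmark regression loss, pushing the backbone to extract view-invariant facial structure.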