56,563 research outputs found

    Deep face tracking and parsing in the wild

    Get PDF
    Face analysis has been a long-standing research direction in the field of computer vision and pattern recognition. A complete face analysis system involves solving several tasks including face detection, face tracking, face parsing, and face recognition. Recently, the performance of methods in all tasks has significantly improved thanks to the employment of Deep Convolutional Neural Networks (DCNNs). However, existing face analysis algorithms mainly focus on solving facial images captured in the constrained laboratory environment, and their performance on real-world images has remained less explored. Compared with the lab environment, the in-the-wild settings involve greater diversity in face sizes, poses, facial expressions, background clutters, lighting conditions and imaging quality. This thesis investigates two fundamental tasks in face analysis under in-the-wild settings: face tracking and face parsing. Both tasks serve as important prerequisites for downstream face analysis applications. However, in-the-wild datasets remain scarce in both fields and models have not been rigorously evaluated in such settings. In this thesis, we aim to bridge that gap of lacking in-the-wild data, evaluate existing methods in these settings, and develop accurate, robust and efficient deep learning-based methods for the two tasks. For face tracking in the wild, we introduce the first in-the-wild face tracking dataset, MobiFace, that consists of 80 videos captured by mobile phones during mobile live-streaming. The environment of the live-streaming performance is fully unconstrained and the interactions between users and mobile phones are natural and spontaneous. Next, we evaluate existing tracking methods, including generic object trackers and dedicated face trackers. The results show that MobiFace represent unique challenges in face tracking in the wild and cannot be readily solved by existing methods. Finally, we present a DCNN-based framework, FT-RCNN, that significantly outperforms other methods in face tracking in the wild. For face parsing in the wild, we introduce the first large-scale in-the-wild face dataset, iBugMask, that contains 21, 866 training images and 1, 000 testing images. Unlike existing datasets, the images in iBugMask are captured in the fully unconstrained environment and are not cropped or preprocessed of any kind. Manually annotated per-pixel labels for eleven facial regions are provided for each target face. Next, we benchmark existing parsing methods and the results show that iBugMask is extremely challenging for all methods. By rigorous benchmarking, we observe that the pre-processing of facial images with bounding boxes in face parsing in the wild introduces bias. When cropping the face with a bounding box, a cropping margin has to be hand-picked. If face alignment is used, fiducial landmarks are required and a predefined alignment template has to be selected. These additional hyper-parameters have to be carefully considered and can have a significant impact on the face parsing performance. To solve this, we propose Region-of-Interest (RoI) Tanh-polar transform that warps the whole image to a fixed-sized representation. Moreover, the RoI Tanh-polar transform is differentiable and allows for rotation equivariance in 1 DCNNs. We show that when coupled with a simple Fully Convolutional Network, our RoI Tanh-polar transformer Network has achieved state-of-the-art results on face parsing in the wild. This thesis contributes towards in-the-wild face tracking and face parsing by providing novel datasets and proposing effective frameworks. Both tasks can benefit real-world downstream applications such as facial age estimation, facial expression recognition and lip-reading. The proposed RoI Tanh-polar transform also provides a new perspective in how to preprocess the face images and make the DCNNs truly end-to-end for real-world face analysis applications.Open Acces

    Face Alignment Assisted by Head Pose Estimation

    Full text link
    In this paper we propose a supervised initialization scheme for cascaded face alignment based on explicit head pose estimation. We first investigate the failure cases of most state of the art face alignment approaches and observe that these failures often share one common global property, i.e. the head pose variation is usually large. Inspired by this, we propose a deep convolutional network model for reliable and accurate head pose estimation. Instead of using a mean face shape, or randomly selected shapes for cascaded face alignment initialisation, we propose two schemes for generating initialisation: the first one relies on projecting a mean 3D face shape (represented by 3D facial landmarks) onto 2D image under the estimated head pose; the second one searches nearest neighbour shapes from the training set according to head pose distance. By doing so, the initialisation gets closer to the actual shape, which enhances the possibility of convergence and in turn improves the face alignment performance. We demonstrate the proposed method on the benchmark 300W dataset and show very competitive performance in both head pose estimation and face alignment.Comment: Accepted by BMVC201
    • …
    corecore