Face Alignment Robust to Pose, Expressions and Occlusions
We propose an Ensemble of Robust Constrained Local Models for alignment of
faces in the presence of significant occlusions and of any unknown pose and
expression. To account for partial occlusions we introduce Robust Constrained
Local Models, which comprise a deformable shape model and a local landmark
appearance model and reason over binary occlusion labels. Our occlusion
reasoning proceeds by a hypothesize-and-test search over occlusion labels.
Hypotheses are generated by Constrained Local Model based shape fitting over
randomly sampled subsets of landmark detector responses and are evaluated by
the quality of face alignment. To span the entire range of facial pose and
expression variations we adopt an ensemble of independent Robust Constrained
Local Models to search over a discretized representation of pose and
expression. We perform extensive evaluation on a large number of face images,
both occluded and unoccluded. We find that our face alignment system trained
entirely on facial images captured "in-the-lab" exhibits a high degree of
generalization to facial images captured "in-the-wild". Our results are
accurate and stable over a wide spectrum of occlusions, pose and expression
variations, resulting in excellent performance on many real-world face datasets.
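
The hypothesize-and-test search over occlusion labels described above can be illustrated with a minimal sketch. It is not the authors' implementation: the Constrained Local Model shape fit is replaced here by a plain least-squares 2D similarity transform of a mean shape, and the names `fit_similarity`, `apply_similarity`, `hypothesize_and_test`, `inlier_tol` and the default parameter values are all assumptions made for illustration. Alignment quality is approximated by the number of landmarks whose detection agrees with the fitted shape.

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares similarity transform (a, b, tx, ty) mapping src onto dst.

    Model: x' = a*x - b*y + tx,  y' = b*x + a*y + ty.
    This stands in for the CLM shape fit (an assumption of this sketch).
    """
    n = src.shape[0]
    A = np.zeros((2 * n, 4))
    A[0::2] = np.column_stack([src[:, 0], -src[:, 1], np.ones(n), np.zeros(n)])
    A[1::2] = np.column_stack([src[:, 1], src[:, 0], np.zeros(n), np.ones(n)])
    params, *_ = np.linalg.lstsq(A, dst.reshape(-1), rcond=None)
    return params

def apply_similarity(params, pts):
    """Apply the similarity transform to an (n, 2) array of points."""
    a, b, tx, ty = params
    x, y = pts[:, 0], pts[:, 1]
    return np.column_stack([a * x - b * y + tx, b * x + a * y + ty])

def hypothesize_and_test(mean_shape, detections, n_hypotheses=200,
                         subset_size=3, inlier_tol=2.0, seed=0):
    """Return (fitted_shape, occluded) where occluded is a boolean label per landmark."""
    rng = np.random.default_rng(seed)
    n = mean_shape.shape[0]
    best_fit, best_visible = None, np.zeros(n, dtype=bool)
    for _ in range(n_hypotheses):
        # Hypothesize: fit the shape to a random subset of detector responses.
        idx = rng.choice(n, size=subset_size, replace=False)
        params = fit_similarity(mean_shape[idx], detections[idx])
        fitted = apply_similarity(params, mean_shape)
        # Test: landmarks whose detection agrees with the fit count as visible.
        residual = np.linalg.norm(fitted - detections, axis=1)
        visible = residual < inlier_tol
        if visible.sum() > best_visible.sum():
            best_fit, best_visible = fitted, visible
    return best_fit, ~best_visible
```

Landmarks whose detections disagree with the best hypothesis are labelled occluded. In the ensemble setting described in the abstract, a search of this kind would be run independently for each discretized pose and expression model, keeping the best-scoring result.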
Disentangling 3D Pose in A Dendritic CNN for Unconstrained 2D Face Alignment
Heatmap regression has long been used for landmark localization. Most
methods use a very deep stack of bottleneck modules for the heatmap
classification stage, followed by heatmap regression to extract the
keypoints. In this paper, we present a single dendritic CNN, termed the Pose
Conditioned Dendritic Convolution Neural Network (PCD-CNN), where a
classification network is followed by a second, modular classification
network, trained end to end to obtain accurate landmark points.
Following a Bayesian formulation, we disentangle the 3D pose of a face image
explicitly by conditioning the landmark estimation on pose, making it different
from multi-tasking approaches. Extensive experimentation shows that
conditioning on pose reduces the localization error by making it agnostic to
face pose. The proposed model can be extended to yield a variable number of
landmark points, broadening its applicability to other datasets.
Instead of increasing depth or width of the network, we train the CNN
efficiently with Mask-Softmax Loss and hard sample mining to achieve a
reduction in error compared to state-of-the-art methods for extreme and
medium pose face images from challenging datasets including AFLW, AFW, COFW and
IBUG.
Comment: CVPR'1
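
The idea of conditioning landmark estimation on pose can be written as a chain-rule factorization. The notation below is an assumption made for illustration and is not taken from the abstract: I denotes the face image, P the 3D pose, and L the set of 2D landmark locations.

```latex
p(L, P \mid I) = p(P \mid I)\, p(L \mid P, I)
```

Under this reading, the first network estimates the pose term and the second estimates the landmarks given that pose. A multi-task model, by contrast, would typically predict L and P as separate outputs from shared features, without the explicit conditioning of L on P.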