Facial component-landmark detection with weakly-supervised LR-CNN
© 2013 IEEE. In this paper, we propose a weakly supervised landmark-region-based convolutional neural network (LR-CNN) framework to detect facial components and landmarks simultaneously. Most existing coarse-to-fine facial detectors fail to detect landmarks accurately without large amounts of fully labeled data, which are costly to obtain. Our approach, by contrast, needs only a small amount of finely labeled data. First, as data preparation, deep convolutional generative adversarial networks are used to generate training samples with weak labels. Then, through weakly supervised learning, our LR-CNN model can be trained effectively with a small amount of finely labeled data and a large amount of generated, weakly labeled data. Notably, our approach can handle large occluded areas, because we localize visible facial components before predicting the corresponding landmarks. Detecting unoccluded components first lets us focus on the informative regions, resulting in better performance. Additionally, to improve performance on these tasks, we design two models: 1) we add AnchorAlign to the region proposal networks to localize components accurately, and 2) we propose a two-branch model, consisting of a classification branch and a regression branch, to detect landmarks. Extensive evaluations on benchmark datasets indicate that our proposed approach can perform multi-task facial detection and outperforms state-of-the-art facial component and landmark detection algorithms.
Fast Landmark Localization with 3D Component Reconstruction and CNN for Cross-Pose Recognition
Two approaches are proposed for cross-pose face recognition: one is based on
the 3D reconstruction of facial components and the other is based on the deep
Convolutional Neural Network (CNN). Unlike most 3D approaches that consider
holistic faces, the proposed approach considers 3D facial components. It
segments a 2D gallery face into components, reconstructs the 3D surface for
each component, and recognizes a probe face by component features. The
segmentation is based on the landmarks located by a hierarchical algorithm that
combines the Faster R-CNN for face detection and the Reduced Tree Structured
Model for landmark localization. The core part of the CNN-based approach is a
revised VGG network. We study its performance with different settings on the
training set, including the synthesized data from 3D reconstruction, the
real-life data from an in-the-wild database, and both types of data combined.
We investigate the performance of the network when it is employed as a
classifier or designed as a feature extractor. The two recognition approaches
and the fast landmark localization are evaluated in extensive experiments, and
compared to state-of-the-art methods to demonstrate their efficacy.
Comment: 14 pages, 12 figures, 4 tables
Simultaneous Facial Landmark Detection, Pose and Deformation Estimation under Facial Occlusion
Facial landmark detection, head pose estimation, and facial deformation
analysis are typical facial behavior analysis tasks in computer vision.
Existing methods usually perform each task independently and sequentially,
ignoring their interactions. To tackle this problem, we propose a unified
framework for simultaneous facial landmark detection, head pose estimation, and
facial deformation analysis, and the proposed model is robust to facial
occlusion. Following a cascade procedure augmented with model-based head pose
estimation, we iteratively update the facial landmark locations, facial
occlusion, head pose and facial deformation until convergence. The
experimental results on benchmark databases demonstrate the effectiveness of
the proposed method for simultaneous facial landmark detection, head pose and
facial deformation estimation, even when the faces in the images are occluded.
Comment: International Conference on Computer Vision and Pattern Recognition,
201
A Comprehensive Performance Evaluation of Deformable Face Tracking "In-the-Wild"
Recently, technologies such as face detection, facial landmark localisation
and face recognition and verification have matured enough to provide effective
and efficient solutions for imagery captured under arbitrary conditions
(referred to as "in-the-wild"). This is partially attributed to the fact that
comprehensive "in-the-wild" benchmarks have been developed for face detection,
landmark localisation and recognition/verification. A very important technology
that has not been thoroughly evaluated yet is deformable face tracking
"in-the-wild". Until now, the performance has mainly been assessed
qualitatively, by visually inspecting the results of a deformable face tracking
technology on short videos. In this paper, we perform the first, to the best of
our knowledge, thorough evaluation of state-of-the-art deformable face tracking
pipelines using the recently introduced 300VW benchmark. We evaluate many
different architectures focusing mainly on the task of on-line deformable face
tracking. In particular, we compare the following general strategies: (a)
generic face detection plus generic facial landmark localisation, (b) generic
model free tracking plus generic facial landmark localisation, as well as (c)
hybrid approaches using state-of-the-art face detection, model free tracking
and facial landmark localisation technologies. Our evaluation reveals
avenues for further research on the topic.
Comment: E. Antonakos and P. Snape contributed equally and have joint second
authorship
3D face tracking and multi-scale, spatio-temporal analysis of linguistically significant facial expressions and head positions in ASL
Essential grammatical information is conveyed in signed languages by clusters of events involving facial expressions and movements of the head and upper body. This poses a significant challenge for computer-based sign language recognition. Here, we present new methods for recognizing nonmanual grammatical markers in American Sign Language (ASL) based on: (1) new 3D tracking methods that estimate 3D head pose and facial expressions to determine the relevant low-level features; (2) methods for higher-level analysis of the component events used in grammatical markings (raised/lowered eyebrows, periodic head nods and head shakes), including differentiation of temporal phases (onset, core, offset, where appropriate), analysis of their characteristic properties, and extraction of corresponding features; and (3) a 2-level learning framework that combines low- and high-level features of differing spatio-temporal scales. This new approach achieves significantly better tracking and recognition results than our previous methods.