Deep Regression for Face Alignment
In this paper, we present a deep regression approach for face alignment. The
deep architecture consists of a global layer and multi-stage local layers. We
apply the back-propagation algorithm with the dropout strategy to jointly
optimize the regression parameters. We show that the resulting deep regressor
gradually and evenly approaches the true facial landmarks stage by stage,
avoiding the tendency to yield over-strong early-stage regressors and
over-weak later-stage regressors. Experimental results show that our approach
achieves the state-of-the-ar
Joint Face Alignment and 3D Face Reconstruction with Application to Face Recognition
Face alignment and 3D face reconstruction are traditionally accomplished as
separate tasks. By exploring the strong correlation between 2D landmarks and
3D shapes, in contrast, we propose a joint face alignment and 3D face
reconstruction method to simultaneously solve these two problems for 2D face
images of arbitrary poses and expressions. This method, based on a summation
model of 3D faces and cascaded regression in 2D and 3D shape spaces,
iteratively and alternately applies two cascaded regressors, one for updating
2D landmarks and the other for 3D shape. The 3D shape and the landmarks are
correlated via a 3D-to-2D mapping matrix, which is updated in each iteration to
refine the location and visibility of 2D landmarks. Unlike existing methods,
the proposed method can fully automatically generate both
pose-and-expression-normalized (PEN) and expressive 3D faces and localize both
visible and invisible 2D landmarks. Based on the PEN 3D faces, we devise a
method to enhance face recognition accuracy across poses and expressions. Both
linear and nonlinear implementations of the proposed method are presented and
evaluated in this paper. Extensive experiments show that the proposed method
achieves state-of-the-art accuracy in both face alignment and 3D face
reconstruction, and benefits face recognition owing to its reconstructed PEN 3D
faces.
Comment: IEEE Transactions on Pattern Analysis and Machine Intelligence, Nov. 201
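In the joint method, a 3D-to-2D mapping matrix links the 3D shape to the 2D landmarks and is re-estimated in every iteration. The sketch below shows one possible re-estimation step. It assumes an affine (weak-perspective-style) camera, fitted by ordinary least squares to landmarks whose 3D positions are known. The function names are hypothetical, and the paper's own estimator and its visibility refinement are not reproduced here.

```python
from typing import List, Sequence

Vec = List[float]
Mat = List[List[float]]


def solve(a: Mat, b: Vec) -> Vec:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system (are the 3D points coplanar?)")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def fit_affine_projection(pts3d: Sequence[Sequence[float]],
                          pts2d: Sequence[Sequence[float]]) -> Mat:
    """Return the 2x4 matrix M minimising sum ||p2d - M [p3d; 1]||^2 (hypothetical helper)."""
    rows = [list(p) + [1.0] for p in pts3d]  # homogeneous 3D points
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    m = []
    for k in range(2):  # the u-row and v-row of M are independent least-squares problems
        atb = [sum(r[i] * p[k] for r, p in zip(rows, pts2d)) for i in range(4)]
        m.append(solve(ata, atb))
    return m


def project(m: Mat, p3d: Sequence[float]) -> Vec:
    """Map one 3D point to 2D with the affine matrix m."""
    h = list(p3d) + [1.0]
    return [sum(mi * hi for mi, hi in zip(row, h)) for row in m]
```

At least four non-coplanar 3D points are needed. With coplanar points the normal equations become singular, and the helper raises an error.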
Pose-Invariant Face Alignment with a Single CNN
Face alignment has witnessed substantial progress in the last decade. One of
the recent focuses has been aligning a dense 3D face shape to face images with
large head poses. The dominant technology used is based on the cascade of
regressors, e.g., CNN, which has shown promising results. Nonetheless, the
cascade of CNNs suffers from several drawbacks, e.g., lack of end-to-end
training, hand-crafted features and slow training speed. To address these
issues, we propose a new layer, named visualization layer, that can be
integrated into the CNN architecture and enables joint optimization with
different loss functions. Extensive evaluation of the proposed method on
multiple datasets demonstrates state-of-the-art accuracy, while reducing the
training time by more than half compared to the typical cascade of CNNs. In
addition, we compare multiple CNN architectures with the visualization layer to
further demonstrate the advantages of using it.
A fast online cascaded regression algorithm for face alignment
Traditional machine-learning approaches to face alignment usually locate facial
landmarks with a static model trained offline, where all of the training data is
available in advance. When new training samples arrive, the static model must be
retrained from scratch, which is excessively time- and memory-consuming. In many
real-time applications, the training data is obtained one sample or one batch at
a time. As a result, the static model's performance is limited on sequential
images with extensive variations. Therefore, the most critical and challenging
aspect in this field is dynamically updating the tracker's models to continuously
enhance their predictive and generalization capabilities. To address this
problem, we develop a fast and accurate online learning algorithm for face
alignment. In particular, we incorporate the online sequential extreme learning
machine into a parallel cascaded regression framework, called incremental cascade
regression (ICR). To the best of our knowledge, this is the first incremental
cascaded framework with a non-linear regressor. One main advantage of ICR is that
the tracker model can be updated quickly and incrementally, without retraining
from scratch, whenever a new input arrives. Experimental results demonstrate that
the proposed ICR is more accurate and efficient on still or sequential images
than recent state-of-the-art cascade approaches. Furthermore, the incremental
learning proposed in this paper can update the trained model in real time.
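ICR builds on the online sequential extreme learning machine. At its core, an OS-ELM-style learner updates the output weights by recursive least squares, so each new sample refines the model without revisiting old data. The minimal single-output sketch below rests on several assumptions of our own:

* a tanh hidden layer with fixed random weights;
* updates of one sample at a time;
* a ridge-style initialization, where P starts at I/reg.

It omits the cascade and the landmark features, and the class name is hypothetical.

```python
import math
import random
from typing import List, Sequence


class OnlineELM:
    """Single-output ELM regressor updated one sample at a time (hypothetical sketch)."""

    def __init__(self, n_in: int, n_hidden: int, reg: float = 1e-3, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Random input weights and biases stay fixed; only the output weights are learned.
        self.w = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
        # P tracks (H^T H + reg*I)^-1; starting at I/reg makes the result equal ridge regression.
        self.p = [[1.0 / reg if i == j else 0.0 for j in range(n_hidden)] for i in range(n_hidden)]
        self.beta = [0.0] * n_hidden

    def hidden(self, x: Sequence[float]) -> List[float]:
        return [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + bi)
                for row, bi in zip(self.w, self.b)]

    def predict(self, x: Sequence[float]) -> float:
        return sum(hi * bi for hi, bi in zip(self.hidden(x), self.beta))

    def update(self, x: Sequence[float], target: float) -> None:
        """Absorb one new sample without revisiting earlier ones (recursive least squares)."""
        h = self.hidden(x)
        ph = [sum(pij * hj for pij, hj in zip(row, h)) for row in self.p]  # P h
        denom = 1.0 + sum(hi * phi for hi, phi in zip(h, ph))             # 1 + h^T P h
        n = len(h)
        # Sherman-Morrison rank-one downdate of P.
        self.p = [[self.p[i][j] - ph[i] * ph[j] / denom for j in range(n)] for i in range(n)]
        err = target - sum(hi * bi for hi, bi in zip(h, self.beta))
        # The downdated P times h equals (P h) / denom, which is the gain vector.
        self.beta = [bi + phi / denom * err for bi, phi in zip(self.beta, ph)]
```

After feeding samples one by one, `beta` equals the batch ridge-regression solution over all samples seen so far. This is why no retraining pass over old data is needed.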
Regression-Based Image Alignment for General Object Categories
Gradient-descent methods have exhibited fast and reliable performance for
image alignment in the facial domain, but have largely been ignored by the
broader vision community. They require the image function to be smooth and
(numerically) differentiable, properties that hold for pixel-based
representations obeying natural image statistics but not for more general
classes of non-linear feature transforms. We show that transforms such as Dense
SIFT can be incorporated into a Lucas-Kanade alignment framework by predicting
descent directions via regression. This enables robust matching of instances
from general object categories whilst maintaining desirable properties of
Lucas-Kanade, such as the capacity to handle high-dimensional warp
parametrizations and a fast rate of convergence. We present alignment results on
a number of objects from ImageNet, and an extension of the method to
unsupervised joint alignment of objects from a corpus of images.
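The key idea replaces the analytic image gradient of Lucas-Kanade with a descent direction learned by regression. The regressor maps feature residuals to parameter updates. The toy sketch below illustrates this under several assumptions:

* a 1D signal with a translation-only warp;
* two raw samples standing in for Dense SIFT features;
* a linear regressor trained on synthetic perturbations.

Every name here is hypothetical, and none of it is the paper's implementation.

```python
import math
import random
from typing import List, Tuple

SAMPLES = (-0.5, 0.5)  # hypothetical sample locations standing in for dense features


def template(x: float) -> float:
    """Toy 1D template signal."""
    return math.exp(-x * x)


def residual(shift: float, p: float) -> List[float]:
    """Feature residual between the warped image and the template.

    The image is the template translated by `shift`, sampled at x + p,
    so the residual vanishes when p == shift.
    """
    return [template(x + p - shift) - template(x) for x in SAMPLES]


def train_descent_regressor(n: int = 500, max_dp: float = 0.4,
                            seed: int = 0) -> Tuple[float, float]:
    """Least-squares fit of a linear map from residual to parameter update."""
    rng = random.Random(seed)
    a11 = a12 = a22 = b1 = b2 = 0.0
    for _ in range(n):
        dp = rng.uniform(-max_dp, max_dp)  # misalignment of a synthetic training sample
        r1, r2 = residual(0.0, dp)
        a11 += r1 * r1
        a12 += r1 * r2
        a22 += r2 * r2
        b1 += r1 * -dp                     # the desired update undoes the misalignment
        b2 += r2 * -dp
    det = a11 * a22 - a12 * a12            # solve the 2x2 normal equations in closed form
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


def align(shift: float, p0: float, reg: Tuple[float, float], iters: int = 10) -> float:
    """Iterate regressed descent steps starting from the initial guess p0."""
    p = p0
    for _ in range(iters):
        r1, r2 = residual(shift, p)
        p += reg[0] * r1 + reg[1] * r2
    return p
```

The regressor only needs feature values, never their derivatives. This is what allows non-differentiable transforms to be used in the loop.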
BridgeNet: A Continuity-Aware Probabilistic Network for Age Estimation
Age estimation is an important yet very challenging problem in computer
vision. Existing methods for age estimation usually apply a divide-and-conquer
strategy to deal with heterogeneous data caused by the non-stationary aging
process. However, the facial aging process is also a continuous process, and
the continuity relationship between different components has not been
effectively exploited. In this paper, we propose BridgeNet for age estimation,
which aims to mine the continuous relation between age labels effectively. The
proposed BridgeNet consists of local regressors and gating networks. Local
regressors partition the data space into multiple overlapping subspaces to
tackle heterogeneous data, and gating networks learn continuity-aware weights
for the results of local regressors by employing the proposed bridge-tree
structure, which introduces bridge connections into tree models to enforce the
similarity between neighboring nodes. Moreover, these two components of BridgeNet
can be jointly learned in an end-to-end way. We show experimental results on
the MORPH II, FG-NET and Chalearn LAP 2015 datasets and find that BridgeNet
outperforms the state-of-the-art methods.
Comment: CVPR 201
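The combination step of a tree-gated model can be sketched as a soft binary tree whose leaf probabilities weight the outputs of local regressors. This minimal sketch assumes a depth-2 tree with sigmoid routing, and the scalar local age predictions are supplied from outside. It deliberately omits the bridge connections that distinguish BridgeNet's bridge-tree, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def leaf_probabilities(node_logits: Sequence[float]) -> List[float]:
    """Routing probabilities for the 4 leaves of a depth-2 soft binary tree.

    node_logits[0] is the root; [1] and [2] are its left and right children.
    Each leaf's weight is the product of the soft decisions on its path.
    """
    root, left, right = (sigmoid(z) for z in node_logits)
    return [root * left, root * (1 - left), (1 - root) * right, (1 - root) * (1 - right)]


def gated_age(node_logits: Sequence[float], local_predictions: Sequence[float]) -> float:
    """Weighted sum of local regressor outputs, one regressor per leaf."""
    weights = leaf_probabilities(node_logits)
    return sum(w * y for w, y in zip(weights, local_predictions))
```

The leaf weights always sum to one. The gated estimate therefore lies between the smallest and largest local prediction, and it varies smoothly as the routing logits change.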
Robust Facial Landmark Localization Based on Texture and Pose Correlated Initialization
Robust facial landmark localization remains a challenging task when faces are
partially occluded. Recently, cascaded pose regression has attracted
increasing attention due to its superior performance in facial landmark
localization and occlusion detection. However, such an approach is sensitive to
initialization, where an improper initialization can severely degrade
performance. In this paper, we propose a Robust Initialization for Cascaded
Pose Regression (RICPR) by providing texture and pose correlated initial shapes
for the testing face. By examining the correlation of local binary pattern
histograms between the testing face and the training faces, the shapes of the
training faces that are most correlated with the testing face are selected as
the texture correlated initialization. To make the initialization more robust
to various poses, we estimate the rough pose of the testing face from five
fiducial landmarks located by multitask cascaded convolutional networks.
Then the pose correlated initial shapes are constructed from the mean face
shape and the rough pose of the testing face. Finally, the texture correlated
and the pose correlated initial shapes are combined to form the robust
initialization. We evaluate RICPR on the challenging COFW dataset. The
experimental results demonstrate that the proposed scheme outperforms the
state-of-the-art methods in facial landmark localization and occlusion
detection.
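The texture correlated initialization can be sketched as follows. Compute an LBP histogram for the test face and for each training face, then keep the shapes of the best-correlated training faces. This minimal sketch assumes basic 8-neighbour 3x3 LBP codes on grayscale images stored as nested lists, with Pearson correlation as the similarity measure. The paper's exact LBP variant and correlation measure are not stated above, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Image = Sequence[Sequence[int]]

# 8-neighbour offsets, clockwise from top-left; bit k is set when neighbour k >= centre.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_histogram(img: Image) -> List[int]:
    """256-bin histogram of basic 3x3 LBP codes over the interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            code = 0
            for bit, (dr, dc) in enumerate(OFFSETS):
                if img[r + dr][c + dc] >= img[r][c]:
                    code |= 1 << bit
            hist[code] += 1
    return hist


def correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two histograms (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def texture_correlated_init(test_img: Image, train_imgs: Sequence[Image],
                            train_shapes: Sequence, k: int = 2) -> list:
    """Return the k training shapes whose LBP histograms best match the test face."""
    h = lbp_histogram(test_img)
    scores = [correlation(h, lbp_histogram(t)) for t in train_imgs]
    order = sorted(range(len(train_imgs)), key=lambda i: scores[i], reverse=True)
    return [train_shapes[i] for i in order[:k]]
```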
Joint Maximum Purity Forest with Application to Image Super-Resolution
In this paper, we propose a novel random-forest scheme, namely Joint Maximum
Purity Forest (JMPF), for classification, clustering, and regression tasks. In
the JMPF scheme, the original feature space is transformed into a compactly
pre-clustered feature space, via a trained rotation matrix. The rotation matrix
is obtained through an iterative quantization process, where the input data
belonging to different classes are clustered to the respective vertices of the
new feature space with maximum purity. In the new feature space, orthogonal
hyperplanes, which are employed at the split-nodes of decision trees in random
forests, can tackle the clustering problems effectively. We evaluated our
proposed method on public benchmark datasets for regression and classification
tasks, and experiments showed that JMPF remarkably outperforms other
state-of-the-art random-forest-based approaches. Furthermore, we applied JMPF
to image super-resolution, because the transformed, compact features are more
discriminative for the clustering-regression scheme. Experimental results on
several public benchmark datasets also showed that the JMPF-based image
super-resolution scheme is consistently superior to recent state-of-the-art
image super-resolution algorithms.
Comment: 18 pages, 7 figures
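Iterative quantization (ITQ) illustrates the rotation step. It alternates between two sub-steps: snap the rotated data to the nearest hypercube vertex, then re-solve for the best rotation given those vertices. The minimal sketch below assumes 2D, already-centered data, rotations only (no reflections), and unsupervised codes. As described above, JMPF's procedure also uses class membership and works in higher dimensions, and these function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def sign(x: float) -> float:
    return 1.0 if x >= 0 else -1.0


def rotate(p: Point, theta: float) -> Point:
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])


def quantization_loss(data: Sequence[Point], theta: float) -> float:
    """Sum of squared distances from rotated points to their nearest vertex of [-1, 1]^2."""
    total = 0.0
    for p in data:
        x, y = rotate(p, theta)
        total += (sign(x) - x) ** 2 + (sign(y) - y) ** 2
    return total


def itq_2d(data: Sequence[Point], iters: int = 20) -> float:
    """Alternate B = sign(R v) with the best rotation angle for fixed B; return the angle."""
    theta = 0.0
    for _ in range(iters):
        codes = [tuple(map(sign, rotate(p, theta))) for p in data]
        # For fixed codes, sum_i b_i . R(theta) v_i = cos(theta) * s1 + sin(theta) * s2,
        # which is maximised (loss minimised) at theta = atan2(s2, s1).
        s1 = sum(bx * vx + by * vy for (bx, by), (vx, vy) in zip(codes, data))
        s2 = sum(by * vx - bx * vy for (bx, by), (vx, vy) in zip(codes, data))
        theta = math.atan2(s2, s1)
    return theta
```

Each sub-step is optimal given the other, so the quantization loss never increases across iterations.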
A Convolution Tree with Deconvolution Branches: Exploiting Geometric Relationships for Single Shot Keypoint Detection
Recently, Deep Convolutional Neural Networks (DCNNs) have been applied to the task of
face alignment and have shown potential for learning improved feature
representations. Although deeper layers can capture abstract concepts like
pose, it is difficult to capture the geometric relationships among the
keypoints in DCNNs. In this paper, we propose a novel convolution-deconvolution
network for facial keypoint detection. Our model predicts the 2D locations of
the keypoints and their individual visibility along with 3D head pose, while
exploiting the spatial relationships among different keypoints. Different from
existing approaches of modeling these relationships, we propose learnable
transform functions that capture the relationships between keypoints at the
feature level. However, due to extensive variations in pose, not all of these
relationships act at once, and hence we propose a pose-based routing function
which implicitly models the active relationships. Both the transform functions
and the routing function are implemented through convolutions in a multi-task
framework. Our approach is a single-shot keypoint detection method, which
distinguishes it from many existing cascaded regression-based methods. We
also show that learning these relationships significantly improves the accuracy
of keypoint detection for in-the-wild face images from challenging datasets
such as AFW and AFLW.
Joint Voxel and Coordinate Regression for Accurate 3D Facial Landmark Localization
3D face shape is more expressive and viewpoint-consistent than its 2D
counterpart. However, 3D facial landmark localization in a single image is
challenging due to the ambiguous nature of landmarks under 3D perspective.
Existing approaches typically adopt a suboptimal two-step strategy, performing
2D landmark localization followed by depth estimation. In this paper, we
propose the Joint Voxel and Coordinate Regression (JVCR) method for 3D facial
landmark localization, addressing it more effectively in an end-to-end fashion.
First, a compact volumetric representation is proposed to encode the per-voxel
likelihood of each position being a 3D landmark. The dimensionality of this
representation is fixed regardless of the number of target landmarks, so the
curse of dimensionality can be avoided. Then, a stacked hourglass network
is adopted to estimate the volumetric representation from coarse to fine,
followed by a 3D convolution network that takes the estimated volume as input
and regresses the 3D coordinates of the face shape. In this way, the 3D
structural constraints between landmarks can be learned more efficiently by the
neural network. Moreover, the proposed pipeline enables end-to-end training
and improves the robustness and accuracy of 3D facial landmark localization.
The effectiveness of our approach is validated on the 3DFAW and AFLW2000-3D
datasets. Experimental results show that the proposed method achieves
state-of-the-art performance in comparison with existing methods.
Comment: Code available at https://github.com/HongwenZhang/JVCR-3Dlandmar
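The fixed-dimensionality argument can be illustrated by rendering all landmarks into a single voxel grid. This minimal sketch assumes Gaussian per-voxel likelihoods combined by a maximum, coordinates in voxel units, and a hypothetical grid size. The stacked hourglass estimation and the 3D-convolutional coordinate regression are not shown.

```python
import math
from typing import List, Sequence, Tuple

Volume = List[List[List[float]]]


def landmark_volume(landmarks: Sequence[Tuple[float, float, float]],
                    size: Tuple[int, int, int] = (16, 16, 8),
                    sigma: float = 1.0) -> Volume:
    """Encode any number of 3D landmarks in one fixed-size likelihood volume.

    Each voxel stores the maximum Gaussian response over all landmarks, so the
    volume's shape depends only on `size`, not on len(landmarks).
    """
    nx, ny, nz = size
    vol = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for lx, ly, lz in landmarks:
        for i in range(nx):
            for j in range(ny):
                for k in range(nz):
                    d2 = (i - lx) ** 2 + (j - ly) ** 2 + (k - lz) ** 2
                    vol[i][j][k] = max(vol[i][j][k], math.exp(-d2 / (2 * sigma ** 2)))
    return vol
```

A set of 2 landmarks and a set of 12 landmarks both produce a 16 x 16 x 8 volume. A per-landmark encoding, by contrast, would grow linearly with the number of landmarks.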