How Geometry Meets Learning in Pose Estimation
This thesis focuses on one of the fundamental problems in computer vision, six-degree-of-freedom (6DoF) pose estimation: predicting the geometric transformation from the camera to a target of interest using only RGB inputs. Classical solutions rely on image retrieval or on sparse 2D-3D correspondence matching with geometric verification. With the development of deep learning, direct regression-based approaches (computing the pose by image-to-pose regression) and indirect reconstruction-based approaches (solving the pose via dense matching between the image and a 3D reconstruction) have drawn growing attention in the community. Although deep models have been proposed for both camera relocalisation and object pose estimation, open questions remain. In this thesis, we investigate several of them: end-to-end object pose inference, uncertainty estimation in regression-based methods, and self-supervision for reconstruction-based learning for both scenes and objects.

The first part of this thesis focuses on end-to-end 6DoF pose regression for objects. Traditional methods for object pose usually rely on a 3D CAD model and a multi-step scheme to compute the pose. We instead apply direct pose regression to objects, building on the region-proposal-based network Mask R-CNN, well known for object detection and instance segmentation. Our newly proposed network head regresses a 4D vector from the RoI feature map of each object: a 3D vector in the Lie algebra represents the rotation, and a scalar for the z-component of translation is predicted so that the full 3D translation can be recovered together with the bounding-box position. This simplification avoids the spatial ambiguity in the 2D image caused by RoIPooling.
Our method is accurate at inference time and faster than methods that require 3D models and refinement in their pipeline.

In the second part we estimate the uncertainty of the pose regressed by a deep model. A CNN is combined with Gaussian Process Regression (GPR) into a framework that directly yields a predictive distribution over camera pose: the CNN extracts discriminative features, and the GPR performs probabilistic inference on them. To prevent the complexity of uncertainty estimation from growing with the number of training images, we represent the whole dataset with pseudo inducing CNN feature points and learn their representations with Stochastic Variational Inference (SVI). This makes the GPR a parametric model that can be learnt jointly with the CNN backbone. We test the proposed hybrid framework on camera relocalisation.

The third and fourth parts share an objective: seeking self-supervision for learning dense reconstruction for pose estimation from images, without using ground-truth 3D models of scenes (part 3) or objects (part 4). We explore an alternative supervisory signal from multi-view geometry: photometric and/or featuremetric consistency between image pairs from different viewpoints constrains the learning of world-centric coordinates (part 3) and object-centric coordinates (part 4). At inference time, the dense reconstruction model establishes 2D-3D correspondences, and the 6DoF pose is computed with PnP plus RANSAC. Our 3D-model-free methods for pose estimation eliminate the dependency on 3D models found in state-of-the-art approaches.
Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 202
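Two geometric steps mentioned in this abstract can be made concrete: mapping the regressed 3D Lie-algebra vector to a rotation matrix (Rodrigues' formula), and recovering the full 3D translation from the predicted z-depth plus the 2D box centre. The sketch below is illustrative only; the function names and the camera intrinsics are hypothetical, not taken from the thesis.

```python
import numpy as np

def so3_exp(w):
    """Map a 3D axis-angle vector (an element of the Lie algebra so(3))
    to a 3x3 rotation matrix via Rodrigues' formula."""
    theta = np.linalg.norm(w)
    if theta < 1e-8:
        return np.eye(3)                      # near-zero rotation
    k = w / theta                             # unit rotation axis
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])        # skew-symmetric cross-product matrix
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def recover_translation(z, box_center, K_cam):
    """Back-project the 2D bounding-box centre at the predicted depth z
    to obtain the full 3D translation (pinhole camera model)."""
    u, v = box_center
    fx, fy = K_cam[0, 0], K_cam[1, 1]
    cx, cy = K_cam[0, 2], K_cam[1, 2]
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])
```

For example, a 90-degree rotation about the camera z-axis is `so3_exp(np.array([0, 0, np.pi / 2]))`, and a box centred on the principal point at depth 2 m back-projects to a translation of (0, 0, 2).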
Simultaneous Facial Landmark Detection, Pose and Deformation Estimation under Facial Occlusion
Facial landmark detection, head pose estimation, and facial deformation
analysis are typical facial behavior analysis tasks in computer vision. The
existing methods usually perform each task independently and sequentially,
ignoring their interactions. To tackle this problem, we propose a unified
framework for simultaneous facial landmark detection, head pose estimation, and
facial deformation analysis, and the proposed model is robust to facial
occlusion. Following a cascade procedure augmented with model-based head pose
estimation, we iteratively update the facial landmark locations, facial
occlusion, head pose and facial deformation until convergence. The
experimental results on benchmark databases demonstrate the effectiveness of
the proposed method for simultaneous facial landmark detection, head pose and
facial deformation estimation, even when the images contain facial occlusion.
Comment: International Conference on Computer Vision and Pattern Recognition,
201
Face Alignment Assisted by Head Pose Estimation
In this paper we propose a supervised initialization scheme for cascaded face
alignment based on explicit head pose estimation. We first investigate the
failure cases of most state-of-the-art face alignment approaches and observe
that these failures often share one common global property, i.e. the head pose
variation is usually large. Inspired by this, we propose a deep convolutional
network model for reliable and accurate head pose estimation. Instead of using
a mean face shape, or randomly selected shapes for cascaded face alignment
initialisation, we propose two schemes for generating initialisation: the first
one relies on projecting a mean 3D face shape (represented by 3D facial
landmarks) onto 2D image under the estimated head pose; the second one searches
nearest neighbour shapes from the training set according to head pose distance.
By doing so, the initialisation gets closer to the actual shape, which enhances
the possibility of convergence and in turn improves the face alignment
performance. We demonstrate the proposed method on the benchmark 300W dataset
and show very competitive performance in both head pose estimation and face
alignment.
Comment: Accepted by BMVC201
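The first initialisation scheme above, projecting a mean 3D face shape into the image under the estimated head pose, is a standard pinhole projection. A minimal NumPy sketch, with a hypothetical function name and example intrinsics `K` (none of these identifiers come from the paper):

```python
import numpy as np

def project_landmarks(shape3d, R, t, K):
    """Project Nx3 mean-face 3D landmarks into the 2D image under
    the estimated head pose (rotation R, translation t)."""
    cam = shape3d @ R.T + t          # transform landmarks into the camera frame
    uv = cam @ K.T                   # apply camera intrinsics
    return uv[:, :2] / uv[:, 2:3]    # perspective divide -> pixel coordinates
```

The resulting 2D landmark set then serves as the starting shape for the cascaded regressor, which the paper argues lies closer to the true shape than a mean or randomly chosen 2D shape.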
Fine-Grained Head Pose Estimation Without Keypoints
Estimating the head pose of a person is a crucial problem that has a large
amount of applications such as aiding in gaze estimation, modeling attention,
fitting 3D models to video and performing face alignment. Traditionally head
pose is computed by estimating some keypoints from the target face and solving
the 2D to 3D correspondence problem with a mean human head model. We argue that
this is a fragile method because it relies entirely on landmark detection
performance, the extraneous head model and an ad-hoc fitting step. We present
an elegant and robust way to determine pose by training a multi-loss
convolutional neural network on 300W-LP, a large synthetically expanded
dataset, to predict intrinsic Euler angles (yaw, pitch and roll) directly from
image intensities through joint binned pose classification and regression. We
present empirical tests on common in-the-wild pose benchmark datasets which
show state-of-the-art results. Additionally we test our method on a dataset
typically used for depth-based pose estimation, and begin to close the gap with
state-of-the-art depth pose methods. We open-source our training and testing
code as well as release our pre-trained models.
Comment: Accepted to Computer Vision and Pattern Recognition Workshops
(CVPRW), 2018 IEEE Conference on. IEEE, 201
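The "joint binned pose classification and regression" idea can be illustrated by how a continuous angle is typically recovered from per-bin logits: a softmax expectation over bin centres. The bin count (66), width (3 degrees), and range (about +/-99 degrees) below are assumptions for illustration, not guaranteed to match the paper's exact configuration.

```python
import numpy as np

def expected_angle(logits, bin_width=3.0, angle_min=-99.0):
    """Soft-argmax over pose bins: softmax probabilities weighted by
    the centre angle of each bin yields a continuous Euler angle."""
    p = np.exp(logits - logits.max())             # numerically stable softmax
    p /= p.sum()
    n = len(logits)
    centers = angle_min + bin_width * (np.arange(n) + 0.5)  # bin centre angles
    return float(p @ centers)
```

One such head is applied per Euler angle (yaw, pitch, roll), with a classification loss on the bins and a regression loss on the expected angle.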
Semantic Graph Convolutional Networks for 3D Human Pose Regression
In this paper, we study the problem of learning Graph Convolutional Networks
(GCNs) for regression. Current architectures of GCNs are limited to the small
receptive field of convolution filters and shared transformation matrix for
each node. To address these limitations, we propose Semantic Graph
Convolutional Networks (SemGCN), a novel neural network architecture that
operates on regression tasks with graph-structured data. SemGCN learns to
capture semantic information such as local and global node relationships, which
is not explicitly represented in the graph. These semantic relationships can be
learned through end-to-end training from the ground truth without additional
supervision or hand-crafted rules. We further investigate applying SemGCN to 3D
human pose regression. Our formulation is intuitive and sufficient since both
2D and 3D human poses can be represented as a structured graph encoding the
relationships between joints in the skeleton of a human body. We carry out
comprehensive studies to validate our method. The results prove that SemGCN
outperforms state of the art while using 90% fewer parameters.
Comment: In CVPR 2019 (13 pages including supplementary material). The code
can be found at https://github.com/garyzhao/SemGC
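For context, the "shared transformation matrix for each node" that SemGCN improves upon is the vanilla graph-convolution layer: symmetrically normalised adjacency, one weight matrix for all joints. A minimal NumPy sketch of that baseline layer (the function name is ours, and this is the generic formulation, not SemGCN itself):

```python
import numpy as np

def gcn_layer(X, A, W):
    """One vanilla graph-convolution layer over a skeleton graph:
    X (nodes x features), A (binary adjacency), W (shared weight matrix)."""
    A_hat = A + np.eye(A.shape[0])             # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt   # symmetric normalisation
    return np.maximum(A_norm @ X @ W, 0.0)     # aggregate neighbours, ReLU
```

SemGCN's contribution, per the abstract, is to learn per-edge semantic weights rather than sharing one transformation across all joints, while keeping the graph structure of the skeleton.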