1,165 research outputs found
Fine-Grained Head Pose Estimation Without Keypoints
Estimating the head pose of a person is a crucial problem that has a large
amount of applications such as aiding in gaze estimation, modeling attention,
fitting 3D models to video and performing face alignment. Traditionally head
pose is computed by estimating some keypoints from the target face and solving
the 2D to 3D correspondence problem with a mean human head model. We argue that
this is a fragile method because it relies entirely on landmark detection
performance, the extraneous head model and an ad-hoc fitting step. We present
an elegant and robust way to determine pose by training a multi-loss
convolutional neural network on 300W-LP, a large synthetically expanded
dataset, to predict intrinsic Euler angles (yaw, pitch and roll) directly from
image intensities through joint binned pose classification and regression. We
present empirical tests on common in-the-wild pose benchmark datasets which
show state-of-the-art results. Additionally we test our method on a dataset
usually used for pose estimation using depth and start to close the gap with
state-of-the-art depth pose methods. We open-source our training and testing
code as well as release our pre-trained models.Comment: Accepted to Computer Vision and Pattern Recognition Workshops
(CVPRW), 2018 IEEE Conference on. IEEE, 201
WarpNet: Weakly Supervised Matching for Single-view Reconstruction
We present an approach to matching images of objects in fine-grained datasets
without using part annotations, with an application to the challenging problem
of weakly supervised single-view reconstruction. This is in contrast to prior
works that require part annotations, since matching objects across class and
pose variations is challenging with appearance features alone. We overcome this
challenge through a novel deep learning architecture, WarpNet, that aligns an
object in one image with a different object in another. We exploit the
structure of the fine-grained dataset to create artificial data for training
this network in an unsupervised-discriminative learning approach. The output of
the network acts as a spatial prior that allows generalization at test time to
match real images across variations in appearance, viewpoint and articulation.
On the CUB-200-2011 dataset of bird categories, we improve the AP over an
appearance-only network by 13.6%. We further demonstrate that our WarpNet
matches, together with the structure of fine-grained datasets, allow
single-view reconstructions with quality comparable to using annotated point
correspondences.Comment: to appear in IEEE Conference on Computer Vision and Pattern
Recognition (CVPR) 201
- …