53 research outputs found
Binarized convolutional landmark localizers for human pose estimation and face alignment with limited resources
Our goal is to design architectures that retain the groundbreaking performance of CNNs for landmark localization and at the same time are lightweight, compact and suitable for applications with limited computational resources. To this end, we make the following contributions: (a) we are the first to study the effect of neural network binarization on localization tasks, namely human pose estimation and face alignment. We exhaustively evaluate various design choices, identify performance bottlenecks, and more importantly propose multiple orthogonal ways to boost performance. (b) Based on our analysis, we propose a novel hierarchical, parallel and multi-scale residual architecture that yields large performance improvement over the standard bottleneck block while having the same number of parameters, thus bridging the gap between the original network and its binarized counterpart. (c) We perform a large number of ablation studies that shed light on the properties and the performance of the proposed block. (d) We present results for experiments on the most challenging datasets for human pose estimation and face alignment, reporting in many cases state-of-the-art performance. Code can be downloaded from https://www.adrianbulat.com/binary-cnn-landmark
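As a rough illustration of what network binarization means in the abstract above, the following minimal sketch maps real-valued weights to signs plus a single scaling factor. The mean-absolute-value scale and the function names `binarize` and `binary_dot` are assumptions made for illustration; the abstract does not state which binarization scheme the authors use.

```python
def binarize(weights):
    """Map real-valued weights to signs in {-1, +1} plus one scale factor.

    Assumption: the scale is the mean absolute weight, one common choice.
    The abstract does not say which scheme the paper adopts.
    """
    if not weights:
        return [], 0.0
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return signs, alpha


def binary_dot(signs_a, signs_b):
    """Dot product of two +/-1 vectors, computed by counting matches.

    For +/-1 values the dot product equals matches minus mismatches,
    which is why binary networks can replace multiplications with
    bitwise comparisons and counting.
    """
    if len(signs_a) != len(signs_b):
        raise ValueError("vectors must have the same length")
    matches = sum(1 for a, b in zip(signs_a, signs_b) if a == b)
    return 2 * matches - len(signs_a)


weights = [0.5, -1.5, 0.25, -0.75]
signs, alpha = binarize(weights)
# The binarized approximation of each weight is alpha * sign.
approx = [alpha * s for s in signs]
```

Storing one bit per weight plus a single scale is what makes such models compact; the accuracy cost of this approximation is the gap the abstract says the proposed block helps to bridge.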
Quantized Densely Connected U-Nets for Efficient Landmark Localization
In this paper, we propose quantized densely connected U-Nets for efficient
visual landmark localization. The idea is that features of the same semantic
meanings are globally reused across the stacked U-Nets. This dense connectivity
largely improves the information flow, yielding improved localization accuracy.
However, a vanilla dense design would suffer from critical efficiency issues in
both training and testing. To solve this problem, we first propose order-K
dense connectivity to trim off long-distance shortcuts; then, we use a
memory-efficient implementation to significantly boost the training efficiency
and investigate an iterative refinement that may slice the model size in half.
Finally, to reduce the memory consumption and high precision operations both in
training and testing, we further quantize weights, inputs, and gradients of our
localization network to low bit-width numbers. We validate our approach in two
tasks: human pose estimation and face alignment. The results show that our
approach achieves state-of-the-art localization accuracy while using ~70% fewer
parameters, a ~98% smaller model size, and ~75% less training memory than
other benchmark localizers. The code is available at
https://github.com/zhiqiangdon/CU-Net.
Comment: ECCV201
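The order-K dense connectivity described in the abstract above can be sketched as a simple indexing rule: each stacked U-Net receives features only from its K nearest predecessors instead of from all earlier ones. The helper names `order_k_inputs` and `shortcut_count` are hypothetical, and the sketch only counts connections; it is not the authors' implementation.

```python
def order_k_inputs(num_units, k):
    """For each stacked unit i, list the earlier units feeding into it.

    With order-K connectivity, unit i keeps shortcuts only from the k
    units immediately before it, trimming longer-distance shortcuts.
    """
    if num_units < 0 or k < 0:
        raise ValueError("num_units and k must be non-negative")
    return {i: list(range(max(0, i - k), i)) for i in range(num_units)}


def shortcut_count(num_units, k):
    """Total number of shortcut connections across the whole stack."""
    return sum(len(src) for src in order_k_inputs(num_units, k).values())


# Four stacked units with order-2 connectivity.
connections = order_k_inputs(4, 2)
# Fully dense connectivity corresponds to k = num_units - 1.
dense_total = shortcut_count(4, 3)
trimmed_total = shortcut_count(4, 2)
```

In a fully dense stack the number of shortcuts grows quadratically with depth, whereas for fixed K it grows linearly, which is consistent with the efficiency motivation given in the abstract.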
Hierarchical binary CNNs for landmark localization with limited resources
Our goal is to design architectures that retain the groundbreaking performance of Convolutional Neural Networks (CNNs) for landmark localization and at the same time are lightweight, compact and suitable for applications with limited computational resources. To this end, we make the following contributions: (a) we are the first to study the effect of neural network binarization on localization tasks, namely human pose estimation and face alignment. We exhaustively evaluate various design choices, identify performance bottlenecks, and more importantly propose multiple orthogonal ways to boost performance. (b) Based on our analysis, we propose a novel hierarchical, parallel and multi-scale residual architecture that yields large performance improvement over the standard bottleneck block while having the same number of parameters, thus bridging the gap between the original network and its binarized counterpart. (c) We perform a large number of ablation studies that shed light on the properties and the performance of the proposed block. (d) We present results for experiments on the most challenging datasets for human pose estimation and face alignment, reporting in many cases state-of-the-art performance. (e) We further provide additional results for the problem of facial part segmentation. Code can be downloaded from https://www.adrianbulat.com/binary-cnn-landmark
Fine-Grained Head Pose Estimation Without Keypoints
Estimating the head pose of a person is a crucial problem that has a large
number of applications such as aiding in gaze estimation, modeling attention,
fitting 3D models to video and performing face alignment. Traditionally, head
pose is computed by estimating some keypoints from the target face and solving
the 2D to 3D correspondence problem with a mean human head model. We argue that
this is a fragile method because it relies entirely on landmark detection
performance, the extraneous head model and an ad-hoc fitting step. We present
an elegant and robust way to determine pose by training a multi-loss
convolutional neural network on 300W-LP, a large synthetically expanded
dataset, to predict intrinsic Euler angles (yaw, pitch and roll) directly from
image intensities through joint binned pose classification and regression. We
present empirical tests on common in-the-wild pose benchmark datasets which
show state-of-the-art results. Additionally, we test our method on a dataset
usually used for pose estimation using depth and start to close the gap with
state-of-the-art depth pose methods. We open-source our training and testing
code as well as release our pre-trained models.
Comment: Accepted to Computer Vision and Pattern Recognition Workshops
(CVPRW), 2018 IEEE Conference on. IEEE, 201
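The "joint binned pose classification and regression" in the abstract above can be illustrated for a single Euler angle as follows. This is a minimal sketch under stated assumptions: the bin layout (66 bins of 3 degrees starting at -99), the weight `reg_weight`, and all function names are illustrative choices, not values taken from the abstract.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def angle_to_bin(angle_deg, min_angle=-99.0, bin_width=3.0, num_bins=66):
    """Index of the bin containing angle_deg, clamped to the valid range."""
    idx = int((angle_deg - min_angle) // bin_width)
    return min(max(idx, 0), num_bins - 1)


def expected_angle(logits, min_angle=-99.0, bin_width=3.0):
    """Continuous angle as the probability-weighted mean of bin centers."""
    probs = softmax(logits)
    centers = [min_angle + (i + 0.5) * bin_width for i in range(len(logits))]
    return sum(p * c for p, c in zip(probs, centers))


def multi_loss(logits, angle_deg, reg_weight=1.0):
    """Cross-entropy on the bin label plus weighted squared angle error.

    Assumption: reg_weight balances the two terms; its value is hypothetical.
    """
    probs = softmax(logits)
    target = angle_to_bin(angle_deg, num_bins=len(logits))
    classification = -math.log(probs[target])
    regression = (expected_angle(logits) - angle_deg) ** 2
    return classification + reg_weight * regression


uniform = [0.0] * 66
peaked = [0.0] * 66
peaked[33] = 50.0  # strong vote for the bin covering 0 to 3 degrees
```

The classification term encourages a coarse but stable estimate of the bin, while the regression term on the expected value refines the prediction within and across bins.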
- …