Fast Landmark Localization with 3D Component Reconstruction and CNN for Cross-Pose Recognition
Two approaches are proposed for cross-pose face recognition: one based on the
3D reconstruction of facial components, and the other based on a deep
Convolutional Neural Network (CNN). Unlike most 3D approaches that consider
holistic faces, the proposed approach considers 3D facial components. It
segments a 2D gallery face into components, reconstructs the 3D surface for
each component, and recognizes a probe face by component features. The
segmentation is based on the landmarks located by a hierarchical algorithm that
combines the Faster R-CNN for face detection and the Reduced Tree Structured
Model for landmark localization. The core part of the CNN-based approach is a
revised VGG network. We study performance under different training-set
configurations, including synthesized data from 3D reconstruction, real-life
data from an in-the-wild database, and both types of data combined. We also
investigate the network's performance when it is employed as a classifier or
as a feature extractor. The two recognition approaches
and the fast landmark localization are evaluated in extensive experiments, and
compared to state-of-the-art methods to demonstrate their efficacy.
Comment: 14 pages, 12 figures, 4 tables
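The component-based pipeline described above (segment the gallery face into components, extract a feature per component, then match a probe by aggregating component similarities) can be sketched as follows. This is a hedged illustration, not the authors' code: the component names, feature vectors, and the choice of cosine similarity with mean aggregation are all assumptions.

```python
# Illustrative sketch of component-wise face matching (hypothetical,
# not the paper's implementation).
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


def match_score(gallery_components, probe_components):
    """Aggregate per-component similarities into one recognition score.

    Each dict maps a facial-component name (e.g. "left_eye", "nose")
    to a feature vector; here we simply average the cosine scores.
    """
    scores = [cosine(gallery_components[name], probe_components[name])
              for name in gallery_components]
    return sum(scores) / len(scores)


# Toy 2-D features; a real system would use features from the
# reconstructed 3D component surfaces.
gallery = {"left_eye": [1.0, 0.0], "nose": [0.0, 1.0]}
probe = {"left_eye": [1.0, 0.0], "nose": [0.0, 1.0]}
print(match_score(gallery, probe))  # identical features -> 1.0
```

In a real system the per-component scores could also be weighted, e.g. to down-weight components that are occluded at large pose angles.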
Fine-Grained Head Pose Estimation Without Keypoints
Estimating the head pose of a person is a crucial problem that has a large
amount of applications such as aiding in gaze estimation, modeling attention,
fitting 3D models to video and performing face alignment. Traditionally head
pose is computed by estimating some keypoints from the target face and solving
the 2D to 3D correspondence problem with a mean human head model. We argue that
this is a fragile method because it relies entirely on landmark detection
performance, the extraneous head model and an ad-hoc fitting step. We present
an elegant and robust way to determine pose by training a multi-loss
convolutional neural network on 300W-LP, a large synthetically expanded
dataset, to predict intrinsic Euler angles (yaw, pitch and roll) directly from
image intensities through joint binned pose classification and regression. We
present empirical tests on common in-the-wild pose benchmark datasets which
show state-of-the-art results. Additionally we test our method on a dataset
usually used for pose estimation using depth and start to close the gap with
state-of-the-art depth pose methods. We open-source our training and testing
code as well as release our pre-trained models.
Comment: Accepted to Computer Vision and Pattern Recognition Workshops
(CVPRW), 2018 IEEE Conference on. IEEE, 2018
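The "joint binned pose classification and regression" idea above can be illustrated with a small sketch: the network scores a set of angle bins, and a continuous angle is read out as the expectation over the softmaxed bin scores. The bin count, width, and range below follow the 3-degree bins commonly used for this method, but the exact indexing is an assumption, not taken from the paper.

```python
# Sketch of the binned-classification readout for one Euler angle
# (yaw, pitch, or roll). Bin layout is illustrative.
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def expected_angle(logits, num_bins=66, bin_width=3.0, angle_min=-99.0):
    """Continuous angle as the expectation over bin centres.

    The classification branch trains the bin scores with cross-entropy;
    this expectation feeds a regression loss on the continuous angle.
    """
    probs = softmax(logits)
    centers = [angle_min + bin_width * (i + 0.5) for i in range(num_bins)]
    return sum(p * c for p, c in zip(probs, centers))


logits = [0.0] * 66
logits[33] = 50.0  # pretend the network is confident about bin 33
print(round(expected_angle(logits), 2))  # ~1.5, the centre of bin 33
```

The expectation makes the readout differentiable, so the classification and regression losses can be combined per angle during training.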
Deep Structured Multi-Task Learning for Computer Vision in Autonomous Driving
The field of computer vision is currently dominated by deep learning advances.
Convolutional Neural Networks (CNNs) have become the predominant tool for
solving almost any computer vision task, and state-of-the-art systems are
built on their predictive capabilities. Many of those systems use a simple
encoder–decoder design, in which an off-the-shelf CNN architecture is combined
with a task-specific decoder and loss function to create an end-to-end
trainable model. This ultimately raises the question of whether these kinds
of models are the future of computer vision.
In this thesis we argue that this is not the case. We start off by discussing three limitations
of simple end-to-end training. We proceed by showing how it is possible to overcome those
limitations by using an approach that we call structured modelling. The idea is to use CNNs
to compute a rich semantic intermediate representation which is then used to solve the actual
problem by applying a geometric and task-related structure.
In this work we solve the localization, segmentation and landmark recognition
tasks using structured modelling, and we show that this approach can improve generalization,
interpretability and robustness. We also discuss how this approach is particularly useful
for real-time applications such as autonomous driving. Visual perception is a multi-module
problem that requires several different computer vision tasks to be solved. We discuss how,
by sharing computations, we can improve not only the inference speed but also the prediction
performance by using the structural relationship between the tasks. Lastly, we demonstrate
that structured modelling is able to achieve state-of-the-art performance, making it a very
relevant approach for solving current and future computer vision problems.
Trinity College, ESPCR, Qualcomm
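The computation-sharing idea in the abstract (several perception tasks reusing one backbone's features) can be sketched as below. Everything here is illustrative: the backbone is a stand-in for a CNN encoder, and the head names and outputs are assumptions, not the thesis architecture.

```python
# Hedged sketch of multi-task computation sharing: the backbone runs
# once per image and every task head reuses its features.
def backbone(image):
    """Stand-in for a CNN encoder producing a shared feature vector."""
    return [sum(image) / len(image)]  # toy "feature"


def detection_head(features):
    return {"boxes": features}


def segmentation_head(features):
    return {"mask": features}


def landmark_head(features):
    return {"landmarks": features}


def forward(image):
    feats = backbone(image)  # computed once, reused by every head
    return {
        "detection": detection_head(feats),
        "segmentation": segmentation_head(feats),
        "landmarks": landmark_head(feats),
    }


out = forward([0.2, 0.4, 0.6])
```

Because the encoder runs once instead of once per task, inference cost grows only with the (cheap) heads, which is the property that makes this design attractive for real-time autonomous-driving stacks.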