Geometric loss functions for camera pose regression with deep learning
Deep learning has proven effective for robust, real-time monocular image
relocalisation. In particular, PoseNet is a deep convolutional neural
network which learns to regress the 6-DOF camera pose from a single image. It
learns to localize using high-level features and is robust to difficult
lighting, motion blur and unknown camera intrinsics, where point-based SIFT
registration fails. However, it was trained with a naive loss function
whose hyper-parameters require expensive tuning. In this paper, we give the
problem a more fundamental theoretical treatment. We explore a number of novel
loss functions for learning camera pose which are based on geometry and scene
reprojection error. Additionally, we show how to automatically learn an optimal
weighting to simultaneously regress position and orientation. By leveraging
geometry, we demonstrate that our technique significantly improves PoseNet's
performance across datasets ranging from indoor rooms to a small city.
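The automatically learned weighting between position and orientation losses can be sketched via learnable log-variance terms, in the spirit of homoscedastic uncertainty weighting; the function name and exact form below are assumptions based on the abstract, not the paper's verbatim formulation:

```python
import numpy as np

def weighted_pose_loss(pos_err, rot_err, s_x, s_q):
    # s_x and s_q are learnable log-variances (hypothetical names).
    # The exp(-s) factors trade off the position and orientation terms,
    # while the +s terms penalise unboundedly large variances, so the
    # balance between the two losses is learned rather than hand-tuned.
    return pos_err * np.exp(-s_x) + s_x + rot_err * np.exp(-s_q) + s_q

# With both log-variances at zero, the loss is the plain sum of errors.
loss = weighted_pose_loss(1.0, 1.0, 0.0, 0.0)  # → 2.0
```

Raising `s_x` down-weights the position term, which is how the optimiser can shift emphasis between metres and radians without a manually tuned scale factor.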
Face Recognition from Sequential Sparse 3D Data via Deep Registration
Previous work has shown that face recognition with highly accurate 3D data is
more reliable and less sensitive to pose and illumination variations. Recently,
low-cost, portable 3D acquisition techniques such as ToF (Time of Flight) and
DoE-based structured-light systems have made 3D data easy to access, e.g.
via a mobile phone. However, such devices provide only sparse (limited speckles
in a structured-light system) and noisy 3D data, which cannot support face
recognition directly. In this paper, we aim to achieve high-performance face
recognition for devices equipped with such modules, which is of great practical
value as these devices become widespread. We propose a framework that
performs face recognition by fusing a sequence of low-quality 3D data. Because the 3D
data are sparse and noisy and cannot be well handled by conventional methods
such as the ICP algorithm, we design a PointNet-like Deep Registration
Network (DRNet) which works with ordered 3D point coordinates while preserving
the ability to mine local structures via convolution. Meanwhile, we develop a
novel loss function, based on a quaternion representation, to optimize our DRNet;
it clearly outperforms other widely used loss functions. For face recognition,
we design a deep convolutional network, based on the AM-Softmax model, which
takes the fused 3D depth map as input. Experiments show that our DRNet achieves a
rotation error of 0.95° and a translation error of 0.28 mm for registration. Face
recognition on the fused data also achieves a rank-1 accuracy of 99.2% and 97.5%
at FAR = 0.001 on the Bosphorus dataset, which is comparable with state-of-the-art
performance based on high-quality data.
Comment: To appear in ICB201
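The abstract does not spell out its quaternion-based loss; a common choice for comparing unit quaternions, which respects the fact that q and -q encode the same rotation, can be sketched as follows (the function names are illustrative assumptions, not DRNet's actual loss):

```python
import numpy as np

def quaternion_angle_error(q_pred, q_true):
    # Geodesic rotation error in radians between two quaternions.
    # abs() handles the double cover: q and -q are the same rotation.
    q_pred = q_pred / np.linalg.norm(q_pred)
    q_true = q_true / np.linalg.norm(q_true)
    dot = np.clip(abs(np.dot(q_pred, q_true)), 0.0, 1.0)
    return 2.0 * np.arccos(dot)

def quaternion_loss(q_pred, q_true):
    # A smooth surrogate: 1 - |<q_pred, q_true>|, zero iff the two
    # quaternions encode identical rotations.
    q_pred = q_pred / np.linalg.norm(q_pred)
    q_true = q_true / np.linalg.norm(q_true)
    return 1.0 - abs(np.dot(q_pred, q_true))

identity = np.array([1.0, 0.0, 0.0, 0.0])
assert quaternion_angle_error(identity, identity) < 1e-9
assert quaternion_loss(identity, -identity) < 1e-9  # same rotation
```

The surrogate avoids the flat gradient of `arccos` near zero error, which is one practical reason quaternion dot-product losses are popular for registration networks.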
A comparison of uncertainty estimation approaches for DNN-based camera localization
Camera localization, i.e., camera pose regression, is an important
task in computer vision with many practical applications,
such as autonomous driving. A reliable estimate of the uncertainty of a
camera localization is also important, as it would make it possible to detect
localization failures, which can be dangerous. Although the literature
presents some uncertainty estimation methods, to the best of our knowledge
their effectiveness has not been thoroughly examined. This work compares the
performance of three established epistemic uncertainty estimation methods,
Monte Carlo Dropout (MCD), Deep Ensemble (DE), and Deep Evidential Regression
(DER), in the specific context of camera localization. We exploited CMRNet, a
DNN approach for multi-modal image-to-LiDAR-map registration, modifying its
internal configuration to allow for extensive experiments with the
three methods on the KITTI dataset. The application of DER proved particularly
significant: we achieve accurate camera localization and a calibrated
uncertainty, to the point that some of the methods can be used to detect
localization failures.
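Of the three methods compared, Monte Carlo Dropout is the simplest to sketch: dropout is kept active at inference time and the spread of repeated stochastic forward passes serves as the epistemic-uncertainty estimate. The toy regressor below is a hypothetical stand-in, not CMRNet:

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_dropout_predict(forward, x, n_samples=200):
    # Run several stochastic forward passes and return the sample mean
    # (the prediction) and sample std (the epistemic uncertainty).
    preds = np.stack([forward(x) for _ in range(n_samples)])
    return preds.mean(axis=0), preds.std(axis=0)

# Toy stand-in for a pose regressor with dropout left on at test time:
# a fixed linear map whose inputs are randomly masked on every call.
W = rng.normal(size=(6, 8))            # 6-DoF pose from an 8-dim feature

def noisy_forward(x, p_drop=0.2):
    mask = rng.random(8) > p_drop      # Bernoulli dropout mask
    return W @ (x * mask) / (1 - p_drop)  # inverted-dropout rescaling

x = rng.normal(size=8)
mean_pose, pose_std = mc_dropout_predict(noisy_forward, x)
```

A localization-failure detector in this setting would threshold `pose_std`: poses whose predicted uncertainty exceeds a calibrated bound are flagged rather than trusted.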