Learning a Disentangled Embedding for Monocular 3D Shape Retrieval and Pose Estimation
We propose a novel approach to jointly perform 3D shape retrieval and pose
estimation from monocular images. In order to make the method robust to
real-world image variations, e.g. complex textures and backgrounds, we learn an
embedding space from 3D data that only includes the relevant information,
namely the shape and pose. Our approach explicitly disentangles a shape vector
and a pose vector, which alleviates both pose bias for 3D shape retrieval and
categorical bias for pose estimation. We then train a CNN to map the images to
this embedding space, retrieve the closest 3D shape from the database, and
estimate the 6D pose of the object. Our method achieves a median error of 10.3
for pose estimation and a top-1 accuracy of 0.592 for category-agnostic 3D
object retrieval on the Pascal3D+ dataset, outperforming the previous
state-of-the-art methods on both tasks.
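The retrieval step described in this abstract can be illustrated with a minimal sketch. It assumes the CNN output is a flat vector whose first entries form the shape vector and whose remaining entries form the pose vector, and that retrieval uses cosine distance; the abstract specifies neither. The shape ids, the vectors, and the function names are hypothetical.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def retrieve_shape(query_shape_vec, database):
    """Return the id of the database shape closest to the query shape vector."""
    return min(database, key=lambda sid: cosine_distance(query_shape_vec, database[sid]))

# Hypothetical shape embeddings, assumed to be precomputed from 3D data.
database = {
    "chair_01": [1.0, 0.0, 0.0],
    "car_07": [0.0, 1.0, 0.0],
    "sofa_03": [0.7, 0.0, 0.7],
}

# Hypothetical CNN output for one image; the 3 + 2 split is an assumption.
embedding = [0.9, 0.1, 0.0, 0.3, -0.2]
shape_vec, pose_vec = embedding[:3], embedding[3:]

# Only the shape part is compared, so the pose part cannot bias retrieval.
best = retrieve_shape(shape_vec, database)
print(best)  # chair_01
```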
Real-time Monocular Object SLAM
We present a real-time object-based SLAM system that leverages the largest
object database to date. Our approach comprises two main components: 1) a
monocular SLAM algorithm that exploits object rigidity constraints to improve
the map and find its real scale, and 2) a novel object recognition algorithm
based on bags of binary words, which provides live detections with a database
of 500 3D objects. The two components work together and benefit each other: the
SLAM algorithm accumulates information from the observations of the objects,
anchors object features to special map landmarks and sets constraints on the
optimization. At the same time, objects partially or fully located within the
map are used as a prior to guide the recognition algorithm, achieving higher
recall. We evaluate our proposal in five real environments, showing improvements
in the accuracy of the map and in efficiency with respect to other
state-of-the-art techniques.
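The idea behind a bag of binary words can be sketched as follows: each binary feature descriptor is assigned to the nearest visual word by Hamming distance, and an image is summarized by its word histogram. This is a simplified illustration with a flat, hand-written vocabulary of 8-bit words. The abstract does not describe the vocabulary structure, descriptor length, or scoring, so all values and function names here are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")

def quantize(descriptor: int, vocabulary: list) -> int:
    """Index of the visual word nearest to the descriptor in Hamming distance."""
    return min(range(len(vocabulary)), key=lambda i: hamming(descriptor, vocabulary[i]))

def bag_of_words(descriptors: list, vocabulary: list) -> list:
    """Histogram counting how many descriptors fall on each visual word."""
    hist = [0] * len(vocabulary)
    for d in descriptors:
        hist[quantize(d, vocabulary)] += 1
    return hist

# Hypothetical 8-bit vocabulary and image descriptors.
vocabulary = [0b00000000, 0b11110000, 0b00001111]
descriptors = [0b00000001, 0b11100000, 0b11110001, 0b00000111]

print(bag_of_words(descriptors, vocabulary))  # [1, 2, 1]
```

Binary descriptors make the distance computation an XOR followed by a bit count, which is one reason such representations suit live detection.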
Scene Coordinate Regression with Angle-Based Reprojection Loss for Camera Relocalization
Image-based camera relocalization is an important problem in computer vision
and robotics. Recent works utilize convolutional neural networks (CNNs) to
regress, for pixels in a query image, their corresponding 3D world coordinates in
the scene. The final pose is then solved via a RANSAC-based optimization scheme
using the predicted coordinates. Usually, the CNN is trained with ground truth
scene coordinates, but it has also been shown that the network can discover 3D
scene geometry automatically by minimizing single-view reprojection loss.
However, due to the deficiencies of the reprojection loss, the network needs to
be carefully initialized. In this paper, we present a new angle-based
reprojection loss, which resolves the issues of the original reprojection loss.
With this new loss function, the network can be trained without careful
initialization, and the system achieves more accurate results. The new loss
also enables us to utilize available multi-view constraints, which further
improve performance.
Comment: ECCV 2018 Workshop (Geometry Meets Deep Learning
- …
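The abstract does not give the formula of the angle-based loss. The sketch below shows one plausible reading, stated as an assumption and not as the authors' definition: the error for a pixel is the angle between its back-projected viewing ray and the predicted scene coordinate expressed in the camera frame. The pinhole intrinsics, the pose, and the function names are hypothetical.

```python
import math

def to_camera(R, t, X):
    """Map a world point X into the camera frame: R @ X + t."""
    return [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]

def pixel_ray(u, v, fx, fy, cx, cy):
    """Back-project pixel (u, v) to a viewing ray under a pinhole model."""
    return [(u - cx) / fx, (v - cy) / fy, 1.0]

def angle_between(a, b):
    """Angle in radians between two 3D vectors, in [0, pi]."""
    cross = [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]
    dot = sum(x * y for x, y in zip(a, b))
    return math.atan2(math.sqrt(sum(c * c for c in cross)), dot)

def angular_error(R, t, X, u, v, fx, fy, cx, cy):
    """Assumed per-pixel error: angle between predicted point and pixel ray."""
    return angle_between(to_camera(R, t, X), pixel_ray(u, v, fx, fy, cx, cy))

# Hypothetical camera: identity pose and made-up intrinsics.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]
intr = dict(fx=500.0, fy=500.0, cx=320.0, cy=240.0)

# A predicted point on the ray through the principal point has zero error.
on_ray = angular_error(R, t, [0.0, 0.0, 2.0], 320.0, 240.0, **intr)
# A predicted point behind the camera gives the maximum angle, pi.
behind = angular_error(R, t, [0.0, 0.0, -2.0], 320.0, 240.0, **intr)
```

An angular error of this kind is bounded and is defined for any non-zero point, including points behind the camera.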