4,737 research outputs found
Semantic 3D Occupancy Mapping through Efficient High Order CRFs
Semantic 3D mapping can be used for many applications such as robot
navigation and virtual interaction. In recent years, there has been great
progress in semantic segmentation and geometric 3D mapping. However, it is
still challenging to combine these two tasks for accurate and large-scale
semantic mapping from images. In the paper, we propose an incremental and
(near) real-time semantic mapping system. A 3D scrolling occupancy grid map is
built to represent the world, which is memory and computationally efficient and
bounded for large scale environments. We utilize the CNN segmentation as prior
prediction and further optimize 3D grid labels through a novel CRF model.
Superpixels are utilized to enforce smoothness and form robust P N high order
potential. An efficient mean field inference is developed for the graph
optimization. We evaluate our system on the KITTI dataset and improve the
segmentation accuracy by 10% over existing systems.Comment: IROS 201
FrameNet: Learning Local Canonical Frames of 3D Surfaces from a Single RGB Image
In this work, we introduce the novel problem of identifying dense canonical
3D coordinate frames from a single RGB image. We observe that each pixel in an
image corresponds to a surface in the underlying 3D geometry, where a canonical
frame can be identified as represented by three orthogonal axes, one along its
normal direction and two in its tangent plane. We propose an algorithm to
predict these axes from RGB. Our first insight is that canonical frames
computed automatically with recently introduced direction field synthesis
methods can provide training data for the task. Our second insight is that
networks designed for surface normal prediction provide better results when
trained jointly to predict canonical frames, and even better when trained to
also predict 2D projections of canonical frames. We conjecture this is because
projections of canonical tangent directions often align with local gradients in
images, and because those directions are tightly linked to 3D canonical frames
through projective geometry and orthogonality constraints. In our experiments,
we find that our method predicts 3D canonical frames that can be used in
applications ranging from surface normal estimation, feature matching, and
augmented reality
Occlusion-Aware Depth Estimation with Adaptive Normal Constraints
We present a new learning-based method for multi-frame depth estimation from
a color video, which is a fundamental problem in scene understanding, robot
navigation or handheld 3D reconstruction. While recent learning-based methods
estimate depth at high accuracy, 3D point clouds exported from their depth maps
often fail to preserve important geometric feature (e.g., corners, edges,
planes) of man-made scenes. Widely-used pixel-wise depth errors do not
specifically penalize inconsistency on these features. These inaccuracies are
particularly severe when subsequent depth reconstructions are accumulated in an
attempt to scan a full environment with man-made objects with this kind of
features. Our depth estimation algorithm therefore introduces a Combined Normal
Map (CNM) constraint, which is designed to better preserve high-curvature
features and global planar regions. In order to further improve the depth
estimation accuracy, we introduce a new occlusion-aware strategy that
aggregates initial depth predictions from multiple adjacent views into one
final depth map and one occlusion probability map for the current reference
view. Our method outperforms the state-of-the-art in terms of depth estimation
accuracy, and preserves essential geometric features of man-made indoor scenes
much better than other algorithms.Comment: ECCV 202
GANerated Hands for Real-time 3D Hand Tracking from Monocular RGB
We address the highly challenging problem of real-time 3D hand tracking based
on a monocular RGB-only sequence. Our tracking method combines a convolutional
neural network with a kinematic 3D hand model, such that it generalizes well to
unseen data, is robust to occlusions and varying camera viewpoints, and leads
to anatomically plausible as well as temporally smooth hand motions. For
training our CNN we propose a novel approach for the synthetic generation of
training data that is based on a geometrically consistent image-to-image
translation network. To be more specific, we use a neural network that
translates synthetic images to "real" images, such that the so-generated images
follow the same statistical distribution as real-world hand images. For
training this translation network we combine an adversarial loss and a
cycle-consistency loss with a geometric consistency loss in order to preserve
geometric properties (such as hand pose) during translation. We demonstrate
that our hand tracking system outperforms the current state-of-the-art on
challenging RGB-only footage
- …