2,211 research outputs found
Lifting GIS Maps into Strong Geometric Context for Scene Understanding
Contextual information can have a substantial impact on the performance of
visual tasks such as semantic segmentation, object detection, and geometric
estimation. Data stored in Geographic Information Systems (GIS) offers a rich
source of contextual information that has been largely untapped by computer
vision. We propose to leverage such information for scene understanding by
combining GIS resources with large sets of unorganized photographs using
Structure from Motion (SfM) techniques. We present a pipeline to quickly
generate strong 3D geometric priors from 2D GIS data using SfM models aligned
with minimal user input. Given an image resectioned against this model, we
generate robust predictions of depth, surface normals, and semantic labels. We
show that the precision of the predicted geometry is substantially more
accurate other single-image depth estimation methods. We then demonstrate the
utility of these contextual constraints for re-scoring pedestrian detections,
and use these GIS contextual features alongside object detection score maps to
improve a CRF-based semantic segmentation framework, boosting accuracy over
baseline models
Real-Time Seamless Single Shot 6D Object Pose Prediction
We propose a single-shot approach for simultaneously detecting an object in
an RGB image and predicting its 6D pose without requiring multiple stages or
having to examine multiple hypotheses. Unlike a recently proposed single-shot
technique for this task (Kehl et al., ICCV'17) that only predicts an
approximate 6D pose that must then be refined, ours is accurate enough not to
require additional post-processing. As a result, it is much faster - 50 fps on
a Titan X (Pascal) GPU - and more suitable for real-time processing. The key
component of our method is a new CNN architecture inspired by the YOLO network
design that directly predicts the 2D image locations of the projected vertices
of the object's 3D bounding box. The object's 6D pose is then estimated using a
PnP algorithm.
For single object and multiple object pose estimation on the LINEMOD and
OCCLUSION datasets, our approach substantially outperforms other recent
CNN-based approaches when they are all used without post-processing. During
post-processing, a pose refinement step can be used to boost the accuracy of
the existing methods, but at 10 fps or less, they are much slower than our
method.Comment: CVPR 201
- …