Mono3D++: Monocular 3D Vehicle Detection with Two-Scale 3D Hypotheses and Task Priors
We present a method to infer 3D pose and shape of vehicles from a single
image. To tackle this ill-posed problem, we optimize two-scale projection
consistency between the generated 3D hypotheses and their 2D
pseudo-measurements. Specifically, we use a morphable wireframe model to
generate a fine-scaled representation of vehicle shape and pose. To reduce its
sensitivity to 2D landmarks, we jointly model the 3D bounding box as a coarse
representation which improves robustness. We also integrate three task priors,
including unsupervised monocular depth, a ground plane constraint as well as
vehicle shape priors, with forward projection errors into an overall energy
function.
Comment: Proc. of the AAAI, September 201
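The abstract above describes an overall energy that combines two-scale forward-projection errors (fine wireframe landmarks and a coarse 3D box) with three prior terms. The following is a minimal sketch of such a weighted energy; the term names, residual shapes, and weights are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def total_energy(proj_landmarks, obs_landmarks, proj_box, obs_box,
                 depth_residual, ground_residual, shape_residual,
                 w=(1.0, 1.0, 0.1, 0.1, 0.1)):
    """Illustrative two-scale energy (assumed form, not the paper's exact one).

    proj_landmarks/obs_landmarks: (N, 2) projected vs. observed 2D landmarks
        of the fine-scale wireframe hypothesis.
    proj_box/obs_box: projected vs. observed 2D box parameters for the
        coarse-scale 3D bounding-box hypothesis.
    depth/ground/shape residuals: scalar violations of the three task priors
        (monocular depth, ground plane, vehicle shape).
    """
    e_fine = np.sum((proj_landmarks - obs_landmarks) ** 2)  # wireframe term
    e_coarse = np.sum((proj_box - obs_box) ** 2)            # coarse 3D-box term
    priors = np.array([depth_residual, ground_residual, shape_residual]) ** 2
    return w[0] * e_fine + w[1] * e_coarse + float(np.dot(w[2:], priors))

# A hypothesis would be scored (and optimized) by this energy, e.g.:
e = total_energy(np.ones((12, 2)), np.zeros((12, 2)),
                 np.ones(4), np.zeros(4), 0.0, 0.0, 0.0)
```

Jointly weighting the coarse box term alongside the landmark term is what gives robustness when individual 2D landmarks are noisy.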
Fast Single Shot Detection and Pose Estimation
For applications in navigation and robotics, estimating the 3D pose of
objects is as important as detection. Many approaches to pose estimation rely
on detecting or tracking parts or keypoints [11, 21]. In this paper we build on
a recent state-of-the-art convolutional network for sliding-window detection
[10] to provide detection and rough pose estimation in a single shot, without
intermediate stages of detecting parts or initial bounding boxes. While not the
first system to treat pose estimation as a categorization problem, this is the
first attempt to combine detection and pose estimation at the same level using
a deep learning approach. The key to the architecture is a deep convolutional
network where scores for the presence of an object category, the offset for its
location, and the approximate pose are all estimated on a regular grid of
locations in the image. The resulting system is as accurate as recent work on
pose estimation (42.4% 8-view mAVP on Pascal 3D+ [21]) and significantly
faster (46 frames per second (FPS) on a TITAN X GPU). This approach to
detection and rough pose estimation is fast and accurate enough to be widely
applied as a pre-processing step for tasks including high-accuracy pose
estimation, object tracking and localization, and vSLAM.
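The key architectural idea above is that category scores, location offsets, and a discretized pose are all predicted on a regular grid of image locations. A minimal decoding sketch of that output layout is given below; the channel arrangement, threshold, and counts are assumptions for illustration, not the paper's actual head design.

```python
import numpy as np

def decode_grid(pred, score_thresh=0.5, n_classes=20, n_poses=8):
    """Decode a (H, W, n_classes + 4 + n_poses) prediction grid.

    Per cell (assumed layout): class scores, a 4-vector box offset
    (dx, dy, dw, dh), and scores for discretized pose bins, so pose
    estimation is treated as categorization alongside detection.
    Returns a list of (row, col, class_id, pose_id, offset) detections.
    """
    H, W, _ = pred.shape
    dets = []
    for r in range(H):
        for c in range(W):
            cls_scores = pred[r, c, :n_classes]
            cls = int(np.argmax(cls_scores))
            if cls_scores[cls] < score_thresh:
                continue  # no confident object at this grid cell
            offset = pred[r, c, n_classes:n_classes + 4]
            pose = int(np.argmax(pred[r, c, n_classes + 4:]))
            dets.append((r, c, cls, pose, offset))
    return dets
```

Because every cell is decoded in one pass with no part detection or box proposal stage, this single-shot structure is what makes the approach fast enough for real-time use.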
3D Bounding Box Estimation Using Deep Learning and Geometry
We present a method for 3D object detection and pose estimation from a single
image. In contrast to current techniques that only regress the 3D orientation
of an object, our method first regresses relatively stable 3D object properties
using a deep convolutional neural network and then combines these estimates
with geometric constraints provided by a 2D object bounding box to produce a
complete 3D bounding box. The first network output estimates the 3D object
orientation using a novel hybrid discrete-continuous loss, which significantly
outperforms the L2 loss. The second output regresses the 3D object dimensions,
which have relatively little variance compared to alternatives and can often be
predicted for many object types. These estimates, combined with the geometric
constraints on translation imposed by the 2D bounding box, enable us to recover
a stable and accurate 3D object pose. We evaluate our method on the challenging
KITTI object detection benchmark both on the official metric of 3D orientation
estimation and also on the accuracy of the obtained 3D bounding boxes. Although
conceptually simple, our method outperforms more complex and computationally
expensive approaches that leverage semantic segmentation, instance-level
segmentation, flat ground priors, and sub-category detection. Our
discrete-continuous loss also produces state of the art results for 3D
viewpoint estimation on the Pascal 3D+ dataset.
Comment: To appear in IEEE Conference on Computer Vision and Pattern
Recognition (CVPR) 201
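The hybrid discrete-continuous loss described above classifies orientation into angular bins and regresses a continuous residual within the chosen bin. One way to decode such an output is sketched below; the bin layout and (cos, sin) residual parameterization are assumptions for illustration, not necessarily the paper's exact design.

```python
import numpy as np

def decode_orientation(bin_logits, residuals, n_bins=2):
    """Decode a discrete-continuous orientation estimate.

    bin_logits: (n_bins,) confidence per angular bin.
    residuals: (n_bins, 2) predicted (cos, sin) of the angular offset
        from each bin's center.
    Returns the angle in radians, wrapped to [-pi, pi).
    """
    centers = 2 * np.pi * np.arange(n_bins) / n_bins  # evenly spaced centers
    best = int(np.argmax(bin_logits))                 # discrete step: pick a bin
    cos_r, sin_r = residuals[best]
    offset = np.arctan2(sin_r, cos_r)                 # continuous step: residual
    angle = centers[best] + offset
    return (angle + np.pi) % (2 * np.pi) - np.pi      # wrap to [-pi, pi)
```

The discrete step sidesteps the wrap-around discontinuity that makes a plain L2 loss on angles ill-behaved, while the residual keeps the estimate continuous within each bin.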