Stereo Vision-based Semantic 3D Object and Ego-motion Tracking for Autonomous Driving
We propose a stereo vision-based approach for tracking the camera ego-motion
and 3D semantic objects in dynamic autonomous driving scenarios. Instead of
directly regressing the 3D bounding box using end-to-end approaches, we propose
to use easy-to-label 2D detection and discrete viewpoint classification
together with a lightweight semantic inference method to obtain rough 3D
object measurements. Building on object-aware camera pose tracking, which is
robust in dynamic environments, combined with our novel dynamic object bundle
adjustment (BA) approach that fuses temporal sparse feature correspondences
with the semantic 3D measurement model, we obtain 3D object pose, velocity,
and anchored dynamic point-cloud estimates with instance accuracy and temporal
consistency. The performance of our proposed method is demonstrated in diverse
scenarios. Both the ego-motion estimation and object localization are compared
with state-of-the-art solutions. Comment: 14 pages, 9 figures, eccv201
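The "rough 3D object measurements" from 2D detections can be illustrated with a minimal sketch: given a 2D bounding box, camera intrinsics, and a class-level metric height prior, depth follows from similar triangles and the box centre back-projects to a coarse 3D centroid. The function and the height prior here are hypothetical placeholders, not the paper's actual semantic inference method.

```python
# Hedged sketch: back-project a 2D detection box to a rough 3D object
# measurement using an assumed class-level height prior. Illustrative
# only; not the paper's implementation.

def rough_3d_from_box(box, K, class_height):
    """box = (x1, y1, x2, y2) in pixels; K = 3x3 intrinsics as nested
    lists; class_height = assumed metric height for the detected class."""
    x1, y1, x2, y2 = box
    u, v = (x1 + x2) / 2.0, (y1 + y2) / 2.0   # box centre in pixels
    h_px = y2 - y1                             # apparent height in pixels
    fx, fy = K[0][0], K[1][1]
    cx, cy = K[0][2], K[1][2]
    Z = fy * class_height / h_px               # similar-triangles depth
    X = (u - cx) * Z / fx                      # back-project box centre
    Y = (v - cy) * Z / fy                      # into the camera frame
    return (X, Y, Z)
```

Such a measurement is deliberately coarse; in the paper's pipeline it would be refined jointly with feature correspondences inside the dynamic object BA.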
Occlusion-Aware Object Localization, Segmentation and Pose Estimation
We present a learning approach for localization and segmentation of objects
in an image in a manner that is robust to partial occlusion. Our algorithm
produces a bounding box around the full extent of the object and labels pixels
in the interior that belong to the object. Like existing segmentation-aware
detection approaches, we learn an appearance model of the object and consider
regions that do not fit this model as potential occlusions. However, in
addition to the established use of pairwise potentials for encouraging local
consistency, we use higher order potentials which capture information at the
level of image segments. We also propose an efficient loss function that
targets both localization and segmentation performance. Our algorithm achieves
13.52% segmentation error and 0.81 area under the false-positive per image vs.
recall curve on average over the challenging CMU Kitchen Occlusion Dataset.
This is a 42.44% decrease in segmentation error and a 16.13% increase in
localization performance compared to the state-of-the-art. Finally, we show
that the visibility labelling produced by our algorithm can make full 3D pose
estimation from a single image robust to occlusion. Comment: British Machine Vision Conference 2015 (poster
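The energy described above can be sketched as three terms: per-pixel unary appearance costs, a pairwise Potts term for local consistency, and a truncated (robust P^n-style) higher-order term over image segments. Weights, the truncation constant, and the function name are illustrative assumptions, not the paper's model.

```python
# Hedged sketch of an occlusion-aware labeling energy with unary,
# pairwise, and robust higher-order segment terms. Illustrative only.

def labeling_energy(labels, unary, edges, segments,
                    lam=1.0, gamma=2.0, trunc=3):
    # Unary: appearance cost of the chosen label (0 = background/occluder,
    # 1 = object) at each pixel.
    e = sum(unary[i][labels[i]] for i in range(len(labels)))
    # Pairwise Potts: penalize neighbouring pixels that disagree.
    e += lam * sum(1 for i, j in edges if labels[i] != labels[j])
    # Higher-order, robust P^n-style: per segment, pay for the minority
    # pixels, truncated so a whole segment can switch label cheaply.
    for seg in segments:
        ones = sum(labels[i] for i in seg)
        minority = min(ones, len(seg) - ones)
        e += gamma * min(minority, trunc)
    return e
```

The truncation is what makes the segment term "robust": a segment that is mostly occluded pays a bounded penalty instead of forcing every pixel to the majority label.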
G2L-Net: Global to Local Network for Real-time 6D Pose Estimation with Embedding Vector Features
In this paper, we propose a novel real-time 6D object pose estimation
framework, named G2L-Net. Our network operates on point clouds from RGB-D
detection in a divide-and-conquer fashion. Specifically, our network consists
of three steps. First, we extract the coarse object point cloud from the RGB-D
image by 2D detection. Second, we feed the coarse object point cloud to a
translation localization network to perform 3D segmentation and object
translation prediction. Third, via the predicted segmentation and translation,
we transform the fine object point cloud into a local canonical coordinate
frame, in which we train a rotation localization network to estimate the
initial object
rotation. In the third step, we define point-wise embedding vector features to
capture viewpoint-aware information. To calculate more accurate rotation, we
adopt a rotation residual estimator to estimate the residual between initial
rotation and ground truth, which can boost initial pose estimation performance.
Our proposed G2L-Net runs in real time even though multiple steps are stacked
in the proposed coarse-to-fine framework. Extensive experiments on two
benchmark datasets show that G2L-Net achieves state-of-the-art performance in
terms of both accuracy and speed. Comment: 10 pages, 11 figures, accepted in CVPR 202
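The divide-and-conquer flow the abstract enumerates can be sketched in miniature: step 1 crops a coarse point cloud with the 2D box, steps 2-3 predict the translation as the centroid of the segmented points and subtract it to canonicalize the cloud before rotation estimation. Function names and the centroid-as-translation shortcut are illustrative assumptions; the paper uses learned networks for segmentation, translation, and rotation.

```python
# Hedged sketch of G2L-Net's global-to-local flow. Illustrative
# placeholders stand in for the learned networks.

def crop_points(points, pixels, box2d):
    """Step 1: keep points whose projected pixel lies inside the 2D box."""
    x1, y1, x2, y2 = box2d
    return [p for p, (u, v) in zip(points, pixels)
            if x1 <= u <= x2 and y1 <= v <= y2]

def canonicalize(points, seg_mask):
    """Steps 2-3: take the centroid of the segmented points as the
    predicted translation, then subtract it so rotation can be estimated
    in a local canonical coordinate frame."""
    obj = [p for p, m in zip(points, seg_mask) if m]
    n = len(obj)
    t = tuple(sum(c) / n for c in zip(*obj))          # translation
    canon = [tuple(c - tc for c, tc in zip(p, t)) for p in obj]
    return t, canon
```

Working in the canonical frame is the key design choice: the rotation network never has to model the object's absolute position, only its orientation.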