Probabilistic Global Scale Estimation for MonoSLAM Based on Generic Object Detection
This paper proposes a novel method to estimate the global scale of a 3D
reconstructed model within a Kalman filtering-based monocular SLAM algorithm.
Our Bayesian framework integrates height priors over the detected objects
belonging to a set of broad predefined classes, based on recent advances in
fast generic object detection. Each observation is produced from a single frame,
so no data association process across video frames is needed. This is
possible because we associate the height priors with the image region sizes at
image locations where map feature projections fall within the object detection regions.
We present very promising results of this approach obtained on several
experiments with different object classes.
Comment: Int. Workshop on Visual Odometry, CVPR (July 2017)
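The idea of turning per-frame height priors into a global scale can be illustrated with a minimal sketch. It is not the paper's filter: it assumes a pinhole camera, a Gaussian height prior per object class, and simple inverse-variance fusion of independent per-observation scale estimates. The names `Observation`, `scale_from_observation`, and `fuse_scales` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    box_height_px: float  # height of the detection region in pixels
    slam_depth: float     # depth of the map feature in unscaled SLAM units
    prior_mean_m: float   # assumed class height prior, mean in metres
    prior_std_m: float    # assumed class height prior, std in metres


def scale_from_observation(obs, focal_px):
    """Return (mean, std) of the scale implied by one single-frame observation.

    Pinhole model with metric depth s * d:  h = f * H / (s * d),
    hence s = f * H / (h * d). A Gaussian prior on H maps linearly to s.
    """
    k = focal_px / (obs.box_height_px * obs.slam_depth)
    return k * obs.prior_mean_m, k * obs.prior_std_m


def fuse_scales(observations, focal_px):
    """Fuse independent Gaussian scale estimates by inverse-variance weighting."""
    weight_sum = 0.0
    weighted_mean = 0.0
    for obs in observations:
        mean, std = scale_from_observation(obs, focal_px)
        weight = 1.0 / (std * std)
        weight_sum += weight
        weighted_mean += weight * mean
    return weighted_mean / weight_sum, (1.0 / weight_sum) ** 0.5
```

Because each observation is handled on its own, nothing in this sketch tracks objects between frames; more observations simply tighten the fused estimate.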
Recurrent Scene Parsing with Perspective Understanding in the Loop
Objects may appear at arbitrary scales in perspective images of a scene,
posing a challenge for recognition systems that process images at a fixed
resolution. We propose a depth-aware gating module that adaptively selects the
pooling field size in a convolutional network architecture according to the
object scale (inversely proportional to the depth) so that small details are
preserved for distant objects while larger receptive fields are used for those
nearby. The depth gating signal is provided by stereo disparity or estimated
directly from monocular input. We integrate this depth-aware gating into a
recurrent convolutional neural network to perform semantic segmentation. Our
recurrent module iteratively refines the segmentation results, leveraging the
depth and semantic predictions from the previous iterations.
Through extensive experiments on four popular large-scale RGB-D datasets, we
demonstrate this approach achieves competitive semantic segmentation
performance with a model which is substantially more compact. We carry out
extensive analysis of this architecture including variants that operate on
monocular RGB but use depth as side-information during training, unsupervised
gating as a generic attentional mechanism, and multi-resolution gating. We find
that gated pooling for joint semantic segmentation and depth yields
state-of-the-art results for quantitative monocular depth estimation.
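The depth-dependent choice of pooling field size can be illustrated with a small sketch. It is a hard-selection toy in pure Python, not the learned gating module of the paper: the depth thresholds, window sizes, and the function names `select_window` and `depth_gated_avg_pool` are all assumptions made for illustration.

```python
def select_window(depth, thresholds=(2.0, 5.0), sizes=(5, 3, 1)):
    """Pick an odd pooling window size: nearby pixels get large windows,
    distant pixels get small ones. Thresholds and sizes are illustrative."""
    for threshold, size in zip(thresholds, sizes):
        if depth < threshold:
            return size
    return sizes[-1]


def depth_gated_avg_pool(features, depth):
    """Average-pool a 2D feature map with a per-pixel, depth-selected window.

    `features` and `depth` are equally sized lists of lists of floats.
    Windows are clipped at the image border.
    """
    rows, cols = len(features), len(features[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            half = select_window(depth[r][c]) // 2
            r0, r1 = max(0, r - half), min(rows, r + half + 1)
            c0, c1 = max(0, c - half), min(cols, c + half + 1)
            window = [features[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            out[r][c] = sum(window) / len(window)
    return out
```

With a window of size 1 the feature map passes through unchanged, which is how small details of distant objects are preserved in this toy version.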
J-MOD: Joint Monocular Obstacle Detection and Depth Estimation
In this work, we propose an end-to-end deep architecture that jointly learns
to detect obstacles and estimate their depth for MAV flight applications. Most
of the existing approaches either rely on Visual SLAM systems or on depth
estimation models to build 3D maps and detect obstacles. However, for the task
of avoiding obstacles this level of complexity is not required. Recent works
have proposed multi task architectures to both perform scene understanding and
depth estimation. We follow their track and propose a specific architecture to
jointly estimate depth and obstacles, without the need to compute a global map,
but maintaining compatibility with a global SLAM system if needed. The network
architecture is devised to exploit the joint information of the obstacle
detection task, which produces more reliable bounding boxes, and the depth
estimation task, increasing the robustness of both to scenario changes. We call
this architecture J-MOD. We test the effectiveness of our approach with
experiments on sequences with different appearance and focal lengths and
compare it to SotA multi task methods that jointly perform semantic
segmentation and depth estimation. In addition, we show the integration in a
full system using a set of simulated navigation experiments where a MAV
explores an unknown scenario and plans safe trajectories by using our detection
model.
A Survey on Joint Object Detection and Pose Estimation using Monocular Vision
In this survey we present a complete landscape of joint object detection and
pose estimation methods that use monocular vision. Descriptions of traditional
approaches that involve descriptors or models and various estimation methods
have been provided. These descriptors or models include chordiograms,
shape-aware deformable parts model, bag of boundaries, distance transform
templates, natural 3D markers and facet features whereas the estimation methods
include iterative clustering estimation, probabilistic networks and iterative
genetic matching. Hybrid approaches that use handcrafted feature extraction
followed by estimation by deep learning methods have been outlined. We have
investigated and compared, wherever possible, pure deep learning based
approaches (single stage and multi stage) for this problem. Comprehensive
details of the various accuracy measures and metrics have been illustrated. For
the purpose of giving a clear overview, the characteristics of relevant
datasets are discussed. The trends that prevailed from the infancy of this
problem until now have also been highlighted.
Comment: Accepted at the International Joint Conference on Computer Vision and
Pattern Recognition (CCVPR) 201