Machine Learning techniques applied to stereo vision
Stereo is a popular technique enabling fast and dense depth estimation from two or more images.
Its success is mainly due to its ease of deployment: it requires only two or more synchronized, accurately calibrated image sensors to solve the matching problem between pixels in one image (the reference) and the other (the target). The absence of active technologies (e.g. pattern projection, laser scanners) makes this solution deployable in almost every scenario. Despite the extensive literature on stereo, it remains an open problem because of challenging conditions such as poor illumination, reflective surfaces, occlusions and other elements occurring in real environments.
Two main trends in stereo vision have gained popularity in recent years: confidence estimation and machine learning. Both have proved very effective, pushing forward the state of the art in dense disparity estimation.
In this thesis, we combine these two trends to improve both confidence estimation and disparity inference, by defining confidence measures that are more effective and easier to deploy, and by proposing new approaches that leverage them for more accurate depth prediction.
All the experiments are validated on three popular datasets, KITTI 2012, KITTI 2015 and Middlebury v3, following the commonly adopted methodologies and protocols to compare our proposals with previous works representing the state of the art in stereo vision.
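For readers unfamiliar with the terms in this abstract, the following is a minimal sketch of reference/target pixel matching, a confidence score, and the disparity-to-depth conversion. It is not the thesis's method: the absolute-difference cost, the ratio-based confidence, and all function and parameter names (`sad_cost_volume`, `wta_with_confidence`, `disparity_to_depth`, `focal_px`, `baseline_m`) are assumptions chosen for illustration, and the images are assumed to be rectified.

```python
import numpy as np

def sad_cost_volume(reference, target, max_disp):
    """Per-pixel absolute-difference cost for each candidate disparity.

    reference, target: 2-D float arrays (rectified images of equal shape).
    Assumes 2 <= max_disp <= image width.
    Returns an array of shape (max_disp, H, W); unreachable entries are +inf.
    """
    h, w = reference.shape
    cost = np.full((max_disp, h, w), np.inf)
    for d in range(max_disp):
        # Reference pixel (y, x) is compared with target pixel (y, x - d).
        cost[d, :, d:] = np.abs(reference[:, d:] - target[:, : w - d])
    return cost

def wta_with_confidence(cost):
    """Winner-takes-all disparity plus a simple ratio-based confidence.

    Confidence is close to 1 when the best cost is much lower than the
    second-best cost, and close to 0 when the two are nearly equal.
    """
    disparity = np.argmin(cost, axis=0)
    sorted_cost = np.sort(cost, axis=0)
    best, second = sorted_cost[0], sorted_cost[1]
    eps = 1e-6  # avoids division by zero when both costs are zero
    confidence = 1.0 - (best + eps) / (second + eps)
    return disparity, confidence

def disparity_to_depth(disparity, focal_px, baseline_m):
    """Depth Z = f * B / d for rectified stereo; zero disparity maps to inf."""
    d = disparity.astype(float)
    depth = np.full(d.shape, np.inf)
    valid = d > 0
    depth[valid] = focal_px * baseline_m / d[valid]
    return depth
```

Learned confidence measures of the kind the abstract refers to would replace the hand-written ratio with a model trained on such cost or disparity cues; the sketch only shows where a confidence value sits in the pipeline.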
On the Synergies between Machine Learning and Binocular Stereo for Depth Estimation from Images: a Survey
Stereo matching is one of the longest-standing problems in computer vision
with close to 40 years of studies and research. Throughout the years the
paradigm has shifted from local, pixel-level decisions to various forms of
discrete and continuous optimization to data-driven, learning-based methods.
Recently, the rise of machine learning and the rapid proliferation of deep
learning enhanced stereo matching with new exciting trends and applications
unthinkable until a few years ago. Interestingly, the relationship between
these two worlds is two-way. While machine, and especially deep, learning
advanced the state-of-the-art in stereo matching, stereo itself enabled new
ground-breaking methodologies such as self-supervised monocular depth
estimation based on deep networks. In this paper, we review recent research in
the field of learning-based depth estimation from single and binocular images
highlighting the synergies, the successes achieved so far and the open
challenges the community is going to face in the immediate future.
Comment: Accepted to TPAMI. Paper version of our CVPR 2019 tutorial:
"Learning-based depth estimation from stereo and monocular images: successes,
limitations and future challenges"
(https://sites.google.com/view/cvpr-2019-depth-from-image/home)
Non-iterative RGB-D-inertial Odometry
This paper presents a non-iterative solution for RGB-D-inertial odometry.
Traditional odometry methods resort to iterative algorithms, which are
usually computationally expensive or require well-designed initialization. To
overcome this problem, this paper proposes to combine a non-iterative front-end
(odometry) with an iterative back-end (loop closure) for the RGB-D-inertial
SLAM system. The main contribution lies in the novel non-iterative front-end,
which leverages inertial fusion and kernel cross-correlators (KCC) to match
point clouds in the frequency domain. Dominated by the fast Fourier transform
(FFT), our method has complexity of only $O(n \log n)$, where $n$ is
the number of points. Map fusion is conducted by element-wise operations, so
that both time and space complexity are further reduced. Extensive experiments
show that, because the proposed front-end is lightweight, the framework is
able to run at a much faster speed while retaining accuracy comparable to the
state of the art.
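To illustrate why frequency-domain matching is non-iterative and dominated by the FFT, here is a minimal sketch that recovers a circular shift between two 1-D signals with a single FFT-based cross-correlation. This is plain cross-correlation, not the paper's kernel cross-correlator, and it works on a toy signal rather than point clouds; the function name `estimate_circular_shift` is hypothetical.

```python
import numpy as np

def estimate_circular_shift(signal, shifted):
    """Return s such that shifted is approximately np.roll(signal, s).

    The cross-correlation is computed in the frequency domain, so the cost
    is dominated by the FFTs and no iterative refinement is needed.
    """
    # Multiplying by the conjugate spectrum corresponds to circular
    # cross-correlation in the signal domain.
    spectrum = np.fft.fft(shifted) * np.conj(np.fft.fft(signal))
    correlation = np.fft.ifft(spectrum).real
    # The correlation peaks at the lag that best aligns the two signals.
    return int(np.argmax(correlation))

rng = np.random.default_rng(0)
signal = rng.standard_normal(64)
print(estimate_circular_shift(signal, np.roll(signal, 5)))
```

The answer is read off from one correlation peak, which is why no initial guess is required, in contrast to iterative registration.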
FusionRCNN: LiDAR-Camera Fusion for Two-stage 3D Object Detection
3D object detection with multi-sensors is essential for an accurate and
reliable perception system of autonomous driving and robotics. Existing 3D
detectors significantly improve the accuracy by adopting a two-stage paradigm
which merely relies on LiDAR point clouds for 3D proposal refinement. Though
this is impressive, the sparsity of point clouds, especially for points far away,
makes it difficult for the LiDAR-only refinement module to accurately
recognize and locate objects. To address this problem, we propose a novel
multi-modality two-stage approach named FusionRCNN, which effectively and
efficiently fuses point clouds and camera images in the Regions of
Interest (RoI). FusionRCNN adaptively integrates both sparse geometry
information from LiDAR and dense texture information from camera in a unified
attention mechanism. Specifically, it first utilizes RoIPooling to obtain an
image set with a unified size and gets the point set by sampling raw points
within proposals in the RoI extraction step; it then leverages intra-modality
self-attention to enhance the domain-specific features, followed by a
well-designed cross-attention to fuse the information from the two
modalities. FusionRCNN is fundamentally plug-and-play and supports different
one-stage methods with almost no architectural changes. Extensive experiments
on KITTI and Waymo benchmarks demonstrate that our method significantly boosts
the performance of popular detectors. Remarkably, FusionRCNN significantly
improves the strong SECOND baseline by 6.14% mAP on Waymo, and outperforms
competing two-stage approaches. Code will be released soon at
https://github.com/xxlbigbrother/Fusion-RCNN.
Comment: 7 pages, 3 figures
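The self-attention followed by cross-attention described in this abstract can be sketched with plain scaled dot-product attention. This is an illustrative assumption, not the FusionRCNN architecture: there are no learned projections, heads, or normalization layers, the residual sum is a choice made here, and the names `attention` and `fuse_roi_features` and the feature shapes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(queries, keys, values):
    """Single-head scaled dot-product attention without learned weights."""
    scale = np.sqrt(queries.shape[-1])
    weights = softmax(queries @ keys.T / scale, axis=-1)
    return weights @ values, weights

def fuse_roi_features(point_feats, image_feats):
    """Fuse per-RoI point features (N, C) with image features (M, C)."""
    # Intra-modality self-attention: each modality attends to itself.
    point_sa, _ = attention(point_feats, point_feats, point_feats)
    image_sa, _ = attention(image_feats, image_feats, image_feats)
    # Cross-attention: point features query the image features.
    fused, weights = attention(point_sa, image_sa, image_sa)
    # Residual sum keeps the geometric signal alongside the image signal.
    return point_sa + fused, weights

rng = np.random.default_rng(0)
fused, weights = fuse_roi_features(
    rng.standard_normal((5, 8)),  # 5 sampled points, 8 channels
    rng.standard_normal((7, 8)),  # 7 image tokens, 8 channels
)
print(fused.shape, weights.shape)
```

Each row of `weights` sums to 1, so every sparse point feature receives a convex combination of the dense image features.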