A Critical Review of Deep Learning-Based Multi-Sensor Fusion Techniques
In this review, we provide detailed coverage of multi-sensor fusion techniques that take RGB stereo images and a sparse LiDAR-projected depth map as input and output a dense depth map prediction. We cover state-of-the-art fusion techniques, which in recent years have been end-to-end trainable deep learning-based methods. We then conduct a comparative evaluation of these techniques and provide a detailed analysis of their strengths and limitations, as well as the applications they are best suited for.
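As an illustration of the input described above, the following minimal sketch shows one common way a sparse LiDAR-projected depth map can be produced with a pinhole camera model. It is not taken from the review: the function name `project_to_sparse_depth`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the convention that 0.0 marks a pixel without a LiDAR return are all assumptions made for this example, and the points are assumed to be already expressed in the camera frame.

```python
def project_to_sparse_depth(points, fx, fy, cx, cy, width, height):
    """Project 3D points (x, y, z), given in the camera frame, onto the image.

    Returns a height x width grid of depths; 0.0 marks pixels with no return
    (an assumed convention for this sketch).
    """
    depth = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0.0:
            continue  # point lies behind the camera, cannot be imaged
        # Pinhole projection to integer pixel coordinates.
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            # Keep the nearest return when several points hit one pixel.
            if depth[v][u] == 0.0 or z < depth[v][u]:
                depth[v][u] = z
    return depth


if __name__ == "__main__":
    pts = [(0.0, 0.0, 2.0), (0.0, 0.0, 5.0), (1.0, 0.0, -1.0)]
    for row in project_to_sparse_depth(pts, 1.0, 1.0, 1.0, 1.0, 3, 3):
        print(row)
```

Most pixels stay empty after projection, which is why the techniques surveyed here combine the sparse map with stereo images to predict a dense one.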
On the Synergies between Machine Learning and Binocular Stereo for Depth Estimation from Images: a Survey
Stereo matching is one of the longest-standing problems in computer vision with close to 40 years of studies and research. Throughout the years the paradigm has shifted from local, pixel-level decision to various forms of discrete and continuous optimization to data-driven, learning-based methods. Recently, the rise of machine learning and the rapid proliferation of deep learning enhanced stereo matching with new exciting trends and applications unthinkable until a few years ago. Interestingly, the relationship between these two worlds is two-way. While machine, and especially deep, learning advanced the state-of-the-art in stereo matching, stereo itself enabled new ground-breaking methodologies such as self-supervised monocular depth estimation based on deep networks. In this paper, we review recent research in the field of learning-based depth estimation from single and binocular images highlighting the synergies, the successes achieved so far and the open challenges the community is going to face in the immediate future.
Comment: Accepted to TPAMI. Paper version of our CVPR 2019 tutorial: "Learning-based depth estimation from stereo and monocular images: successes, limitations and future challenges" (https://sites.google.com/view/cvpr-2019-depth-from-image/home)
Vision-based Self-Supervised Depth Perception and Motion Control for Mobile Robots
Advances in robotics have created many opportunities to deploy mobile robots in a variety of settings. However, many current mobile robots carry a sensor suite that combines multiple types of sensors. The cost of such a suite, together with the computationally complex software needed to fully exploit it, may limit the large-scale deployment of these robots. Recent developments in computer vision have made it possible to complete various robotic tasks with camera systems alone. This thesis focuses on two problems related to vision-based mobile robots: depth perception and motion control.
Commercially available stereo cameras relying on traditional stereo matching algorithms are widely used in robotic applications to obtain depth information. Although their raw (predicted) disparity maps may contain incorrect estimates, they can still provide useful prior information for more accurate predictions. We propose a data-driven pipeline that incorporates the raw disparity to predict high-quality disparity maps. The pipeline first uses a confidence generation component to identify inaccuracies in the raw disparity. Then a deep neural network, which consists of a feature extraction module, a confidence-guided raw disparity fusion module, and a hierarchical occlusion-aware disparity refinement module, computes the final disparity estimates and their corresponding occlusion masks. The pipeline can be trained in a self-supervised manner, removing the need for expensive ground-truth training labels. Experimental results on public datasets show that the pipeline achieves competitive accuracy at a real-time processing rate. The pipeline is also tested on images captured by commercial stereo cameras to demonstrate its effectiveness in improving their raw disparity estimates.
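The abstract does not say how the confidence generation component works. As a hedged illustration only, the sketch below uses a left-right consistency check, a widely used heuristic for flagging unreliable disparities, applied to a single scanline. The function name `left_right_confidence`, the tolerance `tol`, and the treatment of non-positive disparities as invalid are assumptions for this example, not the thesis's method.

```python
def left_right_confidence(disp_left, disp_right, tol=1.0):
    """Binary confidence for one scanline of left-view disparities.

    A left pixel x with disparity d corresponds to right pixel x - d.
    The estimate is marked confident (1) when the right-view disparity at
    that location agrees with d to within `tol` pixels, otherwise 0.
    """
    width = len(disp_left)
    confidence = []
    for x, d in enumerate(disp_left):
        xr = int(round(x - d))
        # Assumption: non-positive disparity means "no estimate".
        if d <= 0 or not (0 <= xr < width):
            confidence.append(0)
            continue
        confidence.append(1 if abs(d - disp_right[xr]) <= tol else 0)
    return confidence


if __name__ == "__main__":
    left = [0.0, 1.0, 1.0, 3.0]
    right = [1.0, 1.0, 5.0, 0.0]
    print(left_right_confidence(left, right))
```

A mask of this kind could then tell a fusion module which raw disparities to trust. Pixels that fail the check often lie in occluded regions, where only one camera sees the surface.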
After the stereo matching pipeline predicts the disparity maps, they are used by a proposed disparity-based direct visual servoing controller to compute the commanded velocity that moves a mobile robot towards its target pose. Many previous visual servoing methods rely on complex and error-prone feature extraction and matching steps. The proposed framework follows the direct visual servoing approach, which requires no extraction or matching process, so its performance is not affected by the errors these steps can introduce. Furthermore, the predicted occlusion masks are incorporated into the controller to address the occlusion problem inherent to a stereo camera setup. The performance of the proposed control strategy is verified by extensive simulations and experiments.
- …