6,544 research outputs found

    Machine Learning techniques applied to stereo vision

    Get PDF
    Stereo is a popular technique enabling fast and dense depth estimation from two or more images. Its success is mainly due to its easiness of deployment, requiring only a couple or multiple synchronized image sensors, accurately calibrated to solve the matching problem between pixels on one of the images (named reference) and the other (named target). The absence of active technologies (e.g. pattern projection, laser scanners etc..) make this solution deployable on almost every scenario. Despite the wide literature concerning stereo, it still represents an open problem because of very challenging conditions such as poor illumination, reflective surfaces, occlusions and other elements occurring in real environments. Two main trends in stereo vision acquired popularity in the last years: confidence estimation and machine learning. Both proved to be very effective, pushing forward the state-of-the-art of dense disparity estimation. In this thesis, we combine these two trends to improve both confidence estimation and disparity inference, by defining more effective and easier to deploy confidence measures and proposing new approaches to leverage on them for more accurate depth prediction. All the experiments are validated on three popular datasets, KITTI 2012, KITTI 2015 and Middlebury v3, following the commonly adopted methodologies and protocol to compare our proposals with previous works representing the state-of-the-art in stereo vision

    On the Synergies between Machine Learning and Binocular Stereo for Depth Estimation from Images: a Survey

    Full text link
    Stereo matching is one of the longest-standing problems in computer vision with close to 40 years of studies and research. Throughout the years the paradigm has shifted from local, pixel-level decision to various forms of discrete and continuous optimization to data-driven, learning-based methods. Recently, the rise of machine learning and the rapid proliferation of deep learning enhanced stereo matching with new exciting trends and applications unthinkable until a few years ago. Interestingly, the relationship between these two worlds is two-way. While machine, and especially deep, learning advanced the state-of-the-art in stereo matching, stereo itself enabled new ground-breaking methodologies such as self-supervised monocular depth estimation based on deep networks. In this paper, we review recent research in the field of learning-based depth estimation from single and binocular images highlighting the synergies, the successes achieved so far and the open challenges the community is going to face in the immediate future.Comment: Accepted to TPAMI. Paper version of our CVPR 2019 tutorial: "Learning-based depth estimation from stereo and monocular images: successes, limitations and future challenges" (https://sites.google.com/view/cvpr-2019-depth-from-image/home

    Non-iterative RGB-D-inertial Odometry

    Full text link
    This paper presents a non-iterative solution to RGB-D-inertial odometry system. Traditional odometry methods resort to iterative algorithms which are usually computationally expensive or require well-designed initialization. To overcome this problem, this paper proposes to combine a non-iterative front-end (odometry) with an iterative back-end (loop closure) for the RGB-D-inertial SLAM system. The main contribution lies in the novel non-iterative front-end, which leverages on inertial fusion and kernel cross-correlators (KCC) to match point clouds in frequency domain. Dominated by the fast Fourier transform (FFT), our method is only of complexity O(nlogn)\mathcal{O}(n\log{n}), where nn is the number of points. Map fusion is conducted by element-wise operations, so that both time and space complexity are further reduced. Extensive experiments show that, due to the lightweight of the proposed front-end, the framework is able to run at a much faster speed yet still with comparable accuracy with the state-of-the-arts

    FusionRCNN: LiDAR-Camera Fusion for Two-stage 3D Object Detection

    Full text link
    3D object detection with multi-sensors is essential for an accurate and reliable perception system of autonomous driving and robotics. Existing 3D detectors significantly improve the accuracy by adopting a two-stage paradigm which merely relies on LiDAR point clouds for 3D proposal refinement. Though impressive, the sparsity of point clouds, especially for the points far away, making it difficult for the LiDAR-only refinement module to accurately recognize and locate objects.To address this problem, we propose a novel multi-modality two-stage approach named FusionRCNN, which effectively and efficiently fuses point clouds and camera images in the Regions of Interest(RoI). FusionRCNN adaptively integrates both sparse geometry information from LiDAR and dense texture information from camera in a unified attention mechanism. Specifically, it first utilizes RoIPooling to obtain an image set with a unified size and gets the point set by sampling raw points within proposals in the RoI extraction step; then leverages an intra-modality self-attention to enhance the domain-specific features, following by a well-designed cross-attention to fuse the information from two modalities.FusionRCNN is fundamentally plug-and-play and supports different one-stage methods with almost no architectural changes. Extensive experiments on KITTI and Waymo benchmarks demonstrate that our method significantly boosts the performances of popular detectors.Remarkably, FusionRCNN significantly improves the strong SECOND baseline by 6.14% mAP on Waymo, and outperforms competing two-stage approaches. Code will be released soon at https://github.com/xxlbigbrother/Fusion-RCNN.Comment: 7 pages, 3 figure
    corecore