    Guided Stereo Matching

    Stereo is a prominent technique for inferring dense depth maps from images, and deep learning has further pushed forward the state of the art, making end-to-end architectures unrivaled when enough training data is available. However, deep networks suffer significant drops in accuracy when dealing with new environments. Therefore, in this paper, we introduce Guided Stereo Matching, a novel paradigm that leverages a small amount of sparse, yet reliable, depth measurements retrieved from an external source to ameliorate this weakness. The additional sparse cues required by our method can be obtained with any strategy (e.g., a LiDAR) and are used to enhance features linked to the corresponding disparity hypotheses. Our formulation is general and fully differentiable, making it possible to exploit the additional sparse inputs both in pre-trained deep stereo networks and when training a new instance from scratch. Extensive experiments on three standard datasets and two state-of-the-art deep architectures show that, even with a small set of sparse input cues, i) the proposed paradigm enables significant improvements to pre-trained networks; ii) training from scratch notably increases accuracy and robustness to domain shifts; and iii) the paradigm is suitable and effective even with traditional stereo algorithms such as SGM.
    Comment: CVPR 201
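
    The abstract only states that the sparse cues enhance features linked to the corresponding disparity hypotheses. A minimal sketch of one way to do this is shown here, assuming a Gaussian modulation centred on the hinted disparity and applied to a per-pixel score vector in which higher means more likely. The function name, the hyperparameters `k` and `c`, and the binary `valid` flag are illustrative assumptions, not the paper's published code.

```python
import math


def guided_modulation(scores, hint, valid, k=10.0, c=1.0):
    """Reweight one pixel's per-disparity scores using a sparse depth hint.

    scores: matching score per disparity hypothesis d (higher = more likely)
    hint:   disparity provided by the external sensor (e.g., a LiDAR point)
    valid:  1 if this pixel has a hint, 0 otherwise
    k, c:   hypothetical peak height and width of the Gaussian
    """
    out = []
    for d, s in enumerate(scores):
        gauss = math.exp(-((d - hint) ** 2) / (2.0 * c ** 2))
        # Without a hint the factor is 1 and the score is unchanged;
        # with a hint, hypotheses near `hint` are amplified by up to k.
        out.append((1 - valid + valid * k * gauss) * s)
    return out


if __name__ == "__main__":
    flat = [1.0] * 6
    boosted = guided_modulation(flat, hint=2, valid=1)
    print(max(range(len(boosted)), key=lambda d: boosted[d]))  # → 2
    print(guided_modulation(flat, hint=2, valid=0) == flat)     # → True
```

    Because the factor is smooth and multiplicative, it stays differentiable, which fits the abstract's claim that the formulation works both with pre-trained networks and during training. For a cost volume where lower is better (as in SGM), the factor would have to be adapted; that variant is not shown here.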

    Real-time self-adaptive deep stereo

    Deep convolutional neural networks trained end-to-end are the state-of-the-art methods for regressing dense disparity maps from stereo pairs. These models, however, suffer a notable decrease in accuracy when exposed to scenarios significantly different from the training set (e.g., real vs. synthetic images). We argue that it is extremely unlikely that enough samples can be gathered to achieve effective training or tuning in every target domain, which makes this setup impractical for many applications. Instead, we propose to perform unsupervised and continuous online adaptation of a deep stereo network, which allows it to preserve its accuracy in any environment. However, this strategy is extremely computationally demanding and thus prevents real-time inference. We address this issue by introducing a new lightweight, yet effective, deep stereo architecture, the Modularly ADaptive Network (MADNet), and by developing a Modular ADaptation (MAD) algorithm, which independently trains sub-portions of the network. By deploying MADNet together with MAD, we introduce the first real-time self-adaptive deep stereo system, achieving competitive performance on heterogeneous datasets.
    Comment: Accepted at CVPR 2019 as an oral presentation. Code available at https://github.com/CVLAB-Unibo/Real-time-self-adaptive-deep-stere
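
    The abstract describes MAD only as independently training sub-portions of the network. The toy sketch here illustrates that idea under strong simplifying assumptions: each "module" is a single scalar parameter fitted to a target by squared error, only one sampled module receives a gradient step per iteration, and modules whose updates beat a linear extrapolation of the recent loss trend become more likely to be sampled. The reward rule, the 0.99/0.01 smoothing factors and all names are hypothetical choices, not the published MAD algorithm; a real system would adapt a deep network with an unsupervised loss.

```python
import math
import random


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def modular_adaptation(params, targets, steps=200, lr=0.1, seed=0):
    """Toy online adaptation that updates one sampled module per step.

    params:  one scalar per module (modified in place)
    targets: value each module should converge to
    Returns the final squared-error loss.
    """
    rng = random.Random(seed)
    scores = [0.0] * len(params)  # per-module sampling preference

    def loss():
        return sum((p - t) ** 2 for p, t in zip(params, targets))

    prev2 = prev1 = loss()  # loss two steps ago and one step ago
    for _ in range(steps):
        # Sample a single module to train, favouring high-scoring ones.
        k = rng.choices(range(len(params)), weights=softmax(scores))[0]
        # Gradient step on module k only; all other modules stay frozen.
        params[k] -= lr * 2.0 * (params[k] - targets[k])
        current = loss()
        # Reward: improvement over a linear extrapolation of the loss trend.
        expected = 2.0 * prev1 - prev2
        scores[k] = 0.99 * scores[k] + 0.01 * (expected - current)
        prev2, prev1 = prev1, current
    return loss()


if __name__ == "__main__":
    weights = [0.0, 0.0, 0.0]
    final = modular_adaptation(weights, targets=[1.0, -2.0, 3.0])
    print(f"final loss: {final:.2e}, weights: {weights}")
```

    In a real network, updating one module per step avoids back-propagating through the whole model at every frame, which is the kind of saving the abstract needs for real-time operation.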

    Multi-label learning based semi-global matching forest

    Semi-Global Matching (SGM) approximates a 2D Markov Random Field (MRF) via multiple 1D scanline optimizations, offering a good trade-off between accuracy and efficiency in dense matching. Nevertheless, its performance is limited by the simple summation of the aggregated costs from all 1D scanline optimizations used for the final disparity estimation. SGM-Forest improves on SGM by training a random forest to predict the best scanline according to each scanline’s disparity proposal. The disparity estimated by the best scanline then acts as a reference for adaptively adopting close proposals in further post-processing. However, in many cases more than one scanline is capable of providing a good prediction. Training the random forest with only one scanline labeled may limit or even confuse the learning procedure when other scanlines can offer similar contributions. In this paper, we propose a multi-label classification strategy to further improve SGM-Forest. Each training sample may carry multiple labels (or no label at all) if more than one scanline (or none) gives a proper prediction. We test the proposed method on stereo matching datasets from Middlebury, ETH3D, the EuroSDR image matching benchmark, and the 2019 IEEE GRSS data fusion contest. The results indicate that, within the SGM-Forest framework, the multi-label strategy consistently outperforms the single-label scheme.
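
    The difference between the single-label and multi-label training targets can be sketched for one pixel as follows. This assumes each scanline contributes one disparity proposal and that a proposal counts as "proper" when it lies within a tolerance `tau` of the ground truth; the 1-pixel tolerance, the function names and the averaging in `fuse_close_proposals` are hypothetical illustrations, not the paper's exact definitions.

```python
def single_label(proposals, gt):
    """Single-label target: index of the scanline closest to ground truth."""
    return min(range(len(proposals)), key=lambda r: abs(proposals[r] - gt))


def multi_label(proposals, gt, tau=1.0):
    """Multi-label target: every scanline whose proposal is within tau of ground truth.

    The result may contain several ones, or none at all if no scanline is accurate.
    """
    return [1 if abs(d - gt) <= tau else 0 for d in proposals]


def fuse_close_proposals(proposals, reference, tau=1.0):
    """Hypothetical post-processing: average the proposals close to a reference disparity."""
    close = [d for d in proposals if abs(d - reference) <= tau]
    return sum(close) / len(close) if close else reference


if __name__ == "__main__":
    proposals = [10.2, 12.0, 9.6, 10.9]  # one disparity proposal per scanline
    print(single_label(proposals, gt=10.0))    # → 0
    print(multi_label(proposals, gt=10.0))     # → [1, 0, 1, 1]
    print(multi_label([15.0, 20.0], gt=10.0))  # → [0, 0]
```

    With the single-label target, scanlines 2 and 3 would be treated as negatives even though their proposals are nearly as good as scanline 0, which is the source of confusion the abstract describes.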

    Learning a general-purpose confidence measure based on O(1) features and a smarter aggregation strategy for semi global matching

    Inferring dense depth from stereo is crucial for several computer vision applications, and Semi-Global Matching (SGM) is often the preferred choice due to its good trade-off between accuracy and computational requirements. Nevertheless, it suffers from two major issues: streaking artifacts caused by the Scanline Optimization (SO) approach at the core of the algorithm, which may lead to inaccurate results, and a high memory footprint, which may become prohibitive with high-resolution images or on devices with constrained resources. In this paper, we propose a smart scanline aggregation approach for SGM aimed at dealing with both issues. In particular, the contribution of this paper is threefold: i) leveraging machine learning, we propose a novel general-purpose confidence measure, suited for any stereo algorithm and based on O(1) features, that outperforms the state of the art; ii) taking advantage of this confidence measure, we propose a smart aggregation strategy for SGM that enables significant improvements with a very small overhead; iii) the overall strategy drastically reduces the memory footprint of SGM and, at the same time, improves its effectiveness and execution time. We provide extensive experimental results, including a cross-validation on multiple datasets (KITTI 2012, KITTI 2015 and Middlebury 2014).
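
    The abstract does not spell out the aggregation rule. As a point of reference, the sketch here implements the standard SGM scanline recurrence (penalties `p1` for small and `p2` for large disparity changes) on a single image row with only two paths, then contrasts SGM's plain sum of path costs with a confidence-weighted average. The confidence is a hypothetical stand-in (the gap between the two lowest costs of each path); the paper instead learns its measure from O(1) features, and its actual aggregation strategy and memory savings are not reproduced.

```python
def scanline_optimize(cost, p1=1.0, p2=4.0):
    """Standard SGM recurrence along one 1D scanline.

    cost[x][d] is the matching cost of pixel x at disparity d; the result
    holds the aggregated costs L[x][d] for this direction.
    """
    n_disp = len(cost[0])
    agg = [list(cost[0])]
    for x in range(1, len(cost)):
        prev = agg[-1]
        prev_min = min(prev)
        row = []
        for d in range(n_disp):
            best = min(prev[d], prev_min + p2)        # same disparity or large jump
            if d > 0:
                best = min(best, prev[d - 1] + p1)    # small jump down
            if d < n_disp - 1:
                best = min(best, prev[d + 1] + p1)    # small jump up
            row.append(cost[x][d] + best - prev_min)  # subtracting the min bounds growth
        agg.append(row)
    return agg


def directions(cost, p1=1.0, p2=4.0):
    """Left-to-right and right-to-left paths (full SGM typically uses 8 or 16)."""
    forward = scanline_optimize(cost, p1, p2)
    backward = scanline_optimize(cost[::-1], p1, p2)[::-1]
    return [forward, backward]


def margin(costs):
    """Hypothetical stand-in confidence: gap between the two lowest costs."""
    best, second = sorted(costs)[:2]
    return second - best


def aggregate_sum(paths):
    """Plain SGM: sum the aggregated costs of all paths."""
    return [[sum(p[x][d] for p in paths) for d in range(len(paths[0][x]))]
            for x in range(len(paths[0]))]


def aggregate_weighted(paths, eps=1e-6):
    """Confidence-weighted average of the paths instead of a plain sum."""
    out = []
    for x in range(len(paths[0])):
        w = [margin(p[x]) + eps for p in paths]
        total = sum(w)
        out.append([sum(wi * p[x][d] for wi, p in zip(w, paths)) / total
                    for d in range(len(paths[0][x]))])
    return out


def winner_takes_all(agg):
    return [min(range(len(row)), key=lambda d: row[d]) for row in agg]


if __name__ == "__main__":
    cost = [[10, 10, 0, 10]] * 5  # every pixel prefers disparity 2
    paths = directions(cost)
    print(winner_takes_all(aggregate_sum(paths)))       # → [2, 2, 2, 2, 2]
    print(winner_takes_all(aggregate_weighted(paths)))  # → [2, 2, 2, 2, 2]
```

    Weighting by confidence lets a reliable path dominate where another path is ambiguous, whereas the plain sum gives every path the same vote regardless of how trustworthy it is at that pixel.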

    Reduction of the fronto-parallel bias for wide-baseline semi-global matching

    Semi-Global Matching (SGM) is a widely used technique for dense image matching, popular because of its accuracy and speed. While it works well for textured scenes, it can fail on slanted surfaces, particularly in wide-baseline configurations, due to the so-called fronto-parallel bias. In this paper, we propose an extension of SGM that uses image warping to reduce the fronto-parallel bias in the data term, based on estimating dominant slanted planes. These planes are also used as surface priors to improve the smoothness term. Our method computes a disparity map for each dominant slanted plane and fuses them to obtain the final disparity map. A quantitative evaluation on synthetic data shows that our approach outperforms SGM and SGM-P, and qualitative results on real data demonstrate its potential. In this way, we underscore the need to tackle the fronto-parallel bias, particularly for wide-baseline configurations, in both the data term and the smoothness term of SGM.
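
    The warping idea can be sketched on a single image row. The sketch assumes a rectified pair with the left image as reference (reference pixel x matches target pixel x - d), a dominant plane with disparity d(x, y) = a·x + b·y + c, and linear interpolation; after warping, points on the plane have zero residual disparity and therefore look fronto-parallel to the matcher. The minimum-cost fusion rule and all names are assumptions for illustration, and the paper's use of the planes as smoothness priors is not shown.

```python
import math


def plane_disparity(plane, x, y):
    """Disparity induced by a slanted plane d(x, y) = a*x + b*y + c."""
    a, b, c = plane
    return a * x + b * y + c


def warp_row(target_row, y, plane):
    """Resample one target-image row by the plane-induced disparity.

    Pixels whose matching position falls outside the row are marked NaN.
    """
    n = len(target_row)
    warped = []
    for x in range(n):
        xs = x - plane_disparity(plane, x, y)  # matching position in the target row
        if 0.0 <= xs <= n - 1:
            x0 = int(math.floor(xs))
            x1 = min(x0 + 1, n - 1)
            t = xs - x0
            warped.append((1.0 - t) * target_row[x0] + t * target_row[x1])  # linear interpolation
        else:
            warped.append(math.nan)
    return warped


def fuse_by_min_cost(disparities, costs):
    """Per pixel, keep the disparity from the plane hypothesis with the lowest cost."""
    fused = []
    for x in range(len(disparities[0])):
        k = min(range(len(disparities)), key=lambda i: costs[i][x])
        fused.append(disparities[k][x])
    return fused


if __name__ == "__main__":
    plane = (0.5, 0.0, 1.0)                     # d(x) = 0.5 x + 1 on row y = 0
    left = [2.0 * x for x in range(10)]         # reference row
    right = [4.0 * (x + 1) for x in range(10)]  # same slanted surface seen by the target
    warped = warp_row(right, 0, plane)
    print(warped[2:] == left[2:])               # → True
```

    Once the target is warped, a matcher searching small residual disparities around zero no longer has to assume a constant disparity across a slanted surface; the plane disparity is added back to obtain the final map for that hypothesis.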

    Machine Learning techniques applied to stereo vision

    Stereo is a popular technique enabling fast and dense depth estimation from two or more images. Its success is mainly due to its ease of deployment: it requires only two or more synchronized, accurately calibrated image sensors, with depth obtained by solving the matching problem between pixels in one image (the reference) and the other (the target). Since no active technology is needed (e.g., pattern projection or laser scanners), this solution is deployable in almost every scenario. Despite the wide literature on stereo, it still represents an open problem because of very challenging conditions, such as poor illumination, reflective surfaces, occlusions and other elements occurring in real environments. Two main trends in stereo vision have gained popularity in recent years: confidence estimation and machine learning. Both proved very effective, pushing forward the state of the art in dense disparity estimation. In this thesis, we combine these two trends to improve both confidence estimation and disparity inference, by defining more effective and easier-to-deploy confidence measures and by proposing new approaches that leverage them for more accurate depth prediction. All experiments are validated on three popular datasets, KITTI 2012, KITTI 2015 and Middlebury v3, following the commonly adopted methodologies and protocols to compare our proposals with previous works representing the state of the art in stereo vision.
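
    As a grounding example for the concepts in this abstract, the sketch here shows two textbook building blocks: converting disparity to depth for a rectified, calibrated pair (Z = f·B/d, with focal length f in pixels and baseline B in metres) and the left-right consistency check, a classic hand-crafted confidence cue. Neither is claimed to be one of the confidence measures proposed in the thesis; the tolerance `tau` and all names are illustrative.

```python
def left_right_check(disp_left, disp_right, tau=1.0):
    """Flag pixels whose left and right disparities agree (a classic confidence cue).

    The left image is the reference: pixel x in the left image matches x - d in the right.
    """
    valid = []
    for x, d in enumerate(disp_left):
        xr = x - int(round(d))
        # Out-of-bounds matches are treated as unreliable.
        valid.append(0 <= xr < len(disp_right) and abs(d - disp_right[xr]) <= tau)
    return valid


def depth_from_disparity(d, focal_px, baseline_m):
    """Rectified stereo: Z = f * B / d (undefined for d <= 0)."""
    if d <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / d


if __name__ == "__main__":
    print(left_right_check([1, 1, 1, 1], [1, 3, 1, 1]))  # → [False, True, False, True]
    print(depth_from_disparity(10, focal_px=700, baseline_m=0.5))  # → 35.0
```

    Because depth is inversely proportional to disparity, a one-pixel disparity error costs far more depth accuracy at long range, which is one reason confidence estimates are valuable for filtering unreliable matches.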