60 research outputs found
Better Stereo Matching From Simple Yet Effective Wrangling of Deep Features
The cost volume plays a pivotal role in stereo matching. Most recent works have focused on deep feature extraction and cost refinement to obtain a more accurate cost volume. Unlike them, we approach the problem from a different perspective: feature wrangling. We find that simple wrangling of deep features can effectively improve the construction of the cost volume and thus the performance of stereo matching. Specifically, we develop two simple yet effective wrangling techniques for deep features: a spatial one, a differentiable feature transformation, and a channel-wise one, a memory-economical feature expansion, both aimed at better cost construction. By exploiting the local ordering information provided by a differentiable rank transform, we enhance the search for correspondence; with the help of disparity division, our feature expansion admits more features into the cost volume with no extra memory required. Equipped with these two feature wrangling techniques, our simple network performs outstandingly on the widely used KITTI and Sceneflow datasets.
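To make the idea of "local ordering information" concrete, here is a minimal sketch of a rank transform on a small 2D array. The hard version counts how many neighbours in a window are smaller than the centre value. The soft version replaces that step function with a sigmoid so the result varies smoothly with the input. The sigmoid relaxation, the function name `rank_transform`, and the `radius` and `temperature` parameters are assumptions made for illustration; the abstract does not say how the authors make the transform differentiable.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def rank_transform(img, radius=1, temperature=None):
    """Rank of each value within its (2*radius+1)^2 window.

    temperature=None: classic hard count of neighbours smaller than the centre.
    temperature>0:    sigmoid relaxation of that count (illustrative assumption,
                      not necessarily the formulation used in the paper).
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    if dy == 0 and dx == 0:
                        continue
                    ny, nx = y + dy, x + dx
                    if not (0 <= ny < h and 0 <= nx < w):
                        continue  # neighbours outside the image are skipped
                    diff = img[y][x] - img[ny][nx]
                    if temperature is None:
                        total += 1.0 if diff > 0 else 0.0
                    else:
                        total += _sigmoid(diff / temperature)
            out[y][x] = total
    return out
```

With a small `temperature` the soft ranks approach the hard counts; larger values give smoother outputs at the cost of blurring the ordering.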
Disparity Estimation with Scene Depth Cues
The cost volume plays a pivotal role in stereo matching, usually serving as the object of optimization. However, we find that it can also provide an effective scene prior to guide disparity learning, as it reflects the depth relationships between objects in the scene well. Inspired by this new perspective, we propose the CSA module, which consists of a new correlation and selection (CS) layer and a new aggregation layer. The CS layer regulates the matching costs and re-encodes the feature information into the correlation volume. The aggregation layer better preserves the depth cues of the refined cost volume through a convolution network and a unimodalization operation. The proposed module can be trained in a supervised manner, making the extraction of scene depth cues more accurate. Extensive experiments on the Sceneflow and KITTI datasets demonstrate that, with our module embedded, SOTA networks achieve substantially better performance.
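For readers unfamiliar with the term, the following is a minimal sketch of a generic correlation cost volume along a single scanline, followed by a winner-take-all disparity pick. It is not the CSA module or its CS layer; the function names `correlation_volume` and `winner_take_all`, and the convention that left pixel `x` matches right pixel `x - d`, are assumptions chosen for illustration.

```python
def correlation_volume(left, right, max_disp):
    """Build cost[d][x] = dot(left[x], right[x - d]) for one scanline.

    left, right: lists of per-pixel feature vectors (lists of floats).
    Positions where x - d falls outside the image get a score of 0.0.
    """
    width = len(left)
    volume = []
    for d in range(max_disp + 1):
        row = []
        for x in range(width):
            if x - d < 0:
                row.append(0.0)  # no valid candidate in the right image
            else:
                row.append(sum(a * b for a, b in zip(left[x], right[x - d])))
        volume.append(row)
    return volume


def winner_take_all(volume):
    """Pick, for each pixel, the disparity with the highest correlation."""
    width = len(volume[0])
    return [
        max(range(len(volume)), key=lambda d: volume[d][x])
        for x in range(width)
    ]
```

Learned aggregation layers such as the one described above operate on a volume of this general shape instead of taking the raw per-pixel maximum.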
Real-time self-adaptive deep stereo
Deep convolutional neural networks trained end-to-end are the
state-of-the-art methods to regress dense disparity maps from stereo pairs.
These models, however, suffer from a notable decrease in accuracy when exposed
to scenarios significantly different from the training set (e.g., real vs.
synthetic images). We argue that it is extremely unlikely to gather
enough samples to achieve effective training/tuning in any target domain, thus
making this setup impractical for many applications. Instead, we propose to
perform unsupervised and continuous online adaptation of a deep stereo network,
which allows for preserving its accuracy in any environment. However, this
strategy is extremely computationally demanding and thus prevents real-time
inference. We address this issue by introducing a new lightweight, yet effective,
deep stereo architecture, Modularly ADaptive Network (MADNet) and developing a
Modular ADaptation (MAD) algorithm, which independently trains sub-portions of
the network. By deploying MADNet together with MAD we introduce the first
real-time self-adaptive deep stereo system enabling competitive performance on
heterogeneous datasets.
Comment: Accepted at CVPR 2019 as oral presentation. Code available:
https://github.com/CVLAB-Unibo/Real-time-self-adaptive-deep-stere
MC-Stereo: Multi-peak Lookup and Cascade Search Range for Stereo Matching
Stereo matching is a fundamental task in scene comprehension. In recent
years, methods based on iterative optimization have shown promise in stereo
matching. However, the current iteration framework employs a single-peak
lookup, which struggles to handle the multi-peak problem effectively.
Additionally, the fixed search range used during the iteration process limits
the final convergence effects. To address these issues, we present a novel
iterative optimization architecture called MC-Stereo. This architecture
mitigates the multi-peak distribution problem in matching through the
multi-peak lookup strategy, and integrates the coarse-to-fine concept into the
iterative framework via the cascade search range. Furthermore, given that
feature representation learning is crucial for successful learning-based stereo
matching, we introduce a pre-trained network to serve as the feature extractor,
enhancing the front end of the stereo matching pipeline. Based on these
improvements, MC-Stereo ranks first among all publicly available methods on the
KITTI-2012 and KITTI-2015 benchmarks, and also achieves state-of-the-art
performance on ETH3D. Code is available at
https://github.com/MiaoJieF/MC-Stereo.
Comment: Accepted to 3DV 202