Online Adaptive Disparity Estimation for Dynamic Scenes in Structured Light Systems
In recent years, deep neural networks have shown remarkable progress in dense
disparity estimation from dynamic scenes in monocular structured light systems.
However, their performance significantly drops when applied in unseen
environments. To address this issue, self-supervised online adaptation has been
proposed as a solution to bridge this performance gap. Unlike traditional
fine-tuning processes, online adaptation performs test-time optimization to
adapt networks to new domains. Therefore, achieving fast convergence during the
adaptation process is critical for attaining satisfactory accuracy. In this
paper, we propose an unsupervised loss function based on long sequential
inputs. It ensures better gradient directions and faster convergence. Our loss
function is designed using a multi-frame pattern flow, which comprises a set of
sparse trajectories of the projected pattern along the sequence. We estimate
the sparse pseudo ground truth with a confidence mask using a filter-based
method, which guides the online adaptation process. Our proposed framework
significantly improves the online adaptation speed and achieves superior
performance on unseen data.
Comment: Accepted by the 36th IEEE/RSJ International Conference on Intelligent
Robots and Systems, 202
Continual Adaptation for Deep Stereo
Depth estimation from stereo images is carried out with unmatched results by convolutional neural networks trained end-to-end to regress dense disparities. As with most tasks, this is possible when large amounts of labelled samples are available for training, ideally covering the whole data distribution encountered at deployment time. Since this assumption is systematically unmet in real applications, the capacity to adapt to any unseen setting becomes of paramount importance. To this end, we propose a continual adaptation paradigm for deep stereo networks designed to deal with challenging and ever-changing environments. We design a lightweight and modular architecture, Modularly ADaptive Network (MADNet), and formulate Modular ADaptation algorithms (MAD, MAD++) which permit efficient optimization of independent sub-portions of the entire network. In our paradigm, the learning signals needed to continuously adapt models online can be sourced from self-supervision via right-to-left image warping or from traditional stereo algorithms. With both sources, no data other than the input images gathered at deployment time are needed. Thus, our network architecture and adaptation algorithms realize the first real-time self-adaptive deep stereo system and pave the way for a new paradigm that can facilitate practical deployment of end-to-end architectures for dense disparity regression.
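The right-to-left warping supervision mentioned in this abstract can be sketched as a photometric reconstruction loss: sample the right image at x - d(y, x) to reconstruct the left view, then penalize the difference. This is a minimal sketch under simplifying assumptions (grayscale images, nearest-neighbor sampling, L1 penalty); the function names are illustrative, not MADNet's actual implementation.

```python
import numpy as np

def warp_right_to_left(right: np.ndarray, disparity: np.ndarray) -> np.ndarray:
    """Reconstruct the left view by sampling the right image at x - d(y, x)."""
    h, w = disparity.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_x = np.clip(np.round(xs - disparity).astype(int), 0, w - 1)
    return right[ys, src_x]

def photometric_loss(left: np.ndarray, right: np.ndarray,
                     disparity: np.ndarray) -> float:
    """Mean absolute error between the left image and its reconstruction.

    Minimizing this over the disparity map is the self-supervised signal:
    a correct disparity makes the warped right image match the left image.
    """
    return float(np.abs(left - warp_right_to_left(right, disparity)).mean())
```

In a real adaptation loop this loss would be differentiable (bilinear sampling) and backpropagated through the disparity network; here the nearest-neighbor version just illustrates the geometry of the supervision signal.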
StereoFlowGAN: Co-training for Stereo and Flow with Unsupervised Domain Adaptation
We introduce a novel training strategy for stereo matching and optical flow
estimation that utilizes image-to-image translation between synthetic and real
image domains. Our approach enables the training of models that excel in real
image scenarios while relying solely on ground-truth information from synthetic
images. To facilitate task-agnostic domain adaptation and the training of
task-specific components, we introduce a bidirectional feature warping module
that handles both left-right and forward-backward directions. Experimental
results show competitive performance over previous domain translation-based
methods, substantiating the efficacy of our proposed framework in
leveraging the benefits of unsupervised domain adaptation, stereo matching,
and optical flow estimation.
Comment: Accepted by BMVC 202
Learning Stereo from Single Images
Supervised deep networks are among the best methods for finding
correspondences in stereo image pairs. Like all supervised approaches, these
networks require ground truth data during training. However, collecting large
quantities of accurate dense correspondence data is very challenging. We
propose that it is unnecessary to have such a high reliance on ground truth
depths or even corresponding stereo pairs. Inspired by recent progress in
monocular depth estimation, we generate plausible disparity maps from single
images. In turn, we use those flawed disparity maps in a carefully designed
pipeline to generate stereo training pairs. Training in this manner makes it
possible to convert any collection of single RGB images into stereo training
data. This results in a significant reduction in human effort, with no need to
collect real depths or to hand-design synthetic data. We can consequently train
a stereo matching network from scratch on datasets like COCO, which were
previously hard to exploit for stereo. Through extensive experiments we show
that our approach outperforms stereo networks trained with standard synthetic
datasets, when evaluated on KITTI, ETH3D, and Middlebury.
Comment: Accepted as an oral presentation at ECCV 202
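The core step in this abstract, turning a single image and an estimated disparity map into a stereo training pair, can be sketched as forward warping: each left-image pixel at x is splatted to x - d(y, x) in the synthesized right view. This is a minimal sketch (grayscale, nearest-neighbor splatting, constant hole fill); the function name is illustrative, and the paper's actual pipeline additionally handles occlusions, collisions, and hole in-painting.

```python
import numpy as np

def synthesize_right_view(left: np.ndarray, disparity: np.ndarray,
                          fill: float = 0.0) -> np.ndarray:
    """Forward-warp a single image with a disparity map to synthesize the
    right view of a stereo pair. Pixels that receive no source sample
    (disocclusions) keep the fill value."""
    h, w = disparity.shape
    right = np.full_like(left, fill)
    for y in range(h):
        for x in range(w):
            # Splat left pixel (y, x) to its right-view location x - d.
            xr = int(round(x - disparity[y, x]))
            if 0 <= xr < w:
                right[y, xr] = left[y, x]
    return right
```

The resulting (left, synthesized-right, disparity) triplets can then serve as supervised training data for a stereo network, even when the disparity comes from an imperfect monocular depth estimate.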