We present TemporalStereo, a coarse-to-fine online stereo matching network
that is highly efficient and can effectively exploit past geometry and
context information to boost matching accuracy. Our network leverages a
sparse cost volume and proves effective when only a single stereo pair is
given. Moreover, its distinctive ability to use spatio-temporal information
across frames allows TemporalStereo to alleviate problems such as occlusions
and reflective regions while remaining highly efficient on stereo sequences.
Notably, once trained on stereo videos, our model can run seamlessly in both
single-pair and temporal modes. Experiments show that our network, although
it relies on camera motion, is robust even to dynamic objects when running
on videos. We validate TemporalStereo through extensive experiments on
synthetic (SceneFlow, TartanAir) and real (KITTI 2012, KITTI 2015) datasets.
Detailed results show that our model achieves state-of-the-art performance on
each of these datasets. Code is available at
\url{https://github.com/youmi-zym/TemporalStereo.git}