Deep 3D Pan via adaptive "t-shaped" convolutions with global and local adaptive dilations
Recent advances in deep learning have shown promising results in many
low-level vision tasks. However, single-image view synthesis remains an
open problem. In particular, the generation of new images at
parallel camera views given a single input image is of great interest, as it
enables 3D visualization of the 2D input scenery. We propose a novel network
architecture to perform stereoscopic view synthesis at arbitrary camera
positions along the X-axis, which we call Deep 3D Pan. The proposed network,
the monster-net, is built around a novel "t-shaped" adaptive kernel with
globally and locally adaptive dilations. This kernel efficiently incorporates
the global camera shift and handles the local 3D geometries of the target
image's pixels, so that natural-looking 3D panned views can be synthesized
from a single 2D input image. Extensive experiments were performed on the
KITTI, CityScapes, and our VICLAB_STEREO indoor datasets to demonstrate the
efficacy of our method. Our monster-net outperforms the state-of-the-art
(SOTA) method by a large margin on all three metrics: RMSE, PSNR, and SSIM. Our
proposed monster-net is capable of reconstructing more reliable image
structures in synthesized images with coherent geometry. Moreover, the
disparity information that can be extracted from the "t-shaped" kernel is much
more reliable than that of the SOTA for the unsupervised monocular depth
estimation task, confirming the effectiveness of our method.
Comment: Check our video at https://www.youtube.com/watch?v=o0b-e282Rt
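
For readers unfamiliar with the reported metrics, the following is a minimal sketch of how RMSE and PSNR between a ground-truth view and a synthesized view are commonly computed. It is not the authors' evaluation code: the function names `rmse` and `psnr`, the nested-list image representation, and the default peak value of 255 (8-bit images) are assumptions made for illustration. SSIM is omitted because it requires windowed local statistics.

```python
import math


def rmse(reference, synthesized):
    """Root-mean-square error between two equally sized images.

    Images are given as nested lists of pixel intensities (an assumption
    made here to keep the sketch dependency-free).
    """
    ref = [p for row in reference for p in row]
    syn = [p for row in synthesized for p in row]
    if not ref or len(ref) != len(syn):
        raise ValueError("images must be non-empty and have the same size")
    mse = sum((a - b) ** 2 for a, b in zip(ref, syn)) / len(ref)
    return math.sqrt(mse)


def psnr(reference, synthesized, max_value=255.0):
    """Peak signal-to-noise ratio in dB; max_value=255 assumes 8-bit images."""
    err = rmse(reference, synthesized)
    if err == 0:
        return math.inf  # identical images: no noise
    return 20.0 * math.log10(max_value / err)


if __name__ == "__main__":
    ground_truth = [[10, 20], [30, 40]]
    prediction = [[12, 18], [33, 36]]
    print("RMSE:", rmse(ground_truth, prediction))
    print("PSNR (dB):", psnr(ground_truth, prediction))
```

Lower RMSE and higher PSNR both indicate a synthesized view that is closer to the ground truth, which is the sense in which the comparison above is reported.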