Fast View Synthesis with Deep Stereo Vision
Novel view synthesis is an important problem in computer vision and graphics.
Over the years a large number of solutions have been put forward to solve the
problem. However, the large-baseline novel view synthesis problem is far from
being "solved". Recent works have attempted to use Convolutional Neural
Networks (CNNs) to solve view synthesis tasks. Due to the difficulty of
learning scene geometry and interpreting camera motion, CNNs are often unable
to generate realistic novel views. In this paper, we present a novel view
synthesis approach based on stereo-vision and CNNs that decomposes the problem
into two sub-tasks: view dependent geometry estimation and texture inpainting.
Both tasks are structured prediction problems that can be learned effectively
with CNNs. Experiments on the KITTI Odometry dataset show that our approach is
more accurate and significantly faster than the current state-of-the-art. The
code and supplementary material will be made publicly available. Results can be
found at https://youtu.be/5pzS9jc-5t
Softmax Splatting for Video Frame Interpolation
Differentiable image sampling in the form of backward warping has seen broad
adoption in tasks like depth estimation and optical flow prediction. In
contrast, how to perform forward warping has seen less attention, partly due to
additional challenges such as resolving the conflict of mapping multiple pixels
to the same target location in a differentiable way. We propose softmax
splatting to address this paradigm shift and show its effectiveness on the
application of frame interpolation. Specifically, given two input frames, we
forward-warp the frames and their feature pyramid representations based on an
optical flow estimate using softmax splatting. In doing so, the softmax
splatting seamlessly handles cases where multiple source pixels map to the same
target location. We then use a synthesis network to predict the interpolation
result from the warped representations. Our softmax splatting allows us to not
only interpolate frames at an arbitrary time but also to fine-tune the feature
pyramid and the optical flow. We show that our synthesis approach, empowered by
softmax splatting, achieves new state-of-the-art results for video frame
interpolation. (CVPR 2020, http://sniklaus.com/softspla)
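The core idea of the abstract above can be illustrated concretely: when forward-warping, several source pixels may land on the same target pixel, and softmax splatting resolves the conflict by weighting each contribution with the exponential of a per-pixel importance score. The following is a minimal, non-differentiable sketch of that weighting scheme (the names `softmax_splat`, the nearest-neighbor rounding, and the use of a plain NumPy loop are illustrative assumptions; the paper itself uses a bilinear, differentiable splatting kernel inside a CNN):

```python
import numpy as np

def softmax_splat(frame, flow, importance):
    """Forward-warp `frame` along `flow`, resolving collisions with a
    softmax over per-pixel `importance` scores.

    A simplified, nearest-neighbor sketch of the idea only; not the
    paper's differentiable bilinear implementation.
    frame:      (H, W, C) source image
    flow:       (H, W, 2) displacement in (dx, dy) order
    importance: (H, W) score; larger means the pixel wins collisions
    """
    h, w, c = frame.shape
    num = np.zeros((h, w, c))   # softmax-weighted color sum
    den = np.zeros((h, w, 1))   # softmax normalizer
    weight = np.exp(importance - importance.max())  # numerically stable exp
    for y in range(h):
        for x in range(w):
            tx = int(round(x + flow[y, x, 0]))
            ty = int(round(y + flow[y, x, 1]))
            if 0 <= tx < w and 0 <= ty < h:
                num[ty, tx] += weight[y, x] * frame[y, x]
                den[ty, tx] += weight[y, x]
    # Pixels nothing maps to stay zero (holes); colliding pixels blend.
    return num / np.maximum(den, 1e-8)
```

Note how the normalization makes the blend a proper softmax over all source pixels hitting the same target: equal importance scores average the contributions, while a much larger score lets one source pixel dominate, which is what makes the scheme handle occlusions gracefully.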
Perspective Plane Program Induction from a Single Image
We study the inverse graphics problem of inferring a holistic representation
for natural images. Given an input image, our goal is to induce a
neuro-symbolic, program-like representation that jointly models camera poses,
object locations, and global scene structures. Such high-level, holistic scene
representations further facilitate low-level image manipulation tasks such as
inpainting. We formulate this problem as jointly finding the camera pose and
scene structure that best describe the input image. The benefits of such joint
inference are two-fold: scene regularity serves as a new cue for perspective
correction, and in turn, an accurate perspective correction leads to a simpler
scene structure, much as the correct shape leads to the most regular
texture in shape from texture. Our proposed framework, Perspective Plane
Program Induction (P3I), combines search-based and gradient-based algorithms to
efficiently solve the problem. P3I outperforms a set of baselines on a
collection of Internet images, across tasks including camera pose estimation,
global structure inference, and downstream image manipulation tasks. (CVPR
2020. First two authors contributed equally. Project page:
http://p3i.csail.mit.edu)