6,275 research outputs found
Context-aware Synthesis for Video Frame Interpolation
Video frame interpolation algorithms typically estimate optical flow or its
variations and then use it to guide the synthesis of an intermediate frame
between two consecutive original frames. To handle challenges like occlusion,
bidirectional flow between the two input frames is often estimated and used to
warp and blend the input frames. However, how to effectively blend the two
warped frames still remains a challenging problem. This paper presents a
context-aware synthesis approach that warps not only the input frames but also
their pixel-wise contextual information and uses them to interpolate a
high-quality intermediate frame. Specifically, we first use a pre-trained
neural network to extract per-pixel contextual information for input frames. We
then employ a state-of-the-art optical flow algorithm to estimate bidirectional
flow between them and pre-warp both input frames and their context maps.
Finally, unlike common approaches that blend the pre-warped frames, our method
feeds them and their context maps to a video frame synthesis neural network to
produce the interpolated frame in a context-aware fashion. Our neural network
is fully convolutional and is trained end to end. Our experiments show that our
method can handle challenging scenarios such as occlusion and large motion and
outperforms representative state-of-the-art approaches.Comment: CVPR 2018, http://graphics.cs.pdx.edu/project/ctxsy
Neural View-Interpolation for Sparse Light Field Video
We suggest representing light field (LF) videos as "one-off" neural networks (NN), i.e., a learned mapping from view-plus-time coordinates to high-resolution color values, trained on sparse views. Initially, this sounds like a bad idea for three main reasons: First, a NN LF will likely have less quality than a same-sized pixel basis representation. Second, only few training data, e.g., 9 exemplars per frame are available for sparse LF videos. Third, there is no generalization across LFs, but across view and time instead. Consequently, a network needs to be trained for each LF video. Surprisingly, these problems can turn into substantial advantages: Other than the linear pixel basis, a NN has to come up with a compact, non-linear i.e., more intelligent, explanation of color, conditioned on the sparse view and time coordinates. As observed for many NN however, this representation now is interpolatable: if the image output for sparse view coordinates is plausible, it is for all intermediate, continuous coordinates as well. Our specific network architecture involves a differentiable occlusion-aware warping step, which leads to a compact set of trainable parameters and consequently fast learning and fast execution
Occlusion Aware Unsupervised Learning of Optical Flow
It has been recently shown that a convolutional neural network can learn
optical flow estimation with unsupervised learning. However, the performance of
the unsupervised methods still has a relatively large gap compared to its
supervised counterpart. Occlusion and large motion are some of the major
factors that limit the current unsupervised learning of optical flow methods.
In this work we introduce a new method which models occlusion explicitly and a
new warping way that facilitates the learning of large motion. Our method shows
promising results on Flying Chairs, MPI-Sintel and KITTI benchmark datasets.
Especially on KITTI dataset where abundant unlabeled samples exist, our
unsupervised method outperforms its counterpart trained with supervised
learning.Comment: CVPR 2018 Camera-read
Video Frame Interpolation via Adaptive Separable Convolution
Standard video frame interpolation methods first estimate optical flow
between input frames and then synthesize an intermediate frame guided by
motion. Recent approaches merge these two steps into a single convolution
process by convolving input frames with spatially adaptive kernels that account
for motion and re-sampling simultaneously. These methods require large kernels
to handle large motion, which limits the number of pixels whose kernels can be
estimated at once due to the large memory demand. To address this problem, this
paper formulates frame interpolation as local separable convolution over input
frames using pairs of 1D kernels. Compared to regular 2D kernels, the 1D
kernels require significantly fewer parameters to be estimated. Our method
develops a deep fully convolutional neural network that takes two input frames
and estimates pairs of 1D kernels for all pixels simultaneously. Since our
method is able to estimate kernels and synthesizes the whole video frame at
once, it allows for the incorporation of perceptual loss to train the neural
network to produce visually pleasing frames. This deep neural network is
trained end-to-end using widely available video data without any human
annotation. Both qualitative and quantitative experiments show that our method
provides a practical solution to high-quality video frame interpolation.Comment: ICCV 2017, http://graphics.cs.pdx.edu/project/sepconv
- …