Video Frame Interpolation via Adaptive Separable Convolution
Standard video frame interpolation methods first estimate optical flow
between input frames and then synthesize an intermediate frame guided by
motion. Recent approaches merge these two steps into a single convolution
process by convolving input frames with spatially adaptive kernels that account
for motion and re-sampling simultaneously. These methods require large kernels
to handle large motion, which limits the number of pixels whose kernels can be
estimated at once due to the large memory demand. To address this problem, this
paper formulates frame interpolation as local separable convolution over input
frames using pairs of 1D kernels. Compared to regular 2D kernels, the 1D
kernels require significantly fewer parameters to be estimated. Our method
develops a deep fully convolutional neural network that takes two input frames
and estimates pairs of 1D kernels for all pixels simultaneously. Since our
method is able to estimate kernels and synthesize the whole video frame at
once, it allows for the incorporation of perceptual loss to train the neural
network to produce visually pleasing frames. This deep neural network is
trained end-to-end using widely available video data without any human
annotation. Both qualitative and quantitative experiments show that our method
provides a practical solution to high-quality video frame interpolation.
Comment: ICCV 2017, http://graphics.cs.pdx.edu/project/sepconv
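The saving from 1D kernel pairs described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes that each output pixel is the sum of two locally filtered patches, one per input frame, and that each implicit 2D kernel is the outer product of a vertical and a horizontal 1D kernel. The names `separable_local_conv` and `interpolate`, the border clamping, and the plain-list data layout are all illustrative choices. In the method itself the kernels are predicted by a neural network; here they are simply passed in.

```python
def separable_local_conv(frame, kv, kh):
    """Filter a single-channel frame with a different separable kernel per pixel.

    frame: H x W list of lists of floats.
    kv[y][x], kh[y][x]: vertical and horizontal 1D kernels (length n, n odd)
    for output pixel (x, y). Hypothetical layout, chosen for readability.
    """
    H, W = len(frame), len(frame[0])
    n = len(kv[0][0])
    r = n // 2
    out = [[0.0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            acc = 0.0
            for i in range(n):
                yy = min(max(y + i - r, 0), H - 1)  # clamp rows at the border
                for j in range(n):
                    xx = min(max(x + j - r, 0), W - 1)  # clamp columns
                    # The 2D weight is never stored: it is kv[i] * kh[j],
                    # so only 2n values per pixel are needed instead of n * n.
                    acc += kv[y][x][i] * kh[y][x][j] * frame[yy][xx]
            out[y][x] = acc
    return out


def interpolate(frame1, frame2, k1v, k1h, k2v, k2h):
    """Assumed synthesis step: add the two locally filtered input frames."""
    a = separable_local_conv(frame1, k1v, k1h)
    b = separable_local_conv(frame2, k2v, k2h)
    return [[pa + pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]
```

With kernels of length n, each pixel needs 2n values per input frame rather than n * n, which is why kernels for every pixel can be estimated in one pass. The nested loops are written for clarity and would be far too slow for real frames.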
Context-aware Synthesis for Video Frame Interpolation
Video frame interpolation algorithms typically estimate optical flow or its
variations and then use it to guide the synthesis of an intermediate frame
between two consecutive original frames. To handle challenges like occlusion,
bidirectional flow between the two input frames is often estimated and used to
warp and blend the input frames. However, how to effectively blend the two
warped frames remains a challenging problem. This paper presents a
context-aware synthesis approach that warps not only the input frames but also
their pixel-wise contextual information and uses them to interpolate a
high-quality intermediate frame. Specifically, we first use a pre-trained
neural network to extract per-pixel contextual information for input frames. We
then employ a state-of-the-art optical flow algorithm to estimate bidirectional
flow between them and pre-warp both input frames and their context maps.
Finally, unlike common approaches that blend the pre-warped frames, our method
feeds them and their context maps to a video frame synthesis neural network to
produce the interpolated frame in a context-aware fashion. Our neural network
is fully convolutional and is trained end to end. Our experiments show that our
method can handle challenging scenarios such as occlusion and large motion and
outperforms representative state-of-the-art approaches.
Comment: CVPR 2018, http://graphics.cs.pdx.edu/project/ctxsy
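The pre-warping step in this abstract, where one flow field moves both a frame and its per-pixel context map, can be sketched as follows. The abstract does not say which warping scheme is used, so this sketch substitutes simple backward warping with nearest-neighbor sampling and border clamping. The function name `backward_warp` and the data layout are hypothetical. The context extractor, the optical flow estimator, and the synthesis network are all omitted.

```python
def backward_warp(channels, flow):
    """Warp several aligned maps with one shared flow field.

    channels: list of H x W maps (image channels and/or context features).
    flow[y][x] = (dx, dy): offset from output pixel (x, y) to the source
    location it samples. Nearest-neighbor sampling keeps the sketch short;
    this is an assumption, not the scheme used in the paper.
    """
    H, W = len(flow), len(flow[0])
    warped = []
    for ch in channels:
        out = [[0.0] * W for _ in range(H)]
        for y in range(H):
            for x in range(W):
                dx, dy = flow[y][x]
                sx = min(max(int(round(x + dx)), 0), W - 1)  # clamp to image
                sy = min(max(int(round(y + dy)), 0), H - 1)
                out[y][x] = ch[sy][sx]
        warped.append(out)
    return warped
```

Because the frame and its context map pass through the same flow, they stay pixel-aligned after warping. That alignment is what lets a synthesis network consume both together instead of blending only the warped colors.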