A Unified Pyramid Recurrent Network for Video Frame Interpolation
Flow-guided synthesis provides a common framework for frame interpolation,
where optical flow is typically estimated by a pyramid network, and then
leveraged to guide a synthesis network to generate intermediate frames between
input frames. In this paper, we present UPR-Net, a novel Unified Pyramid
Recurrent Network for frame interpolation. Cast in a flexible pyramid
framework, UPR-Net exploits lightweight recurrent modules for both
bi-directional flow estimation and intermediate frame synthesis. At each
pyramid level, it leverages estimated bi-directional flow to generate
forward-warped representations for frame synthesis; across pyramid levels, it
enables iterative refinement of both the optical flow and the intermediate frame. In
particular, we show that our iterative synthesis can significantly improve the
robustness of frame interpolation on large motion cases. Despite being
extremely lightweight (1.7M parameters), UPR-Net achieves excellent performance
on a wide range of benchmarks. Code will be available soon.
Comment: arXiv admin note: text overlap with arXiv:2206.08572 by other authors.
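The coarse-to-fine recurrence described in this abstract can be sketched compactly. The following is a minimal illustration (not the authors' code) of a unified pyramid recurrence: a single flow module and a single synthesis module are shared across all pyramid levels, and both the bi-directional flow and the intermediate frame are iteratively refined from coarse to fine. The module names, channel widths, and the simple averaging initialization are assumptions made for this sketch, and the forward-warping (splatting) step of UPR-Net is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FlowEstimator(nn.Module):
    """Predicts a residual update to the bi-directional flow at one pyramid level."""
    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + 3 + 4, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 4, 3, padding=1))        # 4 channels = two 2-channel flows

    def forward(self, im0, im1, flow):
        return flow + self.net(torch.cat([im0, im1, flow], dim=1))

class FrameSynthesizer(nn.Module):
    """Refines the intermediate frame from the inputs and the current estimate."""
    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 * 3, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 3, 3, padding=1))

    def forward(self, im0, im1, mid):
        return mid + self.net(torch.cat([im0, im1, mid], dim=1))

def interpolate(im0, im1, levels=3):
    flow_net, synth_net = FlowEstimator(), FrameSynthesizer()  # shared by all levels
    b, _, h, w = im0.shape
    flow, mid = torch.zeros(b, 4, h >> (levels - 1), w >> (levels - 1)), None
    for lvl in reversed(range(levels)):                        # coarse -> fine
        scale = 1.0 / 2 ** lvl
        i0 = F.interpolate(im0, scale_factor=scale, mode="bilinear", align_corners=False)
        i1 = F.interpolate(im1, scale_factor=scale, mode="bilinear", align_corners=False)
        if mid is None:
            mid = 0.5 * (i0 + i1)                              # crude coarsest-level guess
        else:                                                  # upsample previous estimates
            flow = 2.0 * F.interpolate(flow, size=i0.shape[-2:], mode="bilinear", align_corners=False)
            mid = F.interpolate(mid, size=i0.shape[-2:], mode="bilinear", align_corners=False)
        flow = flow_net(i0, i1, flow)                          # iterative flow refinement
        mid = synth_net(i0, i1, mid)                           # iterative frame refinement
    return mid

if __name__ == "__main__":
    im0, im1 = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
    print(interpolate(im0, im1).shape)                         # torch.Size([1, 3, 64, 64])
```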
Performance of Wavelet-based Multiresolution Motion Estimation for Inbetweening in Old Animated Films
This paper investigates the performance of wavelet-based multiresolution motion estimation (MRME) for inbetweening in old animated films using three different MRME schemes. The three schemes are: a coarse-to-fine method with a wavelet-based MRME, one of Zhang's MRMEs, and an MRME in the spatial domain. To compare the performance of these MRME schemes, two video sequences were used in a simulation. The experimental results show that the coarse-to-fine method performed better than Zhang's MRME and the MRME in the spatial domain. The evaluation results for a block size of 9x9 indicate that the coarse-to-fine method achieved an average peak signal-to-noise ratio (PSNR) of 23.48 dB for the first sequence and 29.84 dB for the second sequence.
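The coarse-to-fine scheme evaluated above is essentially hierarchical block matching: motion is estimated on a heavily downsampled version of the frames, then propagated and refined at each finer level. Below is a minimal spatial-domain sketch of that idea together with the PSNR metric used for evaluation; the wavelet-subband variant that performed best in the paper would run the same loop on subband coefficients. The block size, search range, and synthetic test data are illustrative choices, not the paper's settings (the paper reports results for 9x9 blocks).

```python
import numpy as np

def psnr(ref, est, peak=255.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((np.asarray(ref, float) - np.asarray(est, float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def block_match(prev, curr, block=8, search=4, init=None):
    """Full-search block matching; `init` carries motion propagated from a coarser level."""
    h, w = curr.shape
    mv = np.zeros((h // block, w // block, 2), dtype=int)
    for by in range(h // block):
        for bx in range(w // block):
            y0, x0 = by * block, bx * block
            tgt = curr[y0:y0 + block, x0:x0 + block]
            py, px = (init[by, bx] if init is not None else (0, 0))
            best, best_mv = np.inf, (0, 0)
            for dy in range(py - search, py + search + 1):
                for dx in range(px - search, px + search + 1):
                    ys, xs = y0 + dy, x0 + dx
                    if ys < 0 or xs < 0 or ys + block > h or xs + block > w:
                        continue
                    sad = np.abs(tgt - prev[ys:ys + block, xs:xs + block]).sum()
                    if sad < best:
                        best, best_mv = sad, (dy, dx)
            mv[by, bx] = best_mv
    return mv

def coarse_to_fine(prev, curr, levels=3, block=8, search=4):
    """Estimate motion on a dyadic pyramid, refining level by level (coarse to fine)."""
    pyramid = [(prev, curr)]
    for _ in range(levels - 1):
        prev, curr = prev[::2, ::2], curr[::2, ::2]             # simple decimation
        pyramid.append((prev, curr))
    mv = None
    for p, c in reversed(pyramid):
        if mv is not None:                                      # propagate and rescale motion
            mv = 2 * np.repeat(np.repeat(mv, 2, axis=0), 2, axis=1)
        mv = block_match(p, c, block, search, mv)
    return mv

if __name__ == "__main__":
    yy, xx = np.mgrid[0:64, 0:64]
    prev = 127.0 + 100.0 * np.sin(xx / 5.0) * np.cos(yy / 7.0)  # smooth synthetic frame
    curr = np.roll(prev, (3, -2), axis=(0, 1))                  # known global motion
    mv = coarse_to_fine(prev, curr)
    print("motion of a central block:", mv[4, 4])               # should recover (-3, 2)
    print("PSNR of frame repetition:", round(psnr(curr, prev), 2), "dB")
```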
The curvelet transform for image denoising
We describe approximate digital implementations of two new mathematical transforms, namely, the ridgelet transform and the curvelet transform. Our implementations offer exact reconstruction, stability against perturbations, ease of implementation, and low computational complexity. A central tool is Fourier-domain computation of an approximate digital Radon transform. We introduce a very simple interpolation in the Fourier space which takes Cartesian samples and yields samples on a rectopolar grid, which is a pseudo-polar sampling set based on a concentric squares geometry. Despite the crudeness of our interpolation, the visual performance is surprisingly good. Our ridgelet transform applies to the Radon transform a special overcomplete wavelet pyramid whose wavelets have compact support in the frequency domain. Our curvelet transform uses our ridgelet transform as a component step, and implements curvelet subbands using a filter bank of à trous wavelet filters. Our philosophy throughout is that transforms should be overcomplete, rather than critically sampled. We apply these digital transforms to the denoising of some standard images embedded in white noise. In the tests reported here, simple thresholding of the curvelet coefficients is very competitive with "state of the art" techniques based on wavelets, including thresholding of decimated or undecimated wavelet transforms and also including tree-based Bayesian posterior mean methods. Moreover, the curvelet reconstructions exhibit higher perceptual quality than wavelet-based reconstructions, offering visually sharper images and, in particular, higher quality recovery of edges and of faint linear and curvilinear features. Existing theory for curvelet and ridgelet transforms suggests that these new approaches can outperform wavelet methods in certain image reconstruction problems. The empirical results reported here are in encouraging agreement with this theory.
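The denoising recipe in this abstract is: transform, threshold the coefficients, invert. A faithful digital curvelet transform (the Radon/ridgelet pyramid with à trous filter banks) is far too long to show here, so the sketch below substitutes an ordinary decimated wavelet transform from PyWavelets purely to illustrate the coefficient-thresholding step; the wavelet family, decomposition depth, and k*sigma threshold are illustrative assumptions, not the paper's settings.

```python
import numpy as np
import pywt

def denoise_by_thresholding(noisy, sigma, k=3.0, wavelet="db4", level=3):
    """Transform, hard-threshold the detail coefficients at k*sigma, invert."""
    coeffs = pywt.wavedec2(noisy, wavelet, level=level)
    out = [coeffs[0]]                                  # keep the coarse approximation
    for detail in coeffs[1:]:                          # threshold each detail subband
        out.append(tuple(pywt.threshold(c, k * sigma, mode="hard") for c in detail))
    rec = pywt.waverec2(out, wavelet)
    return rec[:noisy.shape[0], :noisy.shape[1]]       # crop any reconstruction padding

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.zeros((128, 128))
    clean[32:96, 32:96] = 1.0                          # toy piecewise-constant image
    sigma = 0.1
    noisy = clean + sigma * rng.normal(size=clean.shape)
    denoised = denoise_by_thresholding(noisy, sigma)
    mse = lambda a, b: float(np.mean((a - b) ** 2))
    print(f"MSE noisy: {mse(clean, noisy):.4f}  denoised: {mse(clean, denoised):.4f}")
```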
H-VFI: Hierarchical Frame Interpolation for Videos with Large Motions
Capitalizing on the rapid development of neural networks, recent video frame
interpolation (VFI) methods have achieved notable improvements. However, they
still fall short for real-world videos containing large motions. Complex
deformation and/or occlusion caused by large motions make it an extremely
difficult problem in video frame interpolation. In this paper, we propose a
simple yet effective solution, H-VFI, to deal with large motions in video frame
interpolation. H-VFI contributes a hierarchical video interpolation transformer
(HVIT) to learn a deformable kernel in a coarse-to-fine strategy over multiple
scales. The learnt deformable kernel is then used to convolve the input frames
and predict the interpolated frame. Starting from the smallest scale, H-VFI
successively updates the deformable kernel with a residual, based on previously
predicted kernels, intermediate interpolated results, and hierarchical features
from the transformer. A bias and masks used to refine the final output are then
predicted by a transformer block from the interpolated results. The advantage of such a
progressive approximation is that the large motion frame interpolation problem
can be decomposed into several relatively simpler sub-tasks, which enables
very accurate final predictions. Another noteworthy contribution of our paper
is a large-scale, high-quality dataset, YouTube200K, which
contains videos depicting a great variety of scenarios captured at high
resolution and high frame rate. Extensive experiments on multiple frame
interpolation benchmarks validate that H-VFI outperforms existing
state-of-the-art methods, especially for videos with large motions.
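The progressive approximation described above can be illustrated with a toy coarse-to-fine loop: a per-pixel interpolation kernel is predicted at the smallest scale and updated by residuals at each finer scale before being applied to the input frames. The sketch below uses an ordinary adaptive (non-deformable) kernel and a tiny CNN in place of the hierarchical transformer, and it omits the bias/mask refinement; all names and sizes are illustrative assumptions, not the H-VFI architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

K = 5  # size of the per-pixel interpolation kernel

class KernelPredictor(nn.Module):
    """Predicts a residual update to the per-pixel kernels of both input frames."""
    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6 + 2 * K * K, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 2 * K * K, 3, padding=1))

    def forward(self, im0, im1, kernels):
        return kernels + self.net(torch.cat([im0, im1, kernels], dim=1))

def apply_kernels(frame, kernels):
    """Convolve every pixel of `frame` with its own K x K kernel."""
    b, c, h, w = frame.shape
    patches = F.unfold(frame, K, padding=K // 2).view(b, c, K * K, h, w)
    weights = F.softmax(kernels, dim=1).unsqueeze(1)           # (b, 1, K*K, h, w)
    return (patches * weights).sum(dim=2)

def interpolate(im0, im1, scales=(0.25, 0.5, 1.0)):
    predictor = KernelPredictor()                              # shared across scales
    kernels = None
    for s in scales:                                           # coarse -> fine
        i0 = F.interpolate(im0, scale_factor=s, mode="bilinear", align_corners=False)
        i1 = F.interpolate(im1, scale_factor=s, mode="bilinear", align_corners=False)
        if kernels is None:                                    # start at the smallest scale
            kernels = torch.zeros(i0.shape[0], 2 * K * K, *i0.shape[-2:])
        else:                                                  # upsample previous estimate
            kernels = F.interpolate(kernels, size=i0.shape[-2:], mode="bilinear", align_corners=False)
        kernels = predictor(i0, i1, kernels)                   # residual update at this scale
    k0, k1 = kernels.chunk(2, dim=1)
    return 0.5 * (apply_kernels(im0, k0) + apply_kernels(im1, k1))

if __name__ == "__main__":
    im0, im1 = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
    print(interpolate(im0, im1).shape)                         # torch.Size([1, 3, 64, 64])
```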
RAI-Net: Range-Adaptive LiDAR Point Cloud Frame Interpolation Network
LiDAR point cloud frame interpolation, which synthesizes the intermediate
frame between the captured frames, has emerged as an important issue for many
applications. In particular, to reduce the amount of point cloud data to be
transmitted, the intermediate frame can be predicted from the reference frames
to upsample the data to a higher frame rate. However, due to the high-dimensional and
sparse characteristics of point clouds, it is more difficult to predict the
intermediate frame for LiDAR point clouds than videos. In this paper, we
propose a novel LiDAR point cloud frame interpolation method, which exploits
range images (RIs) as an intermediate representation with CNNs to conduct the
frame interpolation process. Since the inherent characteristics of RIs
differ from those of color images, we introduce spatially adaptive convolutions
to extract range features adaptively, while a highly efficient flow estimation
method is presented to generate optical flows. The proposed model then warps
the input frames and range features based on the optical flows to synthesize
the interpolated frame. Extensive experiments on the KITTI dataset have clearly
demonstrated that our method consistently achieves superior frame interpolation
results with better perceptual quality than state-of-the-art video
frame interpolation methods. The proposed method could be integrated into any
LiDAR point cloud compression system for inter prediction.
Comment: Accepted by the IEEE International Symposium on Broadband Multimedia Systems and Broadcasting 202
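A key ingredient above is the range image (RI) representation, which turns a sparse 3-D LiDAR frame into a dense 2-D image that CNNs and optical flow can operate on. The sketch below shows a standard spherical projection from a point cloud to a range image; the field-of-view limits and image size are typical 64-beam values chosen for illustration and are not taken from the paper.

```python
import numpy as np

def pointcloud_to_range_image(points, h=64, w=1024,
                              fov_up_deg=3.0, fov_down_deg=-25.0):
    """points: (N, 3) array of x, y, z coordinates in the sensor frame."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)                              # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-8), -1.0, 1.0))

    fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)
    u = 0.5 * (1.0 - yaw / np.pi) * w                   # column index from azimuth
    v = (fov_up - pitch) / (fov_up - fov_down) * h      # row index from elevation

    u = np.clip(np.floor(u), 0, w - 1).astype(int)
    v = np.clip(np.floor(v), 0, h - 1).astype(int)

    ri = np.zeros((h, w), dtype=np.float32)
    order = np.argsort(-r)                              # write far points first,
    ri[v[order], u[order]] = r[order]                   # so nearer points win collisions
    return ri

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.uniform(-1, 1, (20000, 3)) * np.array([50.0, 50.0, 3.0])
    print(pointcloud_to_range_image(pts).shape)         # (64, 1024)
```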