Online Video Deblurring via Dynamic Temporal Blending Network
State-of-the-art video deblurring methods are capable of removing non-uniform
blur caused by unwanted camera shake and/or object motion in dynamic scenes.
However, most existing methods are based on batch processing and thus need
access to all recorded frames, rendering them computationally demanding and
time-consuming, which limits their practical use. In contrast, we propose
an online (sequential) video deblurring method based on a spatio-temporal
recurrent network that allows for real-time performance. In particular, we
introduce a novel architecture which extends the receptive field while keeping
the overall size of the network small to enable fast execution. In doing so,
our network is able to remove even large blur caused by strong camera shake
and/or fast moving objects. Furthermore, we propose a novel network layer that
enforces temporal consistency between consecutive frames by dynamic temporal
blending which compares and adaptively (at test time) shares features obtained
at different time steps. We show the superiority of the proposed method in an
extensive experimental evaluation.
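As a concrete illustration of the dynamic temporal blending idea, below is a
minimal PyTorch sketch of a layer that predicts per-pixel blending weights at
test time and mixes current features with those from the previous step. The
module name, channel width, and the small weight-predictor are illustrative
assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class DynamicTemporalBlend(nn.Module):
    """Blend current features with the previous step's features using
    per-pixel weights predicted at test time (hypothetical sketch)."""
    def __init__(self, channels: int):
        super().__init__()
        # Predict a blending weight in [0, 1] from concatenated features.
        self.weight_net = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, feat_t, feat_prev):
        w = self.weight_net(torch.cat([feat_t, feat_prev], dim=1))
        # A convex combination shares information across time steps and
        # enforces temporal consistency between consecutive frames.
        return w * feat_t + (1.0 - w) * feat_prev

# Usage: propagate blended features sequentially over a video.
blend = DynamicTemporalBlend(channels=32)
prev = torch.zeros(1, 32, 64, 64)
for feat in torch.randn(5, 1, 32, 64, 64):  # five dummy frame features
    prev = blend(feat, prev)
```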
Structured Sparsity Learning for Efficient Video Super-Resolution
The high computational costs of video super-resolution (VSR) models hinder
their deployment on resource-limited devices (e.g., smartphones and drones).
Existing VSR models contain many redundant filters, which slow down
inference. To prune these unimportant filters, we develop a
structured pruning scheme called Structured Sparsity Learning (SSL) according
to the properties of VSR. In SSL, we design pruning schemes for several key
components in VSR models, including residual blocks, recurrent networks, and
upsampling networks. Specifically, we develop a Residual Sparsity Connection
(RSC) scheme for residual blocks of recurrent networks to liberate pruning
restrictions and preserve the restoration information. For upsampling networks,
we design a pixel-shuffle pruning scheme to guarantee the accuracy of feature
channel-space conversion. In addition, we observe that pruning error is
amplified as hidden states propagate through the recurrent network. To
alleviate this issue, we design Temporal Finetuning (TF). Extensive experiments
show that SSL can significantly outperform recent methods quantitatively and
qualitatively. We will release our code and models.
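As a minimal sketch of the structured-sparsity idea, the PyTorch snippet
below adds a group-lasso penalty over whole output filters, so filters driven
to zero can later be pruned as units. This is a generic regularizer, not
SSL's exact RSC or pixel-shuffle schemes; the function name and penalty
weight are assumptions.

```python
import torch
import torch.nn as nn

def filter_group_lasso(model: nn.Module, weight: float = 1e-4) -> torch.Tensor:
    """Group-lasso penalty over the output filters of every conv layer.
    Each filter's L2 norm forms one group, so the penalty pushes entire
    filters (not individual weights) toward zero."""
    penalty = torch.zeros(())
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            # weight shape: (out_channels, in_channels, kH, kW);
            # one group per output filter (dim 0).
            penalty = penalty + m.weight.flatten(1).norm(dim=1).sum()
    return weight * penalty

# During training: loss = task_loss + filter_group_lasso(model)
```

Note that in front of an nn.PixelShuffle(r), output channels map to r*r
spatial positions, so any pruning there must keep or remove channels in
groups of r*r to preserve the channel-to-space conversion that the
pixel-shuffle pruning scheme is designed around.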
Diagnosing and Preventing Instabilities in Recurrent Video Processing
Recurrent models are a popular choice for video enhancement tasks such as
video denoising or super-resolution. In this work, we focus on their
stability as dynamical systems and show that they tend to fail
catastrophically at inference time on long video sequences. To address this
issue, we (1) introduce a diagnostic tool which produces input sequences
optimized to trigger instabilities and that can be interpreted as
visualizations of temporal receptive fields, and (2) propose two approaches
to enforce the stability of a model during training: constraining the
spectral norm or constraining the stable rank of its convolutional layers. We
then introduce Stable Rank Normalization for Convolutional layers (SRN-C), a
new algorithm that enforces these constraints. Our experimental results
suggest that SRN-C successfully enforces stability in recurrent video
processing models without a significant performance loss.
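For the spectral-norm variant of these constraints, PyTorch ships a
ready-made utility; a minimal sketch (assuming a plain Conv2d-based model)
follows. It normalizes each layer's reshaped weight matrix, a standard proxy
for the operator norm, and does not implement the stable-rank constraint of
SRN-C itself.

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

def add_spectral_norm(module: nn.Module) -> nn.Module:
    """Recursively wrap every Conv2d with spectral normalization, bounding
    each layer's gain so the recurrence cannot keep amplifying
    perturbations from one frame to the next."""
    for name, child in module.named_children():
        if isinstance(child, nn.Conv2d):
            setattr(module, name, spectral_norm(child))
        else:
            add_spectral_norm(child)
    return module
```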
An Efficient Recurrent Adversarial Framework for Unsupervised Real-Time Video Enhancement
Video enhancement is a more challenging problem than still-image enhancement, mainly
due to high computational cost, larger data volumes and the difficulty of
achieving consistency in the spatio-temporal domain. In practice, these
challenges are often coupled with the lack of example pairs, which inhibits the
application of supervised learning strategies. To address these challenges, we
propose an efficient adversarial video enhancement framework that learns
directly from unpaired video examples. In particular, our framework introduces
new recurrent cells that consist of interleaved local and global modules for
implicit integration of spatial and temporal information. The proposed design
allows our recurrent cells to efficiently propagate spatio-temporal information
across frames and reduces the need for high complexity networks. Our setting
enables learning from unpaired videos in a cyclic adversarial manner, where the
proposed recurrent units are employed in all architectures. Efficient training
is accomplished by introducing a single discriminator that learns the joint
distribution of the source and target domains simultaneously. The enhancement
results demonstrate clear superiority of the proposed video enhancer over the
state-of-the-art methods in terms of visual quality, quantitative metrics,
and inference speed. Notably, our video enhancer processes FullHD video
(1080x1920) at over 35 frames per second.
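A minimal PyTorch sketch of a recurrent cell interleaving a local
convolutional path with a cheap global path is shown below; the
squeeze-and-excite-style gating stands in for the paper's global module and
is an assumption, as are the module and parameter names.

```python
import torch
import torch.nn as nn

class LocalGlobalCell(nn.Module):
    """Recurrent cell with a local conv path and a global (spatially
    pooled) path; the hidden state carries temporal context across frames."""
    def __init__(self, channels: int):
        super().__init__()
        self.local = nn.Conv2d(2 * channels, channels, 3, padding=1)
        # Global module: channel gating computed from a global average.
        self.global_gate = nn.Sequential(nn.Linear(channels, channels),
                                         nn.Sigmoid())
        self.act = nn.ReLU(inplace=True)

    def forward(self, x, h):
        y = self.act(self.local(torch.cat([x, h], dim=1)))  # local mixing
        g = self.global_gate(y.mean(dim=(2, 3)))            # global context
        return y * g[:, :, None, None]                      # new hidden state
```

Because the global path reduces each frame to a single vector, it adds
spatial context at negligible cost compared with stacking more convolutions.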
Scale-Adaptive Feature Aggregation for Efficient Space-Time Video Super-Resolution
The Space-Time Video Super-Resolution (STVSR) task aims to enhance the visual
quality of videos by simultaneously performing video frame interpolation (VFI)
and video super-resolution (VSR). However, owing to the additional temporal
dimension and scale inconsistency, most existing STVSR methods are complex
and inflexible in dynamically modeling different motion
amplitudes. In this work, we find that choosing an appropriate processing scale
achieves remarkable benefits in flow-based feature propagation. We propose a
novel Scale-Adaptive Feature Aggregation (SAFA) network that adaptively selects
sub-networks with different processing scales for individual samples.
Experiments on four public STVSR benchmarks demonstrate that SAFA achieves
state-of-the-art performance. Our SAFA network outperforms recent
state-of-the-art methods such as TMNet and VideoINR by an average improvement
of over 0.5dB in PSNR, while requiring less than half the parameters and
only one-third of the computational cost.
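A minimal PyTorch sketch of per-sample scale selection follows: a
lightweight gate scores candidate scales, and the block processes each
sample at its chosen scale before upsampling back. The hard argmax is a
test-time simplification; training such a gate needs a differentiable
relaxation (e.g., Gumbel-softmax), and the actual SAFA design differs in
detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleAdaptiveBlock(nn.Module):
    """Select a processing scale per sample, run a shared sub-network at
    that scale, and upsample the residual back (hypothetical sketch)."""
    def __init__(self, channels: int, scales=(1.0, 0.5, 0.25)):
        super().__init__()
        self.scales = scales
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1))
        self.gate = nn.Linear(channels, len(scales))  # one score per scale

    def forward(self, x):
        idx = self.gate(x.mean(dim=(2, 3))).argmax(dim=1)  # hard choice
        out = torch.empty_like(x)
        for i, s in enumerate(self.scales):
            sel = idx == i
            if not sel.any():
                continue
            z = x[sel]
            if s != 1.0:  # process small motions at a coarser scale
                z = F.interpolate(z, scale_factor=s, mode='bilinear',
                                  align_corners=False)
            z = self.body(z)
            if s != 1.0:
                z = F.interpolate(z, size=x.shape[-2:], mode='bilinear',
                                  align_corners=False)
            out[sel] = x[sel] + z
        return out
```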