Burstormer: Burst Image Restoration and Enhancement Transformer
On a shutter press, modern handheld cameras capture multiple images in rapid
succession and merge them to generate a single image. However, individual
frames in a burst are misaligned due to inevitable motions and contain multiple
degradations. The challenge is to properly align the successive image shots and
merge their complementary information to achieve high-quality outputs. Towards
this direction, we propose Burstormer: a novel transformer-based architecture
for burst image restoration and enhancement. In comparison to existing works,
our approach exploits multi-scale local and non-local features to achieve
improved alignment and feature fusion. Our key idea is to enable inter-frame
communication in the burst neighborhoods for information aggregation and
progressive fusion while modeling the burst-wide context. However, the input
burst frames need to be properly aligned before fusing their information.
Therefore, we propose an enhanced deformable alignment module for aligning
burst features with regard to the reference frame. Unlike existing methods,
the proposed alignment module not only aligns burst features but also exchanges
feature information and maintains focused communication with the reference
frame through the proposed reference-based feature enrichment mechanism, which
facilitates handling complex motions. After multi-level alignment and
enrichment, we re-emphasize inter-frame communication within the burst using a
cyclic burst sampling module. Finally, the inter-frame information is
aggregated using the proposed burst feature fusion module followed by
progressive upsampling. Our Burstormer outperforms state-of-the-art methods on
burst super-resolution, burst denoising and burst low-light enhancement. Our
codes and pretrained models are available at
https://github.com/akshaydudhane16/Burstormer
Comment: Accepted at CVPR 2023
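The core pipeline described above -- align each burst frame to a reference frame, then fuse the aligned stack -- can be illustrated with a deliberately simplified sketch. This is not Burstormer's method: the paper predicts alignment with learned deformable offsets and fuses features with attention, whereas the toy function below assumes known integer translations and fuses by plain averaging. The function name `fuse_burst` and the known-shift assumption are illustrative only.

```python
import numpy as np

def fuse_burst(frames, shifts):
    """Toy burst fusion: align each frame to the reference (frame 0)
    by an integer translation, then average the aligned stack.

    frames: (B, H, W) burst of grayscale frames
    shifts: (B, 2) per-frame (dy, dx) misalignment w.r.t. the reference.
            Assumed known here -- Burstormer instead *predicts* alignment
            with a deformable alignment module.
    """
    B, H, W = frames.shape
    aligned = np.empty((B, H, W), dtype=np.float64)
    for b in range(B):
        # Undo the frame's misalignment by rolling it back toward the reference.
        aligned[b] = np.roll(frames[b],
                             shift=(-shifts[b][0], -shifts[b][1]),
                             axis=(0, 1))
    # Fusion step: merging aligned frames aggregates their complementary
    # information (here, a simple mean instead of learned feature fusion).
    return aligned.mean(axis=0)
```

With perfectly aligned frames the mean recovers the reference exactly; the hard parts the paper addresses are estimating sub-pixel, spatially varying motion and fusing information when frames carry different degradations.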
Deformable Kernel Networks for Joint Image Filtering
Joint image filters are used to transfer structural details from a guidance
image used as a prior to a target image, in tasks such as enhancing spatial
resolution and suppressing noise. Previous methods based on convolutional
neural networks (CNNs) combine nonlinear activations of spatially-invariant
kernels to estimate structural details and regress the filtering result. In
this paper, we instead learn explicitly sparse and spatially-variant kernels.
We propose a CNN architecture and its efficient implementation, called the
deformable kernel network (DKN), that outputs sets of neighbors and the
corresponding weights adaptively for each pixel. The filtering result is then
computed as a weighted average. We also propose a fast version of DKN that runs
about seventeen times faster for an image of size 640 x 480. We demonstrate the
effectiveness and flexibility of our models on the tasks of depth map
upsampling, saliency map upsampling, cross-modality image restoration, texture
removal, and semantic segmentation. In particular, we show that the weighted
averaging process with sparsely sampled 3 x 3 kernels outperforms the state of
the art by a significant margin in all cases.Comment: arXiv admin note: substantial text overlap with arXiv:1903.11286
(IJCV accepted
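The filtering operation the abstract describes -- per-pixel sets of sparse neighbors with corresponding weights, combined as a weighted average -- can be sketched as follows. This is a minimal illustration, not the paper's implementation: DKN learns the offsets and weights with a CNN and samples at sub-pixel locations, whereas the sketch below takes integer offsets and weights as given inputs. The function name `deformable_filter` is an assumption for illustration.

```python
import numpy as np

def deformable_filter(target, offsets, weights):
    """Weighted average over sparse, spatially-variant sampled neighbors.

    target:  (H, W) image to filter
    offsets: (H, W, K, 2) integer (dy, dx) neighbor offsets per pixel
             (in DKN these are predicted by the network, per pixel)
    weights: (H, W, K) per-pixel weights, assumed to sum to 1 over K
    """
    H, W = target.shape
    K = weights.shape[-1]
    out = np.zeros((H, W), dtype=np.float64)
    ys, xs = np.mgrid[0:H, 0:W]
    for k in range(K):
        # Gather the k-th sampled neighbor for every pixel, clamped at borders.
        ny = np.clip(ys + offsets[..., k, 0], 0, H - 1)
        nx = np.clip(xs + offsets[..., k, 1], 0, W - 1)
        # Accumulate its weighted contribution.
        out += weights[..., k] * target[ny, nx]
    return out
```

The key contrast with a fixed convolution is that both where each pixel samples (the offsets) and how much each sample counts (the weights) vary per pixel, which is what lets a small K, e.g. the 3 x 3 = 9 samples mentioned in the abstract, cover a large adaptive neighborhood.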