Decomposition Ascribed Synergistic Learning for Unified Image Restoration
Learning to restore multiple image degradations within a single model is
highly beneficial for real-world applications. Nevertheless, existing works
typically treat each degradation independently, leaving their relationship
under-exploited for synergistic learning. To this end, we revisit the diverse
degradations through the lens of singular value decomposition, observing that
the decomposed singular vectors and singular values naturally carry different
types of degradation information, which divides the various restoration tasks
into two groups, i.e., singular vector dominated and singular value dominated.
This analysis offers a more unified perspective on the diverse degradations
than previous task-level independent learning. The dedicated optimization of
degraded singular vectors and singular values inherently exploits the
potential relationship among diverse restoration tasks, which we term
Decomposition Ascribed Synergistic Learning (DASL). Specifically, DASL
comprises two effective operators, namely the Singular VEctor Operator (SVEO)
and the Singular VAlue Operator (SVAO), to facilitate the decomposed
optimization; both can be lightly integrated into existing convolutional image
restoration backbones. Moreover, a congruous decomposition loss is devised as
an auxiliary objective. Extensive experiments on a blend of five image
restoration tasks, including image deraining, image dehazing, image denoising,
image deblurring, and low-light image enhancement, demonstrate the
effectiveness of our method.
Comment: 13 pages
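To make the singular-value-decomposition view concrete, here is a minimal probe in the spirit of the analysis (hypothetical code, not the paper's implementation): rebuild a degraded image from its own singular vectors but the clean image's singular values. Whether the result looks restored or still degraded hints at which component carries that degradation. All names and shapes are illustrative assumptions.

```python
import torch

def svd_swap_values(degraded: torch.Tensor, clean: torch.Tensor) -> torch.Tensor:
    """Rebuild the degraded image using its own singular vectors
    but the CLEAN image's singular values (per channel).

    A hypothetical probe, not DASL itself: if the result looks
    largely restored, the degradation was carried mainly by the
    singular values; if it still looks degraded, the singular
    vectors dominate.
    """
    # degraded, clean: (C, H, W) float tensors of the same shape
    U, _, Vh = torch.linalg.svd(degraded, full_matrices=False)    # vectors of degraded
    _, S_clean, _ = torch.linalg.svd(clean, full_matrices=False)  # values of clean
    return U @ torch.diag_embed(S_clean) @ Vh

# toy usage with random tensors standing in for image channels
degraded = torch.rand(3, 64, 64)
clean = torch.rand(3, 64, 64)
probe = svd_swap_values(degraded, clean)
print(probe.shape)  # torch.Size([3, 64, 64])
```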
Unlocking Masked Autoencoders as Loss Function for Image and Video Restoration
Image and video restoration has achieved a remarkable leap with the advent of
deep learning. The success of the deep learning paradigm lies in three key
components: data, model, and loss. Currently, many efforts have been devoted
to the first two, while few studies focus on the loss function. Starting from
the question ``are the de facto optimization functions, e.g., L1, L2, and
perceptual losses, optimal?'', we explore the potential of the loss function
and raise our belief that ``a learned loss function empowers the learning
capability of neural networks for image and video restoration''.
Concretely, we stand on the shoulders of the masked autoencoder (MAE) and
formulate it as a learned loss function, owing to the fact that the
pre-trained MAE innately inherits the prior of image reasoning. We investigate
the efficacy of our belief from three perspectives: 1) from task-customized
MAE to native MAE, 2) from image task to video task, and 3) from Transformer
structure to convolutional neural network structure. Extensive experiments
across multiple image and video tasks, including image denoising, image
super-resolution, image enhancement, guided image super-resolution, video
denoising, and video enhancement, demonstrate the consistent performance
improvements introduced by the learned loss function. Moreover, the learned
loss function is preferable as it can be directly plugged into existing
networks during training without involving computations at the inference
stage. Code will be publicly available.
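As a sketch of the idea, the following hypothetical PyTorch snippet (not the paper's code) treats a frozen pre-trained MAE encoder as a loss network: restored output and ground truth are compared in its feature space, and gradients flow only to the restoration side.

```python
import torch
import torch.nn as nn

class MAELoss(nn.Module):
    """Minimal sketch of a 'learned loss': compare restored and
    ground-truth images in the feature space of a frozen,
    pre-trained MAE encoder. `mae_encoder` is a placeholder for
    any pre-trained MAE; this illustrates the idea, not the
    paper's implementation.
    """
    def __init__(self, mae_encoder: nn.Module):
        super().__init__()
        self.encoder = mae_encoder.eval()
        for p in self.encoder.parameters():
            p.requires_grad_(False)  # the loss network stays frozen

    def forward(self, restored: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        feat_r = self.encoder(restored)
        feat_t = self.encoder(target)
        return torch.mean(torch.abs(feat_r - feat_t))  # L1 in MAE feature space

# stand-in encoder for a runnable demo (replace with a real pre-trained MAE)
toy_encoder = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.GELU())
loss_fn = MAELoss(toy_encoder)
x = torch.rand(1, 3, 32, 32, requires_grad=True)
y = torch.rand(1, 3, 32, 32)
loss_fn(x, y).backward()  # gradients reach the restored image, not the frozen encoder
```

Consistent with the abstract, such a loss is used only during training; inference runs the restoration network alone, so no extra compute is added at test time.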
Iterative Prompt Learning for Unsupervised Backlit Image Enhancement
We propose a novel unsupervised backlit image enhancement method, abbreviated
as CLIP-LIT, by exploring the potential of Contrastive Language-Image
Pre-Training (CLIP) for pixel-level image enhancement. We show that the
open-world CLIP prior not only aids in distinguishing between backlit and
well-lit images, but also in perceiving heterogeneous regions with different
luminance, facilitating the optimization of the enhancement network. Unlike
high-level vision and image manipulation tasks, directly applying CLIP to
enhancement tasks is non-trivial, owing to the difficulty of finding accurate
prompts.
solve this issue, we devise a prompt learning framework that first learns an
initial prompt pair by constraining the text-image similarity between the
prompt (negative/positive sample) and the corresponding image (backlit
image/well-lit image) in the CLIP latent space. Then, we train the enhancement
network based on the text-image similarity between the enhanced result and the
initial prompt pair. To further improve the accuracy of the initial prompt
pair, we iteratively fine-tune the prompt learning framework to reduce the
distribution gaps between the backlit images, enhanced results, and well-lit
images via rank learning, boosting the enhancement performance. Our method
alternates between updating the prompt learning framework and enhancement
network until visually pleasing results are achieved. Extensive experiments
demonstrate that our method outperforms state-of-the-art methods in terms of
visual quality and generalization ability, without requiring any paired data.Comment: Accepted to ICCV 2023 as Oral. Project page:
https://zhexinliang.github.io/CLIP_LIT_page
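The prompt-pair idea can be sketched as follows. This hypothetical snippet simplifies CLIP-LIT by optimizing a positive/negative embedding pair directly in CLIP's joint latent space (the paper instead learns prompt tokens passed through CLIP's text encoder); `embed_dim` and the stand-in image features are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptPair(nn.Module):
    """Learnable positive/negative prompt embeddings scored against
    CLIP image features. A simplified illustration, not CLIP-LIT's
    code; embed_dim should match the CLIP embedding size
    (e.g., 512 for ViT-B/32).
    """
    def __init__(self, embed_dim: int = 512):
        super().__init__()
        # one embedding per prompt, optimized directly in the latent space
        self.pos = nn.Parameter(torch.randn(embed_dim))  # "well-lit" prompt
        self.neg = nn.Parameter(torch.randn(embed_dim))  # "backlit" prompt

    def forward(self, image_feat: torch.Tensor) -> torch.Tensor:
        # cosine similarities -> probability the image matches the positive prompt
        image_feat = F.normalize(image_feat, dim=-1)
        sims = torch.stack([
            image_feat @ F.normalize(self.pos, dim=-1),
            image_feat @ F.normalize(self.neg, dim=-1),
        ], dim=-1)
        return sims.softmax(dim=-1)[..., 0]  # P(well-lit)

prompts = PromptPair()
feat = torch.randn(4, 512)                   # stand-in for CLIP image embeddings
p_well_lit = prompts(feat)
loss = -torch.log(p_well_lit + 1e-6).mean()  # pull enhanced images toward the positive prompt
```

In the alternating scheme the abstract describes, a loss of this form would update the enhancement network while a complementary objective refines the prompt pair itself.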
Adaptive Window Pruning for Efficient Local Motion Deblurring
Local motion blur commonly occurs in real-world photography due to the mixing
of moving objects and stationary backgrounds during exposure. Existing
image deblurring methods predominantly focus on global deblurring,
inadvertently degrading the sharpness of backgrounds in locally blurred images
and wasting computation on already-sharp pixels, especially for
high-resolution images. This paper aims to adaptively and efficiently restore
high-resolution locally blurred images. We propose a local motion deblurring
vision Transformer (LMD-ViT) built on adaptive window pruning Transformer
blocks (AdaWPT). To focus deblurring on local regions and reduce computation,
AdaWPT prunes unnecessary windows, allowing only the active windows to take
part in the deblurring process. The pruning relies on blurriness confidence
predicted by a confidence predictor that is trained end-to-end using a
reconstruction loss with Gumbel-Softmax re-parameterization and a pruning loss
guided by annotated blur masks. Our method removes local motion blur
effectively without distorting sharp regions, as demonstrated by its
perceptual and quantitative improvements over state-of-the-art methods. In
addition, our approach reduces FLOPs by 66% and achieves more than a twofold
increase in inference speed compared to Transformer-based deblurring methods.
We will make our code and annotated blur masks publicly available.
Comment: 17 pages
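The window-pruning mechanism can be sketched with Gumbel-Softmax as follows. This is a hypothetical illustration under assumed shapes, not LMD-ViT's code: `confidence_head` stands in for the paper's confidence predictor, and the hard sample keeps the forward decision discrete while gradients flow through the soft relaxation.

```python
import torch
import torch.nn.functional as F

def prune_windows(window_feats: torch.Tensor,
                  confidence_head: torch.nn.Module,
                  tau: float = 1.0) -> tuple[torch.Tensor, torch.Tensor]:
    """Confidence-based window pruning via Gumbel-Softmax (sketch).

    window_feats: (num_windows, tokens, dim)
    confidence_head: hypothetical module mapping a pooled window
        feature to 2 logits (keep vs. prune)
    returns: masked features and the (num_windows,) keep mask
    """
    logits = confidence_head(window_feats.mean(dim=1))         # (num_windows, 2)
    # hard=True yields a discrete keep/prune decision in the forward
    # pass; the backward pass uses the differentiable soft sample
    mask = F.gumbel_softmax(logits, tau=tau, hard=True)[:, 0]  # (num_windows,)
    return window_feats * mask[:, None, None], mask

# toy usage: 16 windows of 64 tokens with 32 channels
head = torch.nn.Linear(32, 2)
feats = torch.randn(16, 64, 32)
pruned, keep = prune_windows(feats, head)
print(int(keep.sum()), "windows kept")
```

During training such a mask merely zeroes pruned windows; the reported FLOP and speed gains would come from actually skipping the pruned windows' attention computation at inference.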