Enlighten-anything: When Segment Anything Model Meets Low-light Image Enhancement
Image restoration is a low-level vision task, and most CNN-based methods are
designed as black boxes, lacking transparency and interpretability. Many
unsupervised approaches ignore the degradation of visible information in
low-light scenes, which seriously affects the aggregation of complementary
information and prevents the fusion algorithm from producing satisfactory
results under extreme conditions. In this paper, we propose
Enlighten-anything, which enhances low-light images and fuses them with the
semantic intent of SAM segmentation to obtain fused images with good visual
perception. This greatly improves the generalization ability of unsupervised
learning, and experiments on the LOL dataset show that our method improves
PSNR by 3 dB and SSIM by 8% over the baseline. The zero-shot capability of
SAM introduces a powerful aid for unsupervised low-light enhancement. The
source code of Enlighten-anything can be obtained from
https://github.com/zhangbaijin/enlighten-anything
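As a rough illustration of the idea, the sketch below fuses SAM's zero-shot masks with the output of an enhancement network. Only the segment-anything calls (sam_model_registry, SamAutomaticMaskGenerator) follow the public library; the checkpoint path, enhance_net, and the residual fusion rule are illustrative assumptions, not the paper's actual architecture.

```python
# A minimal sketch, assuming a generic enhancement network: SAM's zero-shot
# masks are collapsed into a coarse semantic map and blended residually with
# the enhanced image. Only the segment-anything calls follow the public
# library; `enhance_net`, the checkpoint path, and the fusion rule are
# illustrative, not the paper's architecture.
import numpy as np
import torch
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")  # hypothetical path
mask_generator = SamAutomaticMaskGenerator(sam)

def semantic_fuse(low_light_rgb: np.ndarray, enhance_net) -> torch.Tensor:
    """low_light_rgb: HxWx3 uint8 image; enhance_net: any enhancement model."""
    # 1) Unsupervised enhancement pass (hypothetical network).
    x = torch.from_numpy(low_light_rgb).permute(2, 0, 1).float() / 255.0
    enhanced = enhance_net(x.unsqueeze(0))                 # (1, 3, H, W)

    # 2) Zero-shot SAM segmentation on the brighter, enhanced image.
    with torch.no_grad():
        img_u8 = (enhanced[0].permute(1, 2, 0).clamp(0, 1) * 255).byte().numpy()
    masks = mask_generator.generate(img_u8)

    # 3) Collapse the masks into a single index map, normalize it to [0, 1],
    #    and blend it residually so segment structure is re-emphasized.
    sem = torch.zeros(enhanced.shape[-2:])
    for i, m in enumerate(masks, start=1):
        sem[torch.from_numpy(m["segmentation"])] = float(i)
    prior = sem[None, None] / max(len(masks), 1)
    return enhanced * (1.0 + 0.1 * prior)
```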
Iterative Prompt Learning for Unsupervised Backlit Image Enhancement
We propose a novel unsupervised backlit image enhancement method, abbreviated
as CLIP-LIT, by exploring the potential of Contrastive Language-Image
Pre-Training (CLIP) for pixel-level image enhancement. We show that the
open-world CLIP prior not only aids in distinguishing between backlit and
well-lit images, but also in perceiving heterogeneous regions with different
luminance, facilitating the optimization of the enhancement network. Unlike
high-level vision and image manipulation tasks, directly applying CLIP to enhancement
tasks is non-trivial, owing to the difficulty in finding accurate prompts. To
solve this issue, we devise a prompt learning framework that first learns an
initial prompt pair by constraining the text-image similarity between the
prompt (negative/positive sample) and the corresponding image (backlit
image/well-lit image) in the CLIP latent space. Then, we train the enhancement
network based on the text-image similarity between the enhanced result and the
initial prompt pair. To further improve the accuracy of the initial prompt
pair, we iteratively fine-tune the prompt learning framework to reduce the
distribution gaps between the backlit images, enhanced results, and well-lit
images via rank learning, boosting the enhancement performance. Our method
alternates between updating the prompt learning framework and enhancement
network until visually pleasing results are achieved. Extensive experiments
demonstrate that our method outperforms state-of-the-art methods in terms of
visual quality and generalization ability, without requiring any paired data.
Comment: Accepted to ICCV 2023 as Oral. Project page:
https://zhexinliang.github.io/CLIP_LIT_page
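A minimal sketch of the prompt-pair objective follows, simplified by optimizing two text-feature vectors directly in CLIP's joint embedding space rather than learnable prompt tokens; the initialization strings, loss, and comments about the training loop are assumptions, with only the openai/CLIP calls taken from the public library.

```python
# Sketch of the prompt-pair idea, simplified: instead of learnable prompt
# tokens we optimize two text-feature vectors in CLIP's embedding space.
# Everything except the `clip` (openai/CLIP) calls is an illustrative
# assumption, not the paper's exact formulation.
import torch
import torch.nn.functional as F
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

with torch.no_grad():
    init = model.encode_text(clip.tokenize(
        ["a backlit photo", "a well-lit photo"]).to(device)).float()
prompts = torch.nn.Parameter(init.clone())       # learnable (2, 512) pair

def prompt_loss(images: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """images: preprocessed batch; labels: 0 = backlit, 1 = well-lit."""
    with torch.no_grad():
        img_feat = model.encode_image(images).float()
    img_feat = F.normalize(img_feat, dim=-1)
    txt_feat = F.normalize(prompts, dim=-1)
    logits = 100.0 * img_feat @ txt_feat.t()     # CLIP-style scaled cosine
    # Pull each image toward its matching prompt, away from the other one.
    return F.cross_entropy(logits, labels)

# The enhancement network would then be trained to maximize the enhanced
# result's similarity to the "well-lit" prompt, alternating with updates
# to `prompts`, mirroring the paper's iterative scheme.
```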
Self-Reference Deep Adaptive Curve Estimation for Low-Light Image Enhancement
In this paper, we propose a 2-stage low-light image enhancement method called
Self-Reference Deep Adaptive Curve Estimation (Self-DACE). In the first stage,
we present an intuitive, lightweight, fast, and unsupervised luminance
enhancement algorithm. The algorithm is based on a novel low-light enhancement
curve that can be used to locally boost image brightness. We also propose a new
loss function with a simplified physical model designed to preserve natural
images' color, structure, and fidelity. We use a vanilla CNN to map each pixel
through deep Adaptive Adjustment Curves (AAC) while preserving the local image
structure. In the second stage, we introduce a corresponding denoising scheme
to remove the latent noise in the darkness. We approximately model the noise
in the dark and deploy a Denoising-Net to estimate and remove the noise after
the first stage. Exhaustive qualitative and quantitative analysis shows that
our method outperforms existing state-of-the-art algorithms on multiple
real-world datasets.
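Since the abstract does not give the exact form of the Adaptive Adjustment Curves, the sketch below substitutes the well-known Zero-DCE-style quadratic curve LE(x) = x + a * x * (1 - x), with a small CNN predicting the per-pixel parameter map a; the architecture and iteration count are illustrative, not Self-DACE's actual design.

```python
# Illustrative stand-in for the curve-estimation stage: a vanilla CNN
# predicts per-pixel curve parameters a in [-1, 1], applied iteratively via
# the Zero-DCE-style quadratic curve x + a * x * (1 - x), which brightens
# dark regions while provably keeping values inside [0, 1].
import torch
import torch.nn as nn

class CurveNet(nn.Module):
    """Vanilla CNN predicting per-pixel curve parameters (illustrative)."""
    def __init__(self, iters: int = 8):
        super().__init__()
        self.iters = iters
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3 * iters, 3, padding=1), nn.Tanh(),  # a in [-1, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = self.net(x)                       # one parameter map per iteration
        for i in range(self.iters):
            ai = a[:, 3 * i:3 * (i + 1)]
            x = x + ai * x * (1 - x)          # quadratic curve, stays in [0, 1]
        return x
```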
Low-Light Image and Video Enhancement: A Comprehensive Survey and Beyond
This paper presents a comprehensive survey of low-light image and video
enhancement. We begin with the challenging mixed over-/under-exposed images,
on which existing methods underperform. To this end, we propose two variants
of the SICE dataset, named SICE_Grad and SICE_Mix. Next, we introduce
Night Wenzhou, a large-scale, high-resolution video dataset, to address the
lack of low-light video datasets, which has discouraged extending low-light
image enhancement (LLIE) to videos. Our Night Wenzhou dataset is challenging
since it consists of fast-moving aerial scenes and streetscapes with varying
illumination and degradations. We conduct extensive key-technique
analysis and experimental comparisons for representative LLIE approaches using
these newly proposed datasets and the current benchmark datasets. Finally, we
address unresolved issues and propose future research topics for the LLIE
community. Our datasets are available at
https://github.com/ShenZheng2000/LLIE_Survey
Comment: 13 pages, 8 tables, and 13 figures
Video Frame Interpolation via Adaptive Separable Convolution
Standard video frame interpolation methods first estimate optical flow
between input frames and then synthesize an intermediate frame guided by
motion. Recent approaches merge these two steps into a single convolution
process by convolving input frames with spatially adaptive kernels that account
for motion and re-sampling simultaneously. These methods require large kernels
to handle large motion, which limits the number of pixels whose kernels can be
estimated at once due to the large memory demand. To address this problem, this
paper formulates frame interpolation as local separable convolution over input
frames using pairs of 1D kernels. Compared to regular 2D kernels, the 1D
kernels require significantly fewer parameters to be estimated. Our method
develops a deep fully convolutional neural network that takes two input frames
and estimates pairs of 1D kernels for all pixels simultaneously. Since our
method is able to estimate kernels and synthesize the whole video frame at
once, it allows for the incorporation of perceptual loss to train the neural
network to produce visually pleasing frames. This deep neural network is
trained end-to-end using widely available video data without any human
annotation. Both qualitative and quantitative experiments show that our method
provides a practical solution to high-quality video frame interpolation.
Comment: ICCV 2017, http://graphics.cs.pdx.edu/project/sepconv
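The synthesis step can be sketched as follows: for each pixel, the outer product of a vertical and a horizontal 1D kernel forms an implicit 2D kernel that is applied to that pixel's neighborhood. The kernel-prediction network is omitted, and the shapes and helper name are assumptions consistent with the abstract rather than the authors' code.

```python
# Sketch of the separable local convolution used to synthesize a frame from
# per-pixel pairs of 1D kernels (K vertical + K horizontal coefficients per
# pixel instead of a full K*K kernel). The network predicting kv/kh is
# omitted; K is assumed odd.
import torch
import torch.nn.functional as F

def separable_local_conv(frame: torch.Tensor,
                         kv: torch.Tensor,
                         kh: torch.Tensor) -> torch.Tensor:
    """frame: (B, C, H, W); kv, kh: (B, K, H, W) per-pixel 1D kernels."""
    b, c, h, w = frame.shape
    k = kv.shape[1]
    pad = k // 2
    # Gather the K x K neighborhood of every pixel: (B, C, K, K, H, W).
    patches = F.unfold(F.pad(frame, [pad] * 4), kernel_size=k)
    patches = patches.view(b, c, k, k, h, w)
    # Outer product of the 1D kernels acts as an implicit K x K kernel:
    # vertical coefficients weight rows, horizontal coefficients weight columns.
    return torch.einsum("bcklhw,bkhw,blhw->bchw", patches, kv, kh)

# The interpolated frame is the sum of both separably filtered inputs:
#   I_mid = separable_local_conv(I1, kv1, kh1) + separable_local_conv(I2, kv2, kh2)
```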
- …