Low-Light Image Enhancement with Wavelet-based Diffusion Models
Diffusion models have achieved promising results in image restoration tasks,
yet they suffer from time-consuming inference, excessive computational
resource consumption, and unstable restoration. To address these issues, we
propose a robust and
efficient Diffusion-based Low-Light image enhancement approach, dubbed DiffLL.
Specifically, we present a wavelet-based conditional diffusion model (WCDM)
that leverages the generative power of diffusion models to produce results with
satisfactory perceptual fidelity. It also takes advantage of the strengths of
wavelet transformation to greatly accelerate inference and reduce
computational resource usage without sacrificing information. To avoid chaotic
content and unwanted output diversity, we perform both forward diffusion and reverse denoising
in the training phase of WCDM, enabling the model to achieve stable denoising
and reduce randomness during inference. Moreover, we design a
high-frequency restoration module (HFRM) that utilizes the vertical and
horizontal details of the image to complement the diagonal information for
better fine-grained restoration. Extensive experiments on publicly available
real-world benchmarks demonstrate that our method outperforms the existing
state-of-the-art methods both quantitatively and visually, and it achieves
remarkable improvements in efficiency compared to previous diffusion-based
methods. In addition, we empirically show that applying our method to
low-light face detection reveals its latent practical value.
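
As a concrete illustration of the wavelet trick behind this speed-up, the sketch below diffuses only the low-frequency subband of a 2D discrete wavelet transform, which shrinks the spatial size fourfold per level. This is a minimal reading of the abstract, not the authors' implementation; `denoiser` is a hypothetical stand-in for the trained conditional diffusion model.

```python
# Minimal sketch: run the expensive diffusion only on the low-frequency
# subband of a 2D DWT, then reassemble at full resolution.
import pywt

def enhance_via_wavelet_diffusion(low_light, denoiser, levels=2):
    # `denoiser` (hypothetical) must return an array of the same shape.
    coeffs = pywt.wavedec2(low_light, "haar", level=levels)
    ll, details = coeffs[0], coeffs[1:]   # approximation + (cH, cV, cD) tuples
    ll = denoiser(ll)                     # diffusion on a 4^levels-times smaller band
    return pywt.waverec2([ll] + details, "haar")
```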
Unlocking Low-Light-Rainy Image Restoration by Pairwise Degradation Feature Vector Guidance
Rain in the dark is a common natural phenomenon. Photos captured in such a
condition significantly impact the performance of various nighttime activities,
such as autonomous driving, surveillance systems, and night photography. While
existing methods designed for low-light enhancement or deraining show promising
performance, they have limitations in simultaneously addressing the task of
brightening low light and removing rain. Furthermore, using a cascade approach,
such as "deraining followed by low-light enhancement" or vice versa, may lead
to difficult-to-handle rain patterns or excessively blurred and overexposed
images. To overcome these limitations, we propose an end-to-end network that
can jointly handle low-light enhancement and deraining. Our
network mainly includes a Pairwise Degradation Feature Vector Extraction
Network (P-Net) and a Restoration Network (R-Net). P-Net can learn degradation
feature vectors on the dark and light areas separately, using contrastive
learning to guide the image restoration process. The R-Net is responsible for
restoring the image. We also introduce an effective Fast Fourier-ResNet
Detail Guidance Module (FFR-DG) that first guides image restoration using
detail images that contain no degradation information but focus on texture
details. Additionally, we contribute a dataset containing synthetic
and real-world low-light-rainy images. Extensive experiments demonstrate that
our network outperforms existing methods in both synthetic and complex
real-world scenarios.
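
The abstract does not spell out the contrastive objective, but a standard InfoNCE-style loss over degradation feature vectors, sketched below under assumed tensor shapes, captures the idea of pulling same-degradation vectors together and pushing others apart.

```python
# Sketch of contrastive guidance for degradation feature vectors
# (hypothetical shapes; P-Net's actual design is not in the abstract).
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, negatives, temperature=0.07):
    # anchor, positive: (B, D); negatives: (B, N, D)
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    n = F.normalize(negatives, dim=-1)
    pos = (a * p).sum(-1, keepdim=True) / temperature     # (B, 1) similarity
    neg = torch.einsum("bd,bnd->bn", a, n) / temperature  # (B, N) similarities
    logits = torch.cat([pos, neg], dim=1)
    labels = torch.zeros(a.size(0), dtype=torch.long, device=a.device)
    return F.cross_entropy(logits, labels)                # positive is class 0
```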
DEFormer: DCT-driven Enhancement Transformer for Low-light Image and Dark Vision
The goal of low-light image enhancement is to restore the color and details
of the image and is of great significance for high-level visual tasks in
autonomous driving. However, it is difficult to restore the lost details in the
dark areas by relying only on the RGB domain. In this paper, we introduce
frequency as a new clue into the network and propose a novel DCT-driven
enhancement transformer (DEFormer). First, we propose a learnable frequency
branch (LFB) for frequency enhancement that contains DCT processing and
curvature-based frequency enhancement (CFE). CFE calculates the curvature of
each channel to represent the detail richness of different frequency bands,
then divides the frequency features so that the model focuses on frequency
bands with richer textures. In addition, we propose a cross-domain fusion (CDF) module for
reducing the differences between the RGB domain and the frequency domain. We
also adopt DEFormer as a preprocessing step in dark detection: DEFormer
effectively improves the performance of the detector, bringing improvements of
2.1% and 3.4% in mAP on the ExDark and DARK FACE datasets, respectively.
Comment: submitted to ICRA202
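
The exact curvature definition is not given in the abstract; as a rough sketch of DCT-domain band scoring, the code below uses the mean absolute discrete Laplacian of each channel's DCT coefficients as a stand-in detail-richness measure.

```python
# Rough sketch of CFE-style band scoring: higher curvature of the DCT
# spectrum is read as richer texture (stand-in definition, not the paper's).
import numpy as np
from scipy.fft import dctn

def band_scores(image_chw):
    freq = dctn(image_chw, axes=(1, 2), norm="ortho")   # per-channel 2D DCT
    lap = (freq[:, 2:, 1:-1] + freq[:, :-2, 1:-1]
           + freq[:, 1:-1, 2:] + freq[:, 1:-1, :-2]
           - 4.0 * freq[:, 1:-1, 1:-1])                 # discrete Laplacian
    return np.abs(lap).mean(axis=(1, 2))                # one score per channel
```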
FastLLVE: Real-Time Low-Light Video Enhancement with Intensity-Aware Lookup Table
Low-Light Video Enhancement (LLVE) has received considerable attention in
recent years. One of the critical requirements of LLVE is inter-frame
brightness consistency, which is essential for maintaining the temporal
coherence of the enhanced video. However, most existing single-image-based
methods fail to address this issue, resulting in a flickering effect that
degrades the overall quality after enhancement. Moreover, 3D Convolutional
Neural Network (CNN)-based methods, which are designed for video to maintain
inter-frame consistency, are computationally expensive, making them impractical
for real-time applications. To address these issues, we propose an efficient
pipeline named FastLLVE that leverages the Look-Up-Table (LUT) technique to
maintain inter-frame brightness consistency effectively. Specifically, we
design a learnable Intensity-Aware LUT (IA-LUT) module for adaptive
enhancement, which addresses the low-dynamic problem in low-light scenarios.
This enables FastLLVE to perform low-latency and low-complexity enhancement
operations while maintaining high-quality results. Experimental results on
benchmark datasets demonstrate that our method achieves State-Of-The-Art
(SOTA) performance in terms of both image quality and inter-frame brightness
consistency. More importantly, our FastLLVE can process 1,080p videos in real
time, with faster inference than SOTA CNN-based methods, making it a promising
solution for real-time applications. The code is available at
https://github.com/Wenhao-Li-777/FastLLVE.
Comment: 11 pages, 9 figures, and 6 tables. Accepted by ACMMM 202
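
The IA-LUT itself is learned end-to-end; the minimal sketch below only shows the lookup step, assuming a 1D intensity-indexed table with linear interpolation, and illustrates why applying the same table to every frame keeps inter-frame brightness consistent.

```python
# Sketch of an intensity-indexed LUT lookup (the learned IA-LUT is more
# elaborate; this shows the interpolated table read only).
import torch

def apply_intensity_lut(video, lut):
    # video: (T, H, W) in [0, 1]; lut: (K,) learned outputs on a uniform grid
    k = lut.numel() - 1
    x = video.clamp(0, 1) * k
    lo = x.floor().long().clamp(max=k - 1)
    w = x - lo.float()
    # Same table for all frames -> identical mapping, no flicker.
    return (1 - w) * lut[lo] + w * lut[lo + 1]
```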
Fearless Luminance Adaptation: A Macro-Micro-Hierarchical Transformer for Exposure Correction
Photographs taken with less-than-ideal exposure settings often display poor
visual quality. Since the correction procedures vary significantly, it is
difficult for a single neural network to handle all exposure problems.
Moreover, the inherent limitations of convolutions hinder a model's ability
to restore faithful color or details in extremely over-/under-exposed regions.
To overcome these limitations, we propose a Macro-Micro-Hierarchical
transformer, which consists of a macro attention to capture long-range
dependencies, a micro attention to extract local features, and a hierarchical
structure for coarse-to-fine correction. Specifically, the complementary
macro-micro attention designs enhance locality while allowing global
interactions. The hierarchical structure enables the network to correct
exposure errors of different scales layer by layer. Furthermore, we propose a
contrast constraint and couple it seamlessly into the loss function, where the
corrected image is pulled towards the positive sample and pushed away from the
dynamically generated negative samples. Thus, the remaining color distortion and
loss of detail can be removed. We also extend our method as an image enhancer
for low-light face recognition and low-light semantic segmentation. Experiments
demonstrate that our approach obtains more attractive results than
state-of-the-art methods quantitatively and qualitatively.
Comment: Accepted by ACM MM 202
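
A minimal reading of the contrast constraint, assuming some fixed feature extractor `phi` (e.g., a frozen perceptual network, hypothetical here): the corrected image is drawn towards the positive sample and repelled from the generated negatives.

```python
# Sketch of a pull-push contrast constraint in feature space
# (phi is a hypothetical frozen feature extractor).
import torch

def contrast_constraint(phi, corrected, positive, negatives, eps=1e-8):
    f_c, f_p = phi(corrected), phi(positive)
    d_pos = (f_c - f_p).abs().mean()                  # pull towards positive
    d_neg = torch.stack([(f_c - phi(n)).abs().mean()
                         for n in negatives])         # push from negatives
    return d_pos / (d_neg.mean() + eps)               # minimized ratio
```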
Advancing Perception in Artificial Intelligence through Principles of Cognitive Science
Although artificial intelligence (AI) has achieved many feats at a rapid
pace, there still exist open problems and fundamental shortcomings related to
performance and resource efficiency. Since AI researchers benchmark a
significant proportion of performance standards through human intelligence,
cognitive sciences-inspired AI is a promising domain of research. Studying
cognitive science can provide a fresh perspective to building fundamental
blocks in AI research, which can lead to improved performance and efficiency.
In this review paper, we focus on the cognitive functions of perception, which
is the process of taking signals from one's surroundings as input, and
processing them to understand the environment. Particularly, we study and
compare its various processes through the lens of both cognitive sciences and
AI. Through this study, we review all current major theories from various
sub-disciplines of cognitive science (specifically neuroscience, psychology and
linguistics), and draw parallels with theories and techniques from current
practices in AI. We, hence, present a detailed collection of methods in AI for
researchers to build AI systems inspired by cognitive science. Further, through
the process of reviewing the state of cognitive-inspired AI, we point out many
gaps in the current state of AI (with respect to the performance of the human
brain), and hence present potential directions for researchers to develop
better perception systems in AI.
Comment: Summary: a detailed review of the current state of perception models
through the lens of cognitive A
ExposureDiffusion: Learning to Expose for Low-light Image Enhancement
Previous raw image-based low-light image enhancement methods predominantly
relied on feed-forward neural networks to learn deterministic mappings from
low-light to normally-exposed images. However, they failed to capture critical
distribution information, leading to visually undesirable results. This work
addresses the issue by seamlessly integrating a diffusion model with a
physics-based exposure model. Different from a vanilla diffusion model that has
to perform Gaussian denoising, with the injected physics-based exposure model,
our restoration process can directly start from a noisy image instead of pure
noise. As such, our method obtains significantly improved performance and
reduced inference time compared with vanilla diffusion models. To make full use
of the advantages of different intermediate steps, we further propose an
adaptive residual layer that effectively screens out the side-effect in the
iterative refinement when the intermediate results have been already
well-exposed. Note that the proposed framework is compatible with real-paired
datasets, real/synthetic noise models, and different backbone networks. We
evaluate the proposed method on various
public benchmarks, achieving promising results with consistent improvements
using different exposure models and backbones. Besides, the proposed method
achieves better generalization capacity for unseen amplifying ratios and better
performance than a larger feedforward neural model when few parameters are
adopted.
Comment: accepted by ICCV202
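
The practical consequence of injecting the physics-based exposure model can be sketched in a few lines: sampling starts from the amplified noisy raw input rather than pure Gaussian noise, so only a short refinement chain is needed (`denoise_step` is a hypothetical stand-in for the learned refiner).

```python
# Sketch: restoration starts from the brightened noisy image, not pure noise.
def restore(noisy_raw, amplification, denoise_step, num_steps=3):
    x = noisy_raw * amplification         # physics-based exposure model
    for t in reversed(range(num_steps)):  # short chain instead of full diffusion
        x = denoise_step(x, t)            # hypothetical learned refinement step
    return x
```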
Implicit Neural Representation for Cooperative Low-light Image Enhancement
The following three factors restrict the application of existing low-light
image enhancement methods: unpredictable brightness degradation and noise, the
inherent gap between metric-favorable and visual-friendly versions, and
limited paired training data. To address these limitations, we propose an
implicit Neural Representation method for Cooperative low-light image
enhancement, dubbed NeRCo. It robustly recovers perceptually friendly results in
an unsupervised manner. Concretely, NeRCo unifies the diverse degradation
factors of real-world scenes with a controllable fitting function, leading to
better robustness. In addition, for the output results, we introduce
semantic-orientated supervision with priors from the pre-trained
vision-language model. Instead of merely following reference images, it
encourages results to meet subjective expectations, finding more
visually friendly solutions. Further, to ease the reliance on paired data and
reduce solution space, we develop a dual-closed-loop constrained enhancement
module. It is trained cooperatively with other affiliated modules in a
self-supervised manner. Finally, extensive experiments demonstrate the
robustness and superior effectiveness of our proposed NeRCo. Our code is
available at https://github.com/Ysz2022/NeRCo.
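
For readers unfamiliar with implicit neural representations, the minimal sketch below shows the core ingredient: an MLP that maps continuous pixel coordinates to RGB values, giving a controllable fitting function (NeRCo's full pipeline adds degradation unification, text-driven supervision, and cooperative training on top).

```python
# Minimal implicit neural representation: coordinates -> RGB.
import torch.nn as nn

class CoordMLP(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),  # RGB in [0, 1]
        )

    def forward(self, xy):  # xy: (N, 2) coordinates in [-1, 1]
        return self.net(xy)
```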
Logarithmic Mathematical Morphology: theory and applications
Classically, in Mathematical Morphology, an image (i.e., a grey-level
function) is analysed by another image which is named the structuring element
or the structuring function. This structuring function is moved over the image
domain and summed to the image. However, in an image presenting lighting
variations, the analysis by a structuring function should require that its
amplitude varies according to the image intensity. Such a property is not
verified in Mathematical Morphology for grey-level functions when the
structuring function is summed to the image with the usual additive law. In
order to address this issue, a new framework is defined with an additive law
for which the amplitude of the structuring function varies according to the
image amplitude. This additive law is chosen within the Logarithmic Image
Processing framework and models the lighting variations with a physical cause
such as a change of light intensity or a change of camera exposure-time. The
new framework is named Logarithmic Mathematical Morphology (LMM) and allows the
definition of operators which are robust to such lighting variations. In images
with uniform lighting variations, those new LMM operators perform better than
usual morphological operators. In eye-fundus images with non-uniform lighting
variations, an LMM method for vessel segmentation is compared to three
state-of-the-art approaches. Results show that the LMM approach is more robust
to such variations than the other three.
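
The LIP additive law the paper builds on combines grey levels as f ⊕ g = f + g − f·g/M, where M is the grey-scale upper bound, so the effective amplitude of the structuring function scales with image intensity. Below is a minimal sketch of an LMM-style dilation under this law (conventions simplified; LIP's inverted grey scale and the paper's exact operator definitions are omitted).

```python
# Sketch of LIP addition and a dilation built on it.
import numpy as np

M = 256.0  # grey-scale upper bound for 8-bit images

def lip_add(f, g):
    return f + g - (f * g) / M          # LIP additive law

def lmm_dilation(image, struct):
    # struct: list of ((dy, dx), amplitude) samples of the structuring function
    out = np.full(image.shape, -np.inf)
    for (dy, dx), g in struct:
        shifted = np.roll(image, shift=(dy, dx), axis=(0, 1))
        out = np.maximum(out, lip_add(shifted, g))  # supremum of LIP-sums
    return out
```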
Residual Denoising Diffusion Models
We propose residual denoising diffusion models (RDDM), a novel dual diffusion
process that decouples the traditional single denoising diffusion process into
residual diffusion and noise diffusion. This dual diffusion framework expands
the denoising-based diffusion models, initially uninterpretable for image
restoration, into a unified and interpretable model for both image generation
and restoration by introducing residuals. Specifically, our residual diffusion
represents directional diffusion from the target image to the degraded input
image and explicitly guides the reverse generation process for image
restoration, while noise diffusion represents random perturbations in the
diffusion process. The residual prioritizes certainty, while the noise
emphasizes diversity, enabling RDDM to effectively unify tasks with varying
certainty or diversity requirements, such as image generation and restoration.
We demonstrate that our sampling process is consistent with that of DDPM and
DDIM through coefficient transformation, and propose a partially
path-independent generation process to better understand the reverse process.
Notably, our RDDM enables a generic UNet, trained with only an L1 loss and
a batch size of 1, to compete with state-of-the-art image restoration
methods. We provide code and pre-trained models to encourage further
exploration, application, and development of our innovative framework
(https://github.com/nachifur/RDDM).
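
As a reading of the dual diffusion process described above, the sketch below drifts the state from the target image towards the degraded input via the residual while injecting Gaussian noise; the schedules `alpha_bar_t` and `beta_bar_t` are illustrative placeholders, not the paper's exact coefficients.

```python
# Sketch of an RDDM-style dual forward step: residual = certainty (direction
# towards the degraded input), noise = diversity (random perturbation).
import torch

def rddm_forward(i0, i_in, alpha_bar_t, beta_bar_t):
    residual = i_in - i0                 # direction: target -> degraded input
    noise = torch.randn_like(i0)
    return i0 + alpha_bar_t * residual + beta_bar_t * noise
```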