Low-Light Image Enhancement with Wavelet-based Diffusion Models
Diffusion models have achieved promising results in image restoration tasks,
yet they suffer from time-consuming inference, excessive computational resource
consumption, and unstable restoration. To address these issues, we propose a robust and
efficient Diffusion-based Low-Light image enhancement approach, dubbed DiffLL.
Specifically, we present a wavelet-based conditional diffusion model (WCDM)
that leverages the generative power of diffusion models to produce results with
satisfactory perceptual fidelity. It also takes advantage of the strengths of
the wavelet transform to greatly accelerate inference and reduce computational
resource usage without sacrificing information. To avoid chaotic content and
uncontrolled diversity, we perform both forward diffusion and reverse denoising
in the training phase of WCDM, enabling the model to achieve stable denoising
and reduce randomness during inference. We further design a
high-frequency restoration module (HFRM) that utilizes the vertical and
horizontal details of the image to complement the diagonal information for
better fine-grained restoration. Extensive experiments on publicly available
real-world benchmarks demonstrate that our method outperforms the existing
state-of-the-art methods both quantitatively and visually, and it achieves
remarkable improvements in efficiency compared to previous diffusion-based
methods. In addition, we empirically show that applying our method to low-light
face detection reveals its latent practical value.
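As a rough illustration of why moving the diffusion process into the wavelet domain is cheap, the sketch below (using PyWavelets, not the authors' code) shows a single-level 2D discrete wavelet transform: the low-frequency subband that a model like WCDM would denoise is a quarter of the original size, while the horizontal, vertical, and diagonal subbands carry the details a module like HFRM refines, and the transform is exactly invertible.

```python
# Minimal sketch (not the authors' code): a single-level 2D discrete wavelet
# transform of the kind a wavelet-domain diffusion model could operate on.
# Requires PyWavelets (pywt).
import numpy as np
import pywt

img = np.random.rand(256, 256).astype(np.float32)  # stand-in low-light image (grayscale)

# One DWT level: LL holds coarse content; (LH, HL, HH) hold the horizontal,
# vertical, and diagonal details a module like HFRM would refine.
LL, (LH, HL, HH) = pywt.dwt2(img, 'haar')
print(LL.shape)  # (128, 128): diffusion now runs on a 4x smaller array

# The transform is invertible, so no information is lost when reconstructing.
rec = pywt.idwt2((LL, (LH, HL, HH)), 'haar')
assert np.allclose(rec, img, atol=1e-5)
```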
Realistic Noise Synthesis with Diffusion Models
Deep learning-based approaches have achieved remarkable performance in
single-image denoising. However, training denoising models typically requires a
large amount of data, which can be difficult to obtain in real-world scenarios.
Furthermore, synthetic noise used in the past often differs significantly from
real-world noise, owing to the complexity of real noise and the limited ability
of Generative Adversarial Network (GAN) models to model its distribution, which
leaves residual noise and artifacts in the resulting denoising models. To
address these challenges, we propose a novel method for
synthesizing realistic noise using diffusion models. This approach enables us
to generate large amounts of high-quality data for training denoising models by
controlling camera settings to simulate different environmental conditions and
employing guided multi-scale content information, so that the synthesized noise
exhibits the multi-frequency spatial correlations of real noise. In particular,
we design an inversion mechanism for the camera settings, which extends our
method to public datasets that lack setting information.
Based on the synthesized noise dataset, we conduct extensive experiments on
multiple benchmarks; the results show that our method outperforms
state-of-the-art approaches across benchmarks and metrics, demonstrating its
effectiveness in synthesizing realistic noise for training denoising models.
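For context on what the diffusion-based synthesis replaces, the sketch below shows a classical Poisson-Gaussian (shot plus read) noise model in which the gain stands in for a camera setting such as ISO. The parameter values are hypothetical and only illustrate the signal- and setting-dependent behavior that the paper's model learns instead of hand-crafting.

```python
# Illustrative sketch only: a classical shot + read noise model, the kind of
# hand-crafted baseline that learned noise synthesis aims to improve on.
# Gain and read-noise values are hypothetical stand-ins for camera settings.
import numpy as np

def synthesize_noisy(clean, gain=0.01, read_std=0.002, rng=np.random.default_rng(0)):
    """clean: float image in [0, 1]; returns a noisy observation."""
    shot = rng.poisson(clean / gain) * gain        # signal-dependent shot noise
    read = rng.normal(0.0, read_std, clean.shape)  # signal-independent read noise
    return np.clip(shot + read, 0.0, 1.0).astype(np.float32)

clean = np.random.rand(64, 64).astype(np.float32)
noisy = synthesize_noisy(clean)
```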
Supervised Homography Learning with Realistic Dataset Generation
In this paper, we propose an iterative framework, which consists of two
phases: a generation phase and a training phase, to generate realistic training
data and yield a supervised homography network. In the generation phase, given
an unlabeled image pair, we utilize the pre-estimated dominant plane masks and
homography of the pair, along with another sampled homography that serves as
ground truth to generate a new labeled training pair with realistic motion. In
the training phase, the generated data is used to train the supervised
homography network, in which the training data is refined via a content
consistency module and a quality assessment module. Once an iteration is
finished, the trained network is used in the next data generation phase to
update the pre-estimated homography. Through such an iterative strategy, the
quality of the dataset and the performance of the network can be gradually and
simultaneously improved. Experimental results show that our method achieves
state-of-the-art performance and that existing supervised methods can also be
improved with the generated dataset. Code and dataset are available at
https://github.com/megvii-research/RealSH. (Accepted by ICCV 2023)
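The generation phase rests on a simple idea: a sampled homography applied to a real image yields a second view whose ground truth is known exactly. The OpenCV sketch below illustrates that idea in isolation; the perturbation range is an arbitrary choice, and the snippet omits the dominant-plane masks, content consistency module, and quality assessment used by the actual method.

```python
# Hedged sketch (not the RealSH code): turn an unlabeled image into a labeled
# training pair by sampling a homography and using it as ground truth.
import cv2
import numpy as np

def make_labeled_pair(img, max_shift=32, rng=np.random.default_rng(0)):
    h, w = img.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    dst = (src + rng.uniform(-max_shift, max_shift, src.shape)).astype(np.float32)
    H_gt = cv2.getPerspectiveTransform(src, dst)      # sampled ground-truth homography
    warped = cv2.warpPerspective(img, H_gt, (w, h))   # second view of the pair
    return img, warped, H_gt                          # supervised training sample

img = (np.random.rand(240, 320, 3) * 255).astype(np.uint8)
pair_a, pair_b, H_gt = make_labeled_pair(img)
```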
RealFlow: EM-based Realistic Optical Flow Dataset Generation from Videos
Obtaining ground-truth optical flow labels from a video is challenging, since
manual annotation of pixel-wise flow is prohibitively expensive and laborious.
Moreover, existing approaches try to adapt models trained on synthetic datasets
to authentic videos, which inevitably suffers from domain discrepancy and
limits performance in real-world applications. To solve
these problems, we propose RealFlow, an Expectation-Maximization based
framework that can create large-scale optical flow datasets directly from any
unlabeled realistic videos. Specifically, we first estimate optical flow
between a pair of video frames, and then synthesize a new image from this pair
based on the predicted flow. Thus the new image pairs and their corresponding
flows can be regarded as a new training set. Besides, we design a Realistic
Image Pair Rendering (RIPR) module that adopts softmax splatting and
bi-directional hole filling techniques to alleviate the artifacts of the image
synthesis. In the E-step, RIPR renders new images to create a large quantity of
training data. In the M-step, we utilize the generated training data to train
an optical flow network, which can be used to estimate optical flows in the
next E-step. Over these iterations, the capability of the flow network
gradually improves, as does the accuracy of the estimated flow and the quality
of the synthesized dataset. Experimental results show that RealFlow
outperforms previous dataset generation methods by a considerably large margin.
Moreover, based on the generated dataset, our approach achieves
state-of-the-art performance on two standard benchmarks compared with both
supervised and unsupervised optical flow methods. Our code and dataset are
available at https://github.com/megvii-research/RealFlow. (ECCV 2022 Oral)
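The abstract's E-step/M-step alternation can be summarized in a few lines of control flow. The sketch below is purely structural: the callables for flow estimation, RIPR-style rendering, and network training are hypothetical placeholders, not functions from the RealFlow repository.

```python
# Structural sketch of the EM-style loop described above. The callables passed
# in (estimate_flow, render_pair, train_flow_net) are hypothetical placeholders,
# not functions from the RealFlow repository.
def realflow_style_loop(frame_pairs, flow_net, estimate_flow, render_pair,
                        train_flow_net, num_iters=3):
    for _ in range(num_iters):
        dataset = []
        # E-step: render a new second frame for each real frame so that the
        # estimated flow becomes an exact label for the synthesized pair.
        for frame1, frame2 in frame_pairs:
            flow = estimate_flow(flow_net, frame1, frame2)
            new_frame2 = render_pair(frame1, frame2, flow)  # RIPR-like rendering
            dataset.append((frame1, new_frame2, flow))
        # M-step: retrain the flow network on the freshly generated labels.
        flow_net = train_flow_net(flow_net, dataset)
    return flow_net
```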
DiPrompT: Disentangled Prompt Tuning for Multiple Latent Domain Generalization in Federated Learning
Federated learning (FL) has emerged as a powerful paradigm for learning from
decentralized data, and federated domain generalization further considers the
setting where the test dataset (target domain) is absent from the decentralized training data
(source domains). However, most existing FL methods assume that domain labels
are provided during training, and their evaluation imposes explicit constraints
on the number of domains, which must strictly match the number of clients.
Such restrictions can be impractical in the real world, since they leave
numerous edge devices underutilized, require additional cross-client domain
annotations, and involve potential privacy leaks. In this paper, we propose an
efficient and novel approach, Disentangled Prompt Tuning (DiPrompT), which
tackles the above restrictions by learning adaptive prompts for
domain generalization in a distributed manner. Specifically, we first design
two types of prompts, i.e., a global prompt to capture general knowledge across
all clients and domain prompts to capture domain-specific knowledge. They
eliminate the restriction on the one-to-one mapping between source domains and
local clients. Furthermore, a dynamic query metric is introduced to
automatically search for the suitable domain label of each sample via a
two-substep text-image alignment based on prompt tuning, without
labor-intensive annotation. Extensive experiments on multiple datasets
demonstrate that our DiPrompT achieves superior domain generalization
performance over state-of-the-art FL methods when domain labels are not
provided, and even outperforms many centralized learning methods that use
domain labels.
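A minimal PyTorch sketch of the prompt disentanglement described above is given below, assuming prompts are plain learnable token embeddings; the dimensions are illustrative, and the dynamic query metric that selects the domain index is omitted.

```python
# Minimal sketch (PyTorch), not the DiPrompT implementation: one shared global
# prompt plus several domain prompts kept as learnable embeddings. Sizes are
# illustrative.
import torch
import torch.nn as nn

class DisentangledPrompts(nn.Module):
    def __init__(self, num_domains=4, prompt_len=8, dim=512):
        super().__init__()
        self.global_prompt = nn.Parameter(torch.randn(prompt_len, dim) * 0.02)
        self.domain_prompts = nn.Parameter(torch.randn(num_domains, prompt_len, dim) * 0.02)

    def forward(self, domain_idx):
        # Concatenate general knowledge with the selected domain-specific prompt.
        return torch.cat([self.global_prompt, self.domain_prompts[domain_idx]], dim=0)

prompts = DisentangledPrompts()
tokens = prompts(domain_idx=2)   # prompt tokens for one sample
print(tokens.shape)              # torch.Size([16, 512])
```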
GLARE: Low Light Image Enhancement via Generative Latent Feature based Codebook Retrieval
Most existing Low-light Image Enhancement (LLIE) methods either directly map Low-Light (LL) to Normal-Light (NL) images or use semantic or illumination maps as guides. However, the ill-posed nature of LLIE and the difficulty of semantic retrieval from impaired inputs limit these methods, especially in extremely low-light conditions. To address this issue, we present a new LLIE network via Generative LAtent feature based codebook REtrieval (GLARE), in which the codebook prior is derived from undegraded NL images using a Vector Quantization (VQ) strategy. More importantly, we develop a generative Invertible Latent Normalizing Flow (I-LNF) module to align the LL feature distribution with the NL latent representations, guaranteeing correct code retrieval in the codebook. In addition, a novel Adaptive Feature Transformation (AFT) module, featuring a user-adjustable function and comprising an Adaptive Mix-up Block (AMB) along with a dual-decoder architecture, is devised to further enhance fidelity while preserving the realistic details provided by the codebook prior. Extensive experiments confirm the superior performance of GLARE on various benchmark datasets and real-world data. Its effectiveness as a preprocessing tool in low-light object detection tasks further validates GLARE for high-level vision applications. Code is released at https://github.com/LowLevelAI/GLARE. (Accepted by ECCV 2024)
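To make the codebook-retrieval idea concrete, the sketch below shows the generic vector-quantization lookup step in PyTorch: each latent feature is replaced by its nearest codebook entry. It is a minimal illustration with made-up sizes, not the GLARE implementation, which additionally relies on the I-LNF alignment and AFT decoding described above.

```python
# Hedged sketch of the vector-quantization lookup at the heart of a codebook
# prior (not the GLARE code). Sizes are illustrative.
import torch

codebook = torch.randn(1024, 256)        # 1024 learned codes of dimension 256
features = torch.randn(4096, 256)        # flattened latent features of one image

# Nearest-code retrieval: L2 distance to every code, then argmin.
dists = torch.cdist(features, codebook)  # (4096, 1024)
indices = dists.argmin(dim=1)
quantized = codebook[indices]            # codebook-retrieved (quantized) features
```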
NTIRE 2022 Challenge on High Dynamic Range Imaging: Methods and Results
This paper reviews the challenge on constrained high dynamic range (HDR) imaging that was part of the New Trends in Image Restoration and Enhancement (NTIRE) workshop, held in conjunction with CVPR 2022. This manuscript focuses on the competition set-up, datasets, the proposed methods, and their results. The challenge aims at estimating an HDR image from multiple respective low dynamic range (LDR) observations, which might suffer from under- or over-exposed regions and different sources of noise. The challenge is composed of two tracks with an emphasis on fidelity and complexity constraints: In Track 1, participants are asked to optimize objective fidelity scores while imposing a low-complexity constraint (i.e., solutions cannot exceed a given number of operations). In Track 2, participants are asked to minimize the complexity of their solutions while imposing a constraint on fidelity scores (i.e., solutions are required to obtain a higher fidelity score than the prescribed baseline). Both tracks use the same data and metrics: fidelity is measured by means of PSNR with respect to a ground-truth HDR image (computed both directly and with a canonical tonemapping operation), while complexity metrics include the number of Multiply-Accumulate (MAC) operations and runtime (in seconds).
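The two fidelity scores mentioned above can be sketched as PSNR on linear HDR values and PSNR after a canonical tonemapping. The snippet below assumes a mu-law curve with mu = 5000, a common choice in HDR evaluation rather than a detail quoted from the challenge report.

```python
# Illustrative sketch of the two fidelity scores: PSNR on linear HDR values and
# PSNR after a tonemapping curve. The mu-law tonemap with mu = 5000 is an
# assumption, not a detail quoted from the challenge report.
import numpy as np

def psnr(pred, gt, peak=1.0):
    mse = np.mean((pred - gt) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def mu_law(x, mu=5000.0):
    return np.log1p(mu * x) / np.log1p(mu)

gt = np.random.rand(128, 128, 3).astype(np.float32)              # normalized HDR ground truth
pred = np.clip(gt + np.random.normal(0, 0.01, gt.shape), 0, 1)   # stand-in prediction

print("PSNR (linear):    ", psnr(pred, gt))
print("PSNR (tonemapped):", psnr(mu_law(pred), mu_law(gt)))
```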
PtSe2/SiH van der Waals type-II heterostructure: a high efficiency photocatalyst for water splitting
PtSe2/SiH type-II van der Waals heterostructure is a highly efficient photocatalyst for water splitting in visible light.
AlAs/SiH van der Waals heterostructures: A promising photocatalyst for water splitting
Study of the GaAs/SiH van der Waals type-II heterostructure: a high efficiency photocatalyst promoted by a built-in electric field
The built-in electric field promotes GaAs/SiH as a high efficiency photocatalyst for water splitting in visible light.
