56 research outputs found
Low-Light Image Enhancement with Wavelet-based Diffusion Models
Diffusion models have achieved promising results in image restoration tasks,
yet they suffer from time-consuming inference, excessive computational resource
consumption, and unstable restoration. To address these issues, we propose a robust and
efficient Diffusion-based Low-Light image enhancement approach, dubbed DiffLL.
Specifically, we present a wavelet-based conditional diffusion model (WCDM)
that leverages the generative power of diffusion models to produce results with
satisfactory perceptual fidelity. It also takes advantage of the
strengths of wavelet transformation to greatly accelerate inference and reduce
computational resource usage without sacrificing information. To avoid chaotic
content and uncontrolled diversity in the outputs, we perform both forward diffusion
and reverse denoising in the training phase of WCDM, enabling the model to achieve stable denoising
and reduce randomness during inference. Moreover, we design a
high-frequency restoration module (HFRM) that utilizes the vertical and
horizontal details of the image to complement the diagonal information for
better fine-grained restoration. Extensive experiments on publicly available
real-world benchmarks demonstrate that our method outperforms the existing
state-of-the-art methods both quantitatively and visually, and it achieves
remarkable improvements in efficiency compared to previous diffusion-based
methods. In addition, we empirically show that applying our method to low-light
face detection reveals its latent practical value.
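To make the wavelet side of the approach concrete, here is a minimal single-level 2D Haar transform in NumPy (a sketch of the general idea, not the paper's WCDM implementation; the function names are our own). Running the diffusion process on the quarter-resolution LL subband is what cuts computation, while the three detail subbands carry the horizontal, vertical, and diagonal information that a module like HFRM would refine; the transform is invertible, so no information is sacrificed.

```python
import numpy as np

def haar_dwt2(img):
    """One level of the 2D Haar wavelet transform.
    Splits an (H, W) image into four (H/2, W/2) subbands:
    LL (low-frequency approximation) plus three detail subbands
    (orientation naming conventions vary across libraries)."""
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    ll = (a + b + c + d) / 2.0   # coarse approximation (quarter area)
    lh = (a - b + c - d) / 2.0   # detail subband 1
    hl = (a + b - c - d) / 2.0   # detail subband 2
    hh = (a - b - c + d) / 2.0   # diagonal detail subband
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: reconstructs the image exactly."""
    h, w = ll.shape
    img = np.empty((2 * h, 2 * w))
    img[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    img[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    img[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    img[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return img
```

Because each level halves both spatial dimensions, a denoising network applied to LL sees only a quarter of the pixels per level, which is where the inference speedup comes from.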
GAFlow: Incorporating Gaussian Attention into Optical Flow
Optical flow, or the estimation of motion fields from image sequences, is one
of the fundamental problems in computer vision. Unlike most pixel-wise tasks
that aim at achieving consistent representations of the same category, optical
flow raises extra demands for obtaining local discrimination and smoothness,
which existing approaches have not yet fully explored. In this paper, we push
Gaussian Attention (GA) into the optical flow models to accentuate local
properties during representation learning and enforce the motion affinity
during matching. Specifically, we introduce a novel Gaussian-Constrained Layer
(GCL) which can be easily plugged into existing Transformer blocks to highlight
the local neighborhood that contains fine-grained structural information.
Moreover, for reliable motion analysis, we provide a new Gaussian-Guided
Attention Module (GGAM), which not only inherits properties of the Gaussian
distribution to intrinsically focus on the neighboring fields of each point,
but is also empowered to emphasize contextually related regions
during matching. Our fully-equipped model, namely Gaussian Attention Flow
network (GAFlow), naturally incorporates a series of novel Gaussian-based
modules into the conventional optical flow framework for reliable motion
analysis. Extensive experiments on standard optical flow datasets consistently
demonstrate the exceptional performance of the proposed approach in terms of
both generalization ability evaluation and online benchmark testing. Code is
available at https://github.com/LA30/GAFlow.
Comment: To appear in ICCV-202
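The locality idea can be sketched with plain NumPy: add a Gaussian distance penalty to ordinary attention logits so that each query emphasizes its spatial neighborhood. This is our own simplified stand-in for the Gaussian-Constrained Layer, not the GAFlow code; `gaussian_attention` and `sigma` are illustrative names.

```python
import numpy as np

def gaussian_attention(q, k, v, coords, sigma=2.0):
    """Single-head attention with an additive Gaussian locality bias.
    Attention logits are penalized by squared spatial distance, so
    each token attends mostly to its local neighborhood.
    q, k, v: (N, d) arrays; coords: (N, 2) pixel positions."""
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)                      # (N, N) similarity
    dist2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    logits = logits - dist2 / (2.0 * sigma ** 2)       # Gaussian log-prior
    logits -= logits.max(axis=-1, keepdims=True)       # numerically stable softmax
    w = np.exp(logits)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v
```

Because the bias is added in log-space, it multiplies the softmax weights by an unnormalized Gaussian kernel over distance, which is one simple way to encode the motion-affinity prior the abstract describes.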
Supervised Homography Learning with Realistic Dataset Generation
In this paper, we propose an iterative framework, which consists of two
phases: a generation phase and a training phase, to generate realistic training
data and yield a supervised homography network. In the generation phase, given
an unlabeled image pair, we utilize the pre-estimated dominant plane masks and
homography of the pair, along with another sampled homography that serves as
ground truth to generate a new labeled training pair with realistic motion. In
the training phase, the generated data is used to train the supervised
homography network, in which the training data is refined via a content
consistency module and a quality assessment module. Once an iteration is
finished, the trained network is used in the next data generation phase to
update the pre-estimated homography. Through such an iterative strategy, the
quality of the dataset and the performance of the network can be gradually and
simultaneously improved. Experimental results show that our method achieves
state-of-the-art performance, and existing supervised methods can also be
improved using the generated dataset. Code and dataset are available at
https://github.com/megvii-research/RealSH.
Comment: Accepted by ICCV 202
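The core of the generation phase, stripped of the dominant-plane masks and refinement modules, is simply: sample a ground-truth homography, warp the image with it, and keep the (image, warped image, homography) triple as supervised data. The sketch below shows that step with a nearest-neighbor inverse warp in NumPy; the sampling here is a toy perturbed translation, a placeholder for the paper's realistic-motion sampling, and both function names are our own.

```python
import numpy as np

def warp_homography(img, H):
    """Inverse-warp a grayscale image by a 3x3 homography H
    (nearest-neighbor sampling; out-of-bounds pixels set to 0)."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = np.linalg.inv(H) @ pts                 # pull source coords for each target pixel
    sx = np.round(src[0] / src[2]).astype(int)
    sy = np.round(src[1] / src[2]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros(h * w)
    out[valid] = img[sy[valid], sx[valid]]
    return out.reshape(h, w)

def make_labeled_pair(img, max_shift=4.0, rng=None):
    """Sample a ground-truth homography (here just a random translation,
    standing in for realistic-motion sampling) and return the labeled
    training triple (img, warped_img, H_gt)."""
    rng = rng or np.random.default_rng(0)
    H_gt = np.eye(3)
    H_gt[:2, 2] = rng.uniform(-max_shift, max_shift, size=2)
    return img, warp_homography(img, H_gt), H_gt
```

In the iterative scheme described above, the network trained on such triples would then re-estimate the dominant-plane homography used to seed the next round of generation.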
Implicit Identity Leakage: The Stumbling Block to Improving Deepfake Detection Generalization
In this paper, we analyse the generalization ability of binary classifiers
for the task of deepfake detection. We find that the stumbling block to their
generalization is the identity representation unexpectedly learned from
images. Termed the Implicit Identity Leakage, this phenomenon has been
qualitatively and quantitatively verified among various DNNs. Furthermore,
based on such understanding, we propose a simple yet effective method named the
ID-unaware Deepfake Detection Model to reduce the influence of this phenomenon.
Extensive experimental results demonstrate that our method outperforms the
state-of-the-art in both in-dataset and cross-dataset evaluation. The code is
available at https://github.com/megvii-research/CADDM.
Comment: Accepted by CVPR 202
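One simple way to probe for the kind of leakage described above is to check how well identity can be read out of a detector's features: if a trivial classifier recovers identity from features meant only to separate real from fake, the detector has implicitly encoded who the face is. The nearest-centroid probe below is our own illustrative diagnostic, not the paper's verification procedure or the CADDM code.

```python
import numpy as np

def identity_probe_accuracy(feats, ids):
    """Nearest-centroid identity probe on detector features.
    feats: (N, D) feature vectors; ids: (N,) integer identity labels.
    High accuracy suggests the features encode identity (implicit
    identity leakage) rather than only real-vs-fake evidence."""
    labels = np.unique(ids)
    centroids = np.stack([feats[ids == l].mean(0) for l in labels])
    d2 = ((feats[:, None, :] - centroids[None]) ** 2).sum(-1)  # (N, K) distances
    pred = labels[d2.argmin(1)]
    return (pred == ids).mean()
```

An ID-unaware detector, in this view, is one whose features drive such a probe toward chance accuracy while real-vs-fake accuracy stays high.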
- …