A Data-Centric Solution to Non-Homogeneous Dehazing via Vision Transformer
Recent years have witnessed an increased interest in image dehazing. Many
deep learning methods have been proposed to tackle this challenge, and have
made significant accomplishments dealing with homogeneous haze. However, these
solutions cannot maintain comparable performance when they are applied to
images with non-homogeneous haze, e.g., the NH-HAZE23 dataset introduced by the
NTIRE challenges. One of the reasons for such failures is that non-homogeneous haze
does not obey one of the assumptions that is required for modeling homogeneous
haze. In addition, traditional end-to-end training approaches require a large
number of pairs of non-homogeneous hazy images and their clean counterparts,
while the NH-HAZE23 dataset is of limited size. Although it is
possible to augment the NH-HAZE23 dataset by leveraging other non-homogeneous
dehazing datasets, we observe that it is necessary to design a proper
data-preprocessing approach that reduces the distribution gaps between the
target dataset and the augmented one. This finding indeed aligns with the
essence of data-centric AI. With a novel network architecture and a principled
data-preprocessing approach that systematically enhances data quality, we
present an innovative dehazing method. Specifically, we apply RGB-channel-wise
transformations on the augmented datasets, and incorporate the state-of-the-art
transformers as the backbone in the two-branch framework. We conduct extensive
experiments and ablation studies to demonstrate the effectiveness of our
proposed method.
Comment: Accepted by CVPRW 202
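The abstract does not spell out the exact RGB-channel-wise transformation; one plausible instance of reducing the distribution gap between the augmented and target datasets is per-channel mean/std matching, sketched below (the function name and reference statistics are illustrative assumptions, not the paper's method):

```python
import numpy as np

def match_channel_stats(src: np.ndarray, ref_mean: np.ndarray, ref_std: np.ndarray) -> np.ndarray:
    """Shift and scale each RGB channel of `src` (H, W, 3, values in [0, 1])
    so its per-channel mean and std match the given reference statistics."""
    out = np.empty_like(src, dtype=np.float64)
    for c in range(3):
        ch = src[..., c]
        std = ch.std()
        scale = ref_std[c] / std if std > 1e-8 else 1.0
        out[..., c] = (ch - ch.mean()) * scale + ref_mean[c]
    return np.clip(out, 0.0, 1.0)
```

Applied to every augmented image with statistics computed from the target dataset, a transform of this kind pulls the two distributions toward each other before training.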
Breaking Through the Haze: An Advanced Non-Homogeneous Dehazing Method based on Fast Fourier Convolution and ConvNeXt
Haze usually leads to deteriorated images with low contrast, color shift and
structural distortion. We observe that many deep learning based models exhibit
exceptional performance on removing homogeneous haze, but they usually fail to
address the challenge of non-homogeneous dehazing. Two main factors account for
this situation. First, due to the intricate and non-uniform distribution of
dense haze, the recovery of structural and chromatic features with high
fidelity is challenging, particularly in regions with heavy haze. Secondly, the
existing small scale datasets for non-homogeneous dehazing are inadequate to
support reliable learning of feature mappings between hazy images and their
corresponding haze-free counterparts by convolutional neural network
(CNN)-based models. To tackle these two challenges, we propose a novel
two-branch network that leverages the 2D discrete wavelet transform (DWT), fast
Fourier convolution (FFC) residual blocks, and a pretrained ConvNeXt model.
Specifically, in the DWT-FFC frequency branch, our model exploits DWT to
capture more high-frequency features. Moreover, by taking advantage of the
large receptive field provided by FFC residual blocks, our model is able to
effectively explore global contextual information and produce images with
better perceptual quality. In the prior knowledge branch, an ImageNet
pretrained ConvNeXt as opposed to Res2Net is adopted. This enables our model to
learn more supplementary information and acquire a stronger generalization
ability. The feasibility and effectiveness of the proposed method are
demonstrated via extensive experiments and ablation studies. The code is
available at https://github.com/zhouh115/DWT-FFC.
Comment: Accepted by CVPRW 202
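The high-frequency capture performed by the DWT branch can be illustrated with a single level of the 2D Haar transform; the numpy sketch below is a generic Haar DWT, not the paper's implementation:

```python
import numpy as np

def haar_dwt2(x: np.ndarray):
    """One level of the 2D Haar DWT on a single-channel image with even
    height and width. Returns (LL, LH, HL, HH); the last three are the
    high-frequency subbands a DWT branch would forward to later layers."""
    a = (x[0::2, :] + x[1::2, :]) / 2.0   # vertical low-pass
    d = (x[0::2, :] - x[1::2, :]) / 2.0   # vertical high-pass
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0  # low-low (coarse approximation)
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0  # horizontal detail
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0  # vertical detail
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0  # diagonal detail
    return ll, lh, hl, hh
```

On a smooth region the three detail subbands vanish, so they isolate exactly the edge and texture information that dense haze tends to destroy.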
PromptIR: Prompting for All-in-One Blind Image Restoration
Image restoration involves recovering a high-quality clean image from its
degraded version. Deep learning-based methods have significantly improved image
restoration performance; however, they have limited generalization ability to
different degradation types and levels. This restricts their real-world
application since it requires training individual models for each specific
degradation and knowing the input degradation type to apply the relevant model.
We present a prompt-based learning approach, PromptIR, for All-In-One image
restoration that can effectively restore images from various types and levels
of degradation. In particular, our method uses prompts to encode
degradation-specific information, which is then used to dynamically guide the
restoration network. This allows our method to generalize to different
degradation types and levels, while still achieving state-of-the-art results on
image denoising, deraining, and dehazing. Overall, PromptIR offers a generic
and efficient plugin module with few lightweight prompts that can be used to
restore images of various types and levels of degradation with no prior
information on the corruptions present in the image. Our code and pretrained
models are available here: https://github.com/va1shn9v/PromptI
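As a minimal illustration of the prompt idea (not PromptIR's actual architecture, which uses transformer-based prompt-interaction blocks), the sketch below mixes a bank of prompt vectors with weights predicted from the input feature and uses the blended prompt to modulate channels; all names and shapes are illustrative:

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())
    return e / e.sum()

def prompt_modulate(feat: np.ndarray, prompts: np.ndarray, w: np.ndarray) -> np.ndarray:
    """feat: (C, H, W) feature map; prompts: (N, C) learnable prompt bank;
    w: (C, N) projection mapping the pooled feature to prompt weights.
    Returns the feature map scaled channel-wise by the blended prompt."""
    pooled = feat.mean(axis=(1, 2))       # (C,) global descriptor of the input
    weights = softmax(pooled @ w)         # (N,) degradation-aware mixing weights
    prompt = weights @ prompts            # (C,) blended prompt vector
    return feat * prompt[:, None, None]   # channel-wise modulation
```

Because the mixing weights depend on the input itself, the same network can steer its restoration behavior toward whichever degradation the pooled descriptor indicates.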
MB-TaylorFormer: Multi-branch Efficient Transformer Expanded by Taylor Formula for Image Dehazing
In recent years, Transformer networks have begun to replace pure
convolutional neural networks (CNNs) in the field of computer vision due to
their global receptive field and adaptability to input. However, the quadratic
computational complexity of softmax attention limits its wide application to
the image dehazing task, especially for high-resolution images. To address this
issue, we propose a new Transformer variant, which applies the Taylor expansion
to approximate the softmax-attention and achieves linear computational
complexity. A multi-scale attention refinement module is proposed as a
complement to correct the error of the Taylor expansion. Furthermore, we
introduce a multi-branch architecture with multi-scale patch embedding to the
proposed Transformer, which embeds features by overlapping deformable
convolution of different scales. The design of multi-scale patch embedding is
based on three key ideas: 1) various sizes of the receptive field; 2)
multi-level semantic information; 3) flexible shapes of the receptive field.
Our model, named Multi-branch Transformer expanded by Taylor formula
(MB-TaylorFormer), can embed coarse to fine features more flexibly at the patch
embedding stage and capture long-distance pixel interactions with limited
computational cost. Experimental results on several dehazing benchmarks show
that MB-TaylorFormer achieves state-of-the-art (SOTA) performance with a light
computational burden. The source code and pre-trained models are available at
https://github.com/FVL2020/ICCV-2023-MB-TaylorFormer.
Comment: ICCV 202
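The linear-complexity trick can be sketched with a first-order Taylor approximation of the exponential in softmax attention, exp(q·k) ≈ 1 + q·k; associativity then lets the key-value product be aggregated once. This numpy sketch is a simplified illustration and omits the paper's multi-scale refinement module and deformable patch embedding:

```python
import numpy as np

def taylor_linear_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Attention with exp(q·k) replaced by 1 + q·k, after L2-normalizing
    q and k so q·k lies in [-1, 1] and the weights stay non-negative.
    Rearranging the sums gives O(n*d^2) cost instead of O(n^2*d)."""
    q = q / np.linalg.norm(q, axis=-1, keepdims=True)
    k = k / np.linalg.norm(k, axis=-1, keepdims=True)
    n = k.shape[0]
    kv = k.T @ v                        # (d, d): aggregated once, reused per query
    num = v.sum(axis=0) + q @ kv        # (n, d): sum_j (1 + q_i.k_j) v_j
    den = n + q @ k.sum(axis=0)         # (n,):   sum_j (1 + q_i.k_j)
    return num / den[:, None]
```

The result matches the quadratic-cost computation with weights (1 + q·k) exactly; the approximation error relative to true softmax attention is what the paper's refinement module is meant to correct.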
Transformer-based progressive residual network for single image dehazing
Introduction: Severely degraded foggy images hinder further visual tasks. Obtaining a fog-free image is both challenging and important in computer vision. Recently, the vision transformer (ViT) architecture has achieved highly efficient performance in several vision areas.
Methods: In this paper, we propose a new transformer-based progressive residual network. Unlike existing single-stage ViT architectures, we recursively call the progressive residual network with the introduction of the Swin transformer. Specifically, our progressive residual network consists of three main components: the recurrent block, the transformer encoder-decoder, and the supervised fusion module. First, the recurrent block learns the features of the input image while connecting them with the original image features at each iteration. Then, the encoder introduces Swin transformer blocks to encode the feature representation of the decomposed blocks, progressively reducing the feature-map resolution to extract long-range context features. The decoder recursively selects and fuses image features by combining an attention mechanism with dense residual blocks. In addition, we add a channel attention mechanism between the encoder and decoder to focus on the importance of different features.
Results and discussion: Experimental results show that this method outperforms state-of-the-art handcrafted and learning-based methods.
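The recursive calling pattern described above can be sketched as a residual refinement loop that re-applies one shared step network while re-injecting the original hazy input at each iteration; here `step` is a stand-in for the Swin-based encoder-decoder, which is not reproduced:

```python
import numpy as np

def progressive_dehaze(hazy: np.ndarray, step, n_iters: int = 3) -> np.ndarray:
    """Progressive residual refinement: the same `step` network is called
    recursively, each call seeing both the current estimate and the
    original hazy input, and its output is added as a residual."""
    est = hazy.copy()
    for _ in range(n_iters):
        est = est + step(np.stack([est, hazy]))  # residual update per iteration
    return est
```

Sharing one step network across iterations keeps the parameter count of a multi-stage pipeline at that of a single stage.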
Synthetic image generation and the use of virtual environments for image enhancement tasks
Deep learning networks are often difficult to train if there are insufficient image samples, and gathering real-world images tailored to a specific task takes considerable effort. This dissertation explores techniques for synthetic image generation and virtual environments for various image enhancement/correction/restoration tasks, specifically distortion correction, dehazing, shadow removal, and intrinsic image decomposition. First, given various image formation equations, such as those used in distortion correction and dehazing, synthetic image samples can be produced, provided that the equation is well-posed. Second, using virtual environments to train various image models is applicable for simulating real-world effects that are otherwise difficult to gather or replicate, such as dehazing and shadow removal. Given synthetic images, one cannot train a network directly on them, as there is a possible gap between the synthetic and real domains. We have devised several techniques for generating synthetic images and formulated domain adaptation methods whereby our trained deep-learning networks perform competitively in distortion correction, dehazing, and shadow removal. Additional studies and directions are provided for the intrinsic image decomposition problem and the exploration of procedural content generation, where a virtual Philippine city was created as an initial prototype.
Keywords: image generation, image correction, image dehazing, shadow removal, intrinsic image decomposition, computer graphics, rendering, machine learning, neural networks, domain adaptation, procedural content generation
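For dehazing, the well-posed image formation equation the dissertation refers to is typically the standard atmospheric scattering model, I = J·t + A·(1 − t) with transmission t = exp(−β·d); a minimal synthesis sketch follows (the default airlight and β values are illustrative):

```python
import numpy as np

def synthesize_haze(clean: np.ndarray, depth: np.ndarray,
                    airlight: float = 0.9, beta: float = 1.0) -> np.ndarray:
    """Render a synthetic hazy image with the atmospheric scattering model
    I = J*t + A*(1 - t), where t = exp(-beta * depth).
    clean: (H, W, 3) radiance J in [0, 1]; depth: (H, W) scene depth."""
    t = np.exp(-beta * depth)[..., None]   # per-pixel transmission
    return clean * t + airlight * (1.0 - t)
```

Pixels at zero depth pass through unchanged, while distant pixels converge to the airlight color, which is what makes paired hazy/clean training data cheap to produce from any image with a depth map.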