Model Adaptation with Synthetic and Real Data for Semantic Dense Foggy Scene Understanding
This work addresses the problem of semantic scene understanding under dense
fog. Although considerable progress has been made in semantic scene
understanding, it is mainly related to clear-weather scenes. Extending
recognition methods to adverse weather conditions such as fog is crucial for
outdoor applications. In this paper, we propose a novel method, named
Curriculum Model Adaptation (CMAda), which gradually adapts a semantic
segmentation model from light synthetic fog to dense real fog in multiple
steps, using both synthetic and real foggy data. In addition, we present three
other main stand-alone contributions: 1) a novel method to add synthetic fog to
real, clear-weather scenes using semantic input; 2) a new fog density
estimator; 3) the Foggy Zurich dataset comprising real foggy images,
with pixel-level semantic annotations for images with dense fog. Our
experiments show that 1) our fog simulation slightly outperforms a
state-of-the-art competing simulation with respect to the task of semantic
foggy scene understanding (SFSU); 2) CMAda improves the performance of
state-of-the-art models for SFSU significantly by leveraging unlabeled real
foggy data. The datasets and code are publicly available.
Comment: final version, ECCV 201
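The idea of adding synthetic fog to a clear-weather image can be illustrated with the standard optical (Koschmieder) fog model, in which each pixel is blended with a constant atmospheric light according to a transmittance that decays exponentially with scene distance. The following is a minimal sketch under that assumption only; it is not the semantic-aware simulation proposed in the paper, and the function name `add_synthetic_fog` and the parameters `beta` and `airlight` are hypothetical choices made for illustration.

```python
import numpy as np

def add_synthetic_fog(clear, depth_m, beta=0.01, airlight=0.9):
    """Blend a clear image with atmospheric light (standard optical fog model).

    clear    : H x W x 3 float array in [0, 1], the clear-weather image
    depth_m  : H x W float array, distance of each pixel from the camera (meters)
    beta     : attenuation coefficient (1/m); larger values mean denser fog
    airlight : scalar atmospheric light, assumed constant over the image
    """
    clear = np.asarray(clear, dtype=np.float64)
    depth_m = np.asarray(depth_m, dtype=np.float64)
    # Transmittance: fraction of scene radiance that reaches the camera.
    t = np.exp(-beta * depth_m)[..., None]
    # Foggy image = attenuated scene radiance + scattered atmospheric light.
    return clear * t + airlight * (1.0 - t)
```

In this model a pixel at zero distance is unchanged, while a very distant pixel converges to the atmospheric light. Increasing `beta` step by step gives progressively denser synthetic fog, which is the kind of light-to-dense progression a curriculum could iterate over.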
Uni-Removal: A Semi-Supervised Framework for Simultaneously Addressing Multiple Degradations in Real-World Images
Removing multiple degradations, such as haze, rain, and blur, from real-world
images poses a challenging and ill-posed problem. Recently, unified models that
can handle different degradations have been proposed and yield promising
results. However, these approaches focus on synthetic images and experience a
significant performance drop when applied to real-world images. In this paper,
we introduce Uni-Removal, a two-stage semi-supervised framework for addressing
the removal of multiple degradations in real-world images using a unified model
and parameters. In the knowledge transfer stage, Uni-Removal leverages a
supervised multi-teacher and student architecture to facilitate learning from
pretrained teacher networks specialized in
different degradation types. A multi-grained contrastive loss is introduced to
enhance learning from feature and image spaces. In the domain adaptation stage,
unsupervised fine-tuning is performed by incorporating an adversarial
discriminator on real-world images. The integration of an extended
multi-grained contrastive loss and generative adversarial loss enables the
adaptation of the student network from synthetic to real-world domains.
Extensive experiments on real-world degraded datasets demonstrate the
effectiveness of our proposed method. We compare our Uni-Removal framework with
state-of-the-art supervised and unsupervised methods, showcasing its promising
results in real-world image dehazing, deraining, and deblurring simultaneously.
Streamlined Global and Local Features Combinator (SGLC) for High Resolution Image Dehazing
Image Dehazing aims to remove atmospheric fog or haze from an image. Although
the Dehazing models have evolved a lot in recent years, few have precisely
tackled the problem of High-Resolution hazy images. For this kind of image, the
model needs to work on a downscaled version of the image or on cropped patches
from it. In both cases, the accuracy will drop. This is primarily due to the
inherent failure to combine global and local features when the image size
increases. The Dehazing model requires global features to understand the
general scene peculiarities and the local features to work better with fine and
pixel details. In this study, we propose the Streamlined Global and Local
Features Combinator (SGLC) to solve these issues and to optimize the
application of any Dehazing model to High-Resolution images. The SGLC contains
two successive blocks. The first is the Global Features Generator (GFG) which
generates the first version of the Dehazed image containing strong global
features. The second block is the Local Features Enhancer (LFE) which improves
the local feature details inside the previously generated image. When tested on
the Uformer architecture for Dehazing, SGLC increased the PSNR metric by a
significant margin. Any other model can be incorporated inside the SGLC process
to improve its efficiency on High-Resolution input data.
Comment: Accepted in CVPR 2023 Workshop
Source-Free Domain Adaptation for Real-world Image Dehazing
Deep learning-based source dehazing methods trained on synthetic datasets
have achieved remarkable performance but suffer from dramatic performance
degradation on real hazy images due to domain shift. Although certain Domain
Adaptation (DA) dehazing methods have been presented, they inevitably require
access to the source dataset to reduce the gap between the source synthetic and
target real domains. To address these issues, we present a novel Source-Free
Unsupervised Domain Adaptation (SFUDA) image dehazing paradigm, in which only a
well-trained source model and an unlabeled target real hazy dataset are
available. Specifically, we devise the Domain Representation Normalization
(DRN) module to make the representation of real hazy domain features match that
of the synthetic domain to bridge the gaps. With our plug-and-play DRN module,
existing well-trained source networks can be adapted to unlabeled real hazy
images. In addition, unsupervised losses, consisting of frequency losses and
physical prior losses, are applied to guide the learning of the DRN module.
Frequency losses provide structure and style constraints, while the prior loss
explores the inherent statistical properties of haze-free images. Equipped with our DRN
module and unsupervised loss, existing source dehazing models are able to
dehaze unlabeled real hazy images. Extensive experiments on multiple baselines
demonstrate the validity and superiority of our method visually and
quantitatively.
Comment: Accepted to ACM MM 202
LW-ISP: A Lightweight Model with ISP and Deep Learning
Deep learning (DL)-based methods for low-level tasks have many advantages
over the traditional camera in terms of hardware prospects, error accumulation,
and imaging effects. Recently, a series of works has applied deep learning to
replace the image signal processing (ISP) pipeline; however, there is still a
long way to go before practical deployment. In this paper, we show that
learning-based methods can achieve real-time, high-performance
processing in the ISP pipeline. We propose LW-ISP, a novel architecture
designed to implicitly learn the image mapping from RAW data to RGB image.
Based on U-Net architecture, we propose the fine-grained attention module and a
plug-and-play upsampling block suitable for low-level tasks. In particular, we
design a heterogeneous distillation algorithm to distill the implicit features
and reconstruction information of the clean image, so as to guide the learning
of the student model. Our experiments demonstrate that LW-ISP achieves a
0.38 dB improvement in PSNR compared to the previous best method, while the
model parameters and computation are reduced by factors of 23 and 81,
respectively. Inference is accelerated by at least 15 times. Without
bells and whistles, LW-ISP has achieved quite competitive results in ISP
subtasks including image denoising and enhancement.
Comment: 16 PAGES, ACCEPTED AS A CONFERENCE PAPER AT: BMVC 202
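To put the reported 0.38 dB PSNR improvement in context, the sketch below computes PSNR from its usual definition, 10 * log10(MAX^2 / MSE). It is a generic illustration and not the evaluation code of any listed paper; the function name `psnr` and the default peak value `max_value=1.0` (images scaled to [0, 1]) are assumptions.

```python
import numpy as np

def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    # Mean squared error over all pixels and channels.
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)
```

Because PSNR is logarithmic in the MSE, a gain of 0.38 dB corresponds to the MSE falling to about 10^(-0.038), roughly 0.92 of its previous value.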
Learning to Distill Global Representation for Sparse-View CT
Sparse-view computed tomography (CT) -- using a small number of projections
for tomographic reconstruction -- enables much lower radiation dose to patients
and accelerated data acquisition. The reconstructed images, however, suffer
from strong artifacts, greatly limiting their diagnostic value. Current trends
for sparse-view CT turn to the raw data for better information recovery. The
resultant dual-domain methods, nonetheless, suffer from secondary artifacts,
especially in ultra-sparse view scenarios, and their generalization to other
scanners/protocols is greatly limited. A crucial question arises: have the
image post-processing methods reached the limit? Our answer is not yet. In this
paper, we stick to image post-processing methods due to their great flexibility
and propose a global representation (GloRe) distillation framework for sparse-view
CT, termed GloReDi. First, we propose to learn GloRe with Fourier convolution,
so each element in GloRe has an image-wide receptive field. Second, unlike
methods that only use the full-view images for supervision, we propose to
distill GloRe from intermediate-view reconstructed images that are readily
available but not explored in previous literature. The success of GloRe
distillation is attributed to two key components: representation directional
distillation to align the GloRe directions, and band-pass-specific contrastive
distillation to gain clinically important details. Extensive experiments
demonstrate the superiority of the proposed GloReDi over the state-of-the-art
methods, including dual-domain ones. The source code is available at
https://github.com/longzilicart/GloReDi.
Comment: ICCV 202
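The claim that a Fourier-domain operation gives each output element an image-wide receptive field can be checked numerically. The sketch below is not GloReDi's layer; it is a single-channel toy, assuming only that the operation multiplies the image spectrum pointwise by a set of weights (random here, learned in a real network). The name `spectral_mix` is hypothetical.

```python
import numpy as np

def spectral_mix(x, weights):
    """Pointwise-multiply the 2-D spectrum of a real image by complex weights.

    x       : H x W real array
    weights : complex array of shape (H, W // 2 + 1), matching np.fft.rfft2
    """
    spectrum = np.fft.rfft2(x)
    # Each frequency coefficient depends on every pixel, so scaling the
    # coefficients mixes information across the whole image.
    return np.fft.irfft2(spectrum * weights, s=x.shape)

rng = np.random.default_rng(0)
h, w = 8, 8
weights = rng.normal(size=(h, w // 2 + 1)) + 1j * rng.normal(size=(h, w // 2 + 1))

x = rng.normal(size=(h, w))
x_perturbed = x.copy()
x_perturbed[0, 0] += 1.0  # change a single input pixel

# Difference in the output caused by that single-pixel change.
delta = spectral_mix(x_perturbed, weights) - spectral_mix(x, weights)
changed_fraction = np.mean(np.abs(delta) > 1e-9)
print(f"fraction of output pixels affected: {changed_fraction:.2f}")
```

A spatial 3x3 convolution applied to the same perturbation would change at most nine output pixels, which is the contrast the abstract draws on.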