ASF-Net: Robust Video Deraining via Temporal Alignment and Online Adaptive Learning
In recent years, learning-based methods for video deraining have demonstrated
commendable results. However, two critical challenges remain unaddressed:
exploiting temporal correlations among adjacent frames and ensuring
adaptability to unknown real-world scenarios. To overcome these challenges, we
study video deraining from two perspectives, paradigm design and learning
strategy construction. Specifically, we propose a new computational paradigm,
the Alignment-Shift-Fusion Network (ASF-Net), which incorporates a temporal
shift module. This module, new to this field, enables deeper exploitation of
temporal information by exchanging channel-level information within the
feature space. To fully unleash the model's representational capacity, we
further construct a LArge-scale RAiny video dataset (LARA), which also
supports the broader development of this research community. On the basis of
this newly constructed dataset, we refine the parameter learning process with
an innovative re-degraded learning strategy that bridges the gap between
synthetic and real-world scenes, yielding stronger scene adaptability. Our
approach achieves superior performance on three benchmarks and compelling
visual quality in real-world scenarios, underscoring its efficacy. The code is
available at
https://github.com/vis-opt-group/ASF-Net
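To make the channel-level temporal exchange concrete, the following is a
minimal PyTorch sketch of a temporal shift operation in the spirit of the
module the abstract describes; the shift fraction, the forward/backward split,
and all names are illustrative assumptions rather than ASF-Net's actual
implementation:

```python
import torch

def temporal_shift(x: torch.Tensor, shift_frac: float = 0.125) -> torch.Tensor:
    """Shift a fraction of channels along the temporal axis.

    x: features of shape (N, T, C, H, W). The shift fraction and the
    forward/backward split are illustrative assumptions, not the exact
    ASF-Net configuration.
    """
    n, t, c, h, w = x.shape
    fold = int(c * shift_frac)
    out = torch.zeros_like(x)
    # First `fold` channels: frame t receives features from frame t+1.
    out[:, :-1, :fold] = x[:, 1:, :fold]
    # Next `fold` channels: frame t receives features from frame t-1.
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]
    # Remaining channels are left untouched.
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]
    return out

# Example: a 5-frame clip with 64-channel features.
feats = torch.randn(2, 5, 64, 32, 32)
shifted = temporal_shift(feats)
print(shifted.shape)  # torch.Size([2, 5, 64, 32, 32])
```

Because the shift is a pure re-indexing, it adds temporal mixing at
essentially zero parameter and computation cost.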
RCDNet: An Interpretable Rain Convolutional Dictionary Network for Single Image Deraining
Rain streaks, a common weather phenomenon, adversely degrade image quality, so
removing rain from an image has become an important problem in the field. To
handle this ill-posed single-image deraining task, in this paper we build a
novel deep architecture, the rain convolutional dictionary network (RCDNet),
which embeds the intrinsic priors of rain streaks and has clear
interpretability. Specifically, we first establish an RCD model for
representing rain streaks and use the proximal gradient descent technique to
design an iterative algorithm, containing only simple operators, for solving
the model. By unfolding this algorithm, we then build the RCDNet, in which
every network module has a clear physical meaning and corresponds to an
operation of the algorithm. This interpretability makes it easy to visualize
and analyze what happens inside the network and why it works well at inference
time. Moreover, to account for the domain gap in real scenarios, we further
design a novel dynamic RCDNet, in which the rain kernels are dynamically
inferred from the input rainy image and then shrink the search space for
rain-layer estimation to a few rain maps, ensuring good generalization when
the rain types of the training and testing data are inconsistent. By training
such an interpretable network end to end, all involved rain kernels and
proximal operators are extracted automatically, faithfully characterizing the
features of both the rain and clean background layers, and thus naturally
leading to better deraining performance. Comprehensive experiments
substantiate the superiority of our method, especially its strong generality
across diverse testing scenarios and the good interpretability of all its
modules. Code is available at https://github.com/hongwang01/DRCDNet
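The unfolding idea admits a compact illustration. Below is a minimal PyTorch
sketch of one proximal-gradient step on the rain maps, under a simplified
reading of the RCD model in which the rain layer is the sum of rain kernels
convolved with rain maps; the soft-thresholding proximal operator, step sizes,
and all names are assumptions for illustration, whereas in the actual network
the proximal operators are learned end to end:

```python
import torch
import torch.nn.functional as F

def rain_map_step(O, B, M, kernels, eta=0.1, lam=0.01):
    """One proximal-gradient step on rain maps M.

    Simplified model assumption: O ~= B + conv(M, kernels), i.e. the rain
    layer is synthesized from K rain maps and K shared rain kernels.
    O, B: (N, 1, H, W); M: (N, K, H, W); kernels: (1, K, s, s), s odd.
    """
    pad = kernels.shape[-1] // 2
    rain = F.conv2d(M, kernels, padding=pad)      # synthesize rain layer
    resid = B + rain - O                          # data-fit residual
    # Gradient of 0.5 * ||O - B - conv(M, kernels)||^2 w.r.t. M is the
    # residual correlated with the kernels (a transposed convolution).
    grad = F.conv_transpose2d(resid, kernels, padding=pad)
    M = M - eta * grad                            # gradient step
    # Proximal step: soft-thresholding of the positive part encourages
    # sparse, non-negative rain maps.
    return torch.relu(M - lam)

# Toy usage: 3 hypothetical rain kernels of size 9x9 on a 64x64 image.
O = torch.rand(1, 1, 64, 64); B = torch.rand(1, 1, 64, 64)
M = torch.zeros(1, 3, 64, 64); K = torch.randn(1, 3, 9, 9) * 0.1
M = rain_map_step(O, B, M, K)
print(M.shape)  # torch.Size([1, 3, 64, 64])
```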
Video Adverse-Weather-Component Suppression Network via Weather Messenger and Adversarial Backpropagation
Although convolutional neural networks (CNNs) have been proposed to remove
adverse weather conditions from single images using a single set of
pre-trained weights, they fail to restore weather-degraded videos because they
ignore temporal information. Furthermore, existing methods for removing
adverse weather conditions (e.g., rain, fog, and snow) from videos can handle
only one type of adverse weather. In this work, we propose the first framework
for restoring videos under all adverse weather conditions by developing a
video adverse-weather-component suppression network (ViWS-Net). To achieve
this, we first devise a weather-agnostic video transformer encoder with
multiple transformer stages. Moreover, we design a long short-term temporal
modeling mechanism for weather messengers that fuses adjacent input frames
early and learns weather-specific information. We further introduce a weather
discriminator with gradient reversal that, by adversarially predicting weather
types, preserves the weather-invariant common information and suppresses the
weather-specific information in pixel features. Finally, we develop a
messenger-driven video transformer decoder that retrieves the residual
weather-specific features, aggregates them spatiotemporally with hierarchical
pixel features, and refines them to predict the clean target frame of the
input video. Experimental results on benchmark datasets and real-world weather
videos demonstrate that our ViWS-Net outperforms current state-of-the-art
methods in restoring videos degraded by any weather condition.
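The gradient-reversal mechanism behind this adversarial backpropagation is a
standard construction and can be sketched in a few lines of PyTorch; the
classifier head, the three weather classes, and the scaling factor lam below
are illustrative assumptions, not ViWS-Net's exact components:

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates (and scales) gradients in the
    backward pass, so minimizing the discriminator's loss pushes the
    feature extractor toward weather-invariant features."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Usage sketch: pixel features -> gradient reversal -> weather classifier.
feats = torch.randn(4, 256, requires_grad=True)
classifier = torch.nn.Linear(256, 3)        # e.g. rain / fog / snow
logits = classifier(grad_reverse(feats))
loss = torch.nn.functional.cross_entropy(logits, torch.tensor([0, 1, 2, 0]))
loss.backward()  # feats.grad now carries the reversed discriminator gradient
```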
Interpretability and Generalization of Deep Low-Level Vision Models
Low-level vision is an important class of tasks in computer vision,
encompassing various image restoration problems such as image
super-resolution, image denoising, and image deraining. In recent years, deep
learning has become the de facto method for solving low-level vision problems,
owing to its excellent performance and ease of use. By training on large
amounts of paired data, deep low-level vision models are expected to learn
rich semantic knowledge and process images intelligently for real-world
applications. However, because our understanding of deep learning models and
low-level vision tasks remains limited, we cannot explain the successes and
failures of these models. Deep learning models are widely regarded as "black
boxes" due to their complexity and non-linearity: we cannot know what
information a model used when processing an input, or whether it learned what
we intended. When the model misbehaves, we cannot identify the underlying
source of the problem, such as the generalization problem of low-level vision
models. This research proposes interpretability analysis of deep low-level
vision models to gain deeper insight into deep learning models for low-level
vision tasks. I aim to elucidate the mechanisms of the deep learning approach
and to discern why these methods succeed or fall short. This is the first
study to perform interpretability analysis on deep low-level vision models.
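As one hypothetical example of what such an interpretability analysis could
look like, the sketch below computes a simple input-gradient saliency map that
asks which input pixels influence a chosen output region of a restoration
model; the thesis does not commit to this particular method, and every name
here is a placeholder:

```python
import torch

def input_saliency(model, img, region):
    """Gradient-based attribution: which input pixels influence the chosen
    output region? `model` and `region` are placeholders, not methods or
    components from the thesis.

    img: (1, C, H, W); region: index tuple selecting the output patch.
    """
    img = img.clone().requires_grad_(True)
    out = model(img)
    # Score the output region of interest and backpropagate to the input.
    out[region].sum().backward()
    # Per-pixel saliency: max absolute gradient across channels.
    return img.grad.abs().amax(dim=1)

# Example with a toy stand-in "model" and a 20x20 output patch.
model = torch.nn.Conv2d(3, 3, 3, padding=1)
img = torch.randn(1, 3, 64, 64)
sal = input_saliency(model, img,
                     (slice(None), slice(None), slice(20, 40), slice(20, 40)))
print(sal.shape)  # torch.Size([1, 64, 64])
```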