Mixed Hierarchy Network for Image Restoration
Image restoration is a long-standing low-level vision problem, covering tasks such as
deblurring and deraining. During restoration, one must account not only for the
spatial details and contextual information required to ensure quality, but also
for the complexity of the system. Although many methods can guarantee the
quality of image restoration, the system complexity of state-of-the-art (SOTA)
methods keeps increasing as well. Motivated by this, we present a mixed
hierarchy network that balances these competing goals. Our main proposal is a
mixed hierarchy architecture that progressively recovers contextual information
and spatial details from degraded images, while the intra-blocks are designed
to reduce system complexity. Specifically,
our model first learns contextual information using encoder-decoder
architectures, and then combines it with high-resolution branches that
preserve spatial detail. To reduce the system complexity of this architecture
for convenient analysis and comparison, we replace the nonlinear activation
functions with simple multiplication, or remove them entirely, and adopt a
simple network structure. In addition, we replace spatial convolution with
global self-attention in the middle block of the encoder-decoder. The resulting
tightly interlinked hierarchy architecture, named MHNet, delivers strong
performance gains on several image restoration tasks, including image deraining
and deblurring.
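As a rough sketch of the "replace the activation with multiplication" idea described above, the following PyTorch snippet shows one plausible form of such a multiplicative gate inside a simple intra-block; the class names and block layout are illustrative assumptions, not the authors' actual MHNet implementation.

```python
import torch
import torch.nn as nn

class MultiplicativeGate(nn.Module):
    """Illustrative stand-in for a nonlinear activation: split the feature
    channels in half and multiply the halves elementwise (no ReLU/GELU)."""
    def forward(self, x):
        a, b = x.chunk(2, dim=1)   # split along the channel dimension
        return a * b               # elementwise product halves the channels

class SimpleIntraBlock(nn.Module):
    """Toy intra-block (assumed layout): a 1x1 conv expands the channels,
    the multiplicative gate replaces the activation, and a final 1x1 conv
    projects back to the input width."""
    def __init__(self, channels):
        super().__init__()
        self.expand = nn.Conv2d(channels, channels * 2, kernel_size=1)
        self.gate = MultiplicativeGate()
        self.project = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        return x + self.project(self.gate(self.expand(x)))  # residual connection
```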
Learning Enriched Features for Real Image Restoration and Enhancement
With the goal of recovering high-quality image content from its degraded
version, image restoration enjoys numerous applications, such as in
surveillance, computational photography, medical imaging, and remote sensing.
Recently, convolutional neural networks (CNNs) have achieved dramatic
improvements over conventional approaches for the image restoration task. Existing
CNN-based methods typically operate either on full-resolution or on
progressively low-resolution representations. In the former case, spatially
precise but contextually less robust results are achieved, while in the latter
case, semantically reliable but spatially less accurate outputs are generated.
In this paper, we present a novel architecture with the collective goals of
maintaining spatially-precise high-resolution representations through the
entire network and receiving strong contextual information from the
low-resolution representations. The core of our approach is a multi-scale
residual block containing several key elements: (a) parallel multi-resolution
convolution streams for extracting multi-scale features, (b) information
exchange across the multi-resolution streams, (c) spatial and channel attention
mechanisms for capturing contextual information, and (d) attention based
multi-scale feature aggregation. In a nutshell, our approach learns an enriched
set of features that combines contextual information from multiple scales,
while simultaneously preserving the high-resolution spatial details. Extensive
experiments on five real image benchmark datasets demonstrate that our method,
named MIRNet, achieves state-of-the-art results for a variety of image
processing tasks, including image denoising, super-resolution, and image
enhancement. The source code and pre-trained models are available at
https://github.com/swz30/MIRNet. Comment: Accepted for publication at ECCV 2020.
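As a hedged illustration of the spatial and channel attention mechanisms mentioned in point (c), the PyTorch sketch below shows common forms of the two gates; the module names, reduction ratio, and kernel size are assumptions for illustration, not the exact MIRNet design.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention: global average pooling squeezes each channel to a
    scalar; two 1x1 convs and a sigmoid produce per-channel gates."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.gate(x)  # reweight channels by their global context

class SpatialAttention(nn.Module):
    """Spatial attention: pool over channels (mean and max), fuse with a
    convolution, and gate every spatial position with a sigmoid."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        attn = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * attn  # reweight positions by their spatial saliency
```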
Two Birds, One Stone: A Unified Framework for Joint Learning of Image and Video Style Transfers
Current arbitrary style transfer models are limited to either image or video
domains. In order to achieve satisfying image and video style transfers, two
different models are inevitably required with separate training processes on
image and video domains, respectively. In this paper, we show that this can be
precluded by introducing UniST, a Unified Style Transfer framework for both
images and videos. At the core of UniST is a domain interaction transformer
(DIT), which first explores contextual information within each domain and then
lets the contextualized domain information interact for joint learning. In
particular, DIT allows the image style transfer task to exploit temporal
information from videos, and in turn allows video style transfer to draw on the
rich appearance texture of images, leading to mutual benefits. Considering the
heavy computation of traditional multi-head self-attention, we present a simple
yet effective axial multi-head self-attention (AMSA) for DIT, which improves
computational efficiency while maintaining style transfer performance. To verify
the effectiveness of UniST, we conduct extensive experiments on both image and
video style transfer tasks and show that UniST performs favorably against
state-of-the-art approaches on both tasks. Code is available at
https://github.com/NevSNev/UniST. Comment: International Conference on Computer Vision (ICCV 2023).
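To make the efficiency argument behind AMSA concrete, the hedged sketch below factorizes full self-attention into two passes, one along the width axis and one along the height axis, which lowers the attention cost from O((HW)^2) to roughly O(HW·(H+W)). The module structure is an illustrative assumption rather than the exact AMSA design, and the channel dimension is assumed divisible by the number of heads.

```python
import torch
import torch.nn as nn

class AxialAttention(nn.Module):
    """Illustrative axial self-attention: attend along rows, then along
    columns, instead of over all H*W positions at once."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.row_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.col_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                               # x: (B, C, H, W)
        b, c, h, w = x.shape
        # attend along the width axis within each row
        rows = x.permute(0, 2, 3, 1).reshape(b * h, w, c)
        rows, _ = self.row_attn(rows, rows, rows)
        x = rows.reshape(b, h, w, c).permute(0, 3, 1, 2)
        # attend along the height axis within each column
        cols = x.permute(0, 3, 2, 1).reshape(b * w, h, c)
        cols, _ = self.col_attn(cols, cols, cols)
        return cols.reshape(b, w, h, c).permute(0, 3, 2, 1)
```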
Non-locally Enhanced Encoder-Decoder Network for Single Image De-raining
Single image rain streaks removal has recently witnessed substantial progress
due to the development of deep convolutional neural networks. However, existing
deep learning based methods either focus on the entrance and exit of the
network, decomposing the input image into high- and low-frequency components
and employing residual learning to reduce the mapping range, or focus on
introducing a cascaded learning scheme that decomposes rain streak removal into
multiple stages. These methods treat the convolutional neural network as an
encapsulated end-to-end mapping module without examining the rationale and
merits of the network design itself. In this paper, we delve
into an effective end-to-end neural network structure for stronger feature
expression and spatial correlation learning. Specifically, we propose a
non-locally enhanced encoder-decoder network framework, which consists of a
pooling-indices-embedded encoder-decoder network that efficiently learns
increasingly abstract feature representations for more accurate rain streak
modeling while preserving the image detail. The proposed
encoder-decoder framework is composed of a series of non-locally enhanced dense
blocks that are designed to not only fully exploit hierarchical features from
all the convolutional layers but also well capture the long-distance
dependencies and structural information. Extensive experiments on synthetic and
real datasets demonstrate that the proposed method can effectively remove rain
streaks from rainy images of various rain densities while preserving the image
details, achieving significant improvements over recent state-of-the-art
methods. Comment: Accepted to ACM Multimedia 2018.
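For readers unfamiliar with non-local operations, the hedged PyTorch sketch below shows a generic non-local block of the kind the dense blocks are enhanced with: every spatial position attends to all other positions, capturing the long-distance dependencies that stacked local convolutions miss. The layer names and channel reduction are illustrative assumptions, not the paper's exact block, and the channel count is assumed to be even.

```python
import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    """Generic non-local (self-attention) block over spatial positions."""
    def __init__(self, channels):
        super().__init__()
        self.theta = nn.Conv2d(channels, channels // 2, kernel_size=1)  # queries
        self.phi = nn.Conv2d(channels, channels // 2, kernel_size=1)    # keys
        self.g = nn.Conv2d(channels, channels // 2, kernel_size=1)      # values
        self.out = nn.Conv2d(channels // 2, channels, kernel_size=1)

    def forward(self, x):                                   # x: (B, C, H, W)
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)        # (B, HW, C/2)
        k = self.phi(x).flatten(2)                          # (B, C/2, HW)
        v = self.g(x).flatten(2).transpose(1, 2)            # (B, HW, C/2)
        attn = torch.softmax(q @ k, dim=-1)                 # (B, HW, HW) pairwise affinities
        y = (attn @ v).transpose(1, 2).reshape(b, c // 2, h, w)
        return x + self.out(y)                              # residual connection
```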