1,423 research outputs found
Convolutional neural network based on sparse graph attention mechanism for MRI super-resolution
Magnetic resonance imaging (MRI) is a valuable clinical tool for displaying
anatomical structures and aiding in accurate diagnosis. Medical image
super-resolution (SR) reconstruction using deep learning techniques can enhance
lesion analysis and assist doctors in improving diagnostic efficiency and
accuracy. However, existing deep learning-based SR methods predominantly rely
on convolutional neural networks (CNNs), which inherently limit the expressive
capabilities of these models and therefore make it challenging to discover
potential relationships between different image features. To overcome this
limitation, we propose an A-network that extracts image features with
multiple convolution operator feature extraction (MCO) modules. These
extracted features are passed through multiple
sets of cross-feature extraction modules (MSC) to highlight key features
through inter-channel feature interactions, enabling subsequent feature
learning. An attention-based sparse graph neural network module is incorporated
to establish relationships between pixel features, learning which adjacent
pixels have the greatest impact on determining the features to be filled. To
evaluate our model's effectiveness, we conducted experiments using different
models on data generated from multiple datasets at different degradation
factors, and the experimental results show that our method achieves a significant
improvement over the current state-of-the-art methods.
Comment: 12 pages, 6 figures
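The sparse graph attention step can be pictured with a short sketch: each pixel attends only to its top-k most similar neighbours inside a local window, mirroring the idea of learning which adjacent pixels matter most. The module and parameter names below (SparsePixelGraphAttention, k, window) are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparsePixelGraphAttention(nn.Module):
    def __init__(self, dim, k=8, window=5):
        super().__init__()
        self.k, self.window = k, window
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)

    def forward(self, x):                                   # x: (B, C, H, W)
        B, C, H, W = x.shape
        # Gather each pixel's local window of candidate neighbours.
        pad = self.window // 2
        neigh = F.unfold(x, self.window, padding=pad)       # (B, C*w*w, H*W)
        neigh = neigh.view(B, C, self.window**2, H * W).permute(0, 3, 2, 1)
        feat = x.flatten(2).transpose(1, 2)                 # (B, H*W, C)
        q = self.q(feat).unsqueeze(2)                       # (B, H*W, 1, C)
        k_, v = self.kv(neigh).chunk(2, dim=-1)             # (B, H*W, w*w, C)
        attn = (q * k_).sum(-1) / C**0.5                    # (B, H*W, w*w)
        # Sparsify the graph: keep only the top-k strongest edges per pixel.
        topv, topi = attn.topk(self.k, dim=-1)
        weight = topv.softmax(dim=-1).unsqueeze(-1)         # (B, H*W, k, 1)
        v_top = torch.gather(v, 2, topi.unsqueeze(-1).expand(-1, -1, -1, C))
        out = (weight * v_top).sum(2)                       # (B, H*W, C)
        return out.transpose(1, 2).view(B, C, H, W) + x     # residual

y = SparsePixelGraphAttention(32)(torch.randn(1, 32, 24, 24))
```

Restricting each pixel to its k strongest edges keeps the graph sparse, so the attention cost stays linear in the number of pixels rather than quadratic.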
DARTS: Double Attention Reference-based Transformer for Super-resolution
We present DARTS, a transformer model for reference-based image
super-resolution. DARTS learns joint representations of two image distributions
to enhance the content of low-resolution input images through matching
correspondences learned from high-resolution reference images. Current
state-of-the-art techniques in reference-based image super-resolution are based
on a multi-network, multi-stage architecture. In this work, we adapt the double
attention block from the GAN literature, processing the two visual streams
separately and combining self-attention and cross-attention blocks through a
gating attention strategy. Our work demonstrates how the attention mechanism
can be adapted for the particular requirements of reference-based image
super-resolution, significantly simplifying the architecture and training
pipeline. We show that our transformer-based model performs competitively with
state-of-the-art models, while maintaining a simpler overall architecture and
training process. In particular, we obtain state-of-the-art results on the SUN80
dataset, with a PSNR/SSIM of 29.83 / 0.809. These results show that attention
alone is sufficient for the reference-based SR (RSR) task, without multiple
purpose-built subnetworks, knowledge distillation, or multi-stage training.
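The gating of the two attention streams can be made concrete with a minimal sketch, assuming both streams are token sequences of equal embedding width; the module name, head count, and sigmoid gate are illustrative, not the authors' exact block.

```python
import torch
import torch.nn as nn

class GatedDoubleAttention(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, lr_tokens, ref_tokens):
        s, _ = self.self_attn(lr_tokens, lr_tokens, lr_tokens)    # within LR
        c, _ = self.cross_attn(lr_tokens, ref_tokens, ref_tokens) # LR -> ref
        g = self.gate(torch.cat([s, c], dim=-1))   # per-token mixing weight
        return lr_tokens + g * c + (1 - g) * s     # gated fusion + residual

lr = torch.randn(2, 256, 64)    # (batch, tokens, dim) low-res stream
ref = torch.randn(2, 1024, 64)  # high-res reference stream
out = GatedDoubleAttention(64)(lr, ref)
```

The learned gate lets the model lean on the reference stream where correspondences are reliable and fall back to self-attention elsewhere.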
Cross-View Hierarchy Network for Stereo Image Super-Resolution
Stereo image super-resolution aims to improve the quality of high-resolution
stereo image pairs by exploiting complementary information across views. To
attain superior performance, many methods have prioritized designing complex
modules to fuse similar information across views while overlooking the
importance of intra-view information for high-resolution reconstruction,
which leads to incorrect textures in the recovered images. To address this
issue, we explore the interdependencies between various hierarchies from
intra-view and propose a novel method, named Cross-View-Hierarchy Network for
Stereo Image Super-Resolution (CVHSSR). Specifically, we design a
cross-hierarchy information mining block (CHIMB) that leverages channel
attention and large kernel convolution attention to extract both global and
local features from the intra-view, enabling the efficient restoration of
accurate texture details. Additionally, a cross-view interaction module (CVIM)
is proposed to fuse similar features from different views by utilizing
cross-view attention mechanisms, effectively adapting to the binocular scene.
Extensive experiments demonstrate the effectiveness of our method. CVHSSR
achieves better stereo image super-resolution performance than other
state-of-the-art methods while using fewer parameters. The source code and
pre-trained models are available at https://github.com/AlexZou14/CVHSSR.
Comment: 10 pages, 7 figures, CVPRW, NTIRE2023
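The combination of channel attention with large-kernel convolutional attention, in the spirit of the CHIMB described above, can be sketched as follows. The kernel sizes and the depth-wise dilated decomposition follow the common large-kernel-attention recipe and are assumptions, not CVHSSR's exact configuration.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, dim, r=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),              # global context per channel
            nn.Conv2d(dim, dim // r, 1), nn.ReLU(),
            nn.Conv2d(dim // r, dim, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.net(x)

class LargeKernelAttention(nn.Module):
    """Approximate a large receptive field with stacked depth-wise convs."""
    def __init__(self, dim):
        super().__init__()
        self.dw = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)
        self.dw_dilated = nn.Conv2d(dim, dim, 7, padding=9, dilation=3,
                                    groups=dim)
        self.pw = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        return x * self.pw(self.dw_dilated(self.dw(x)))  # attention map

class CHIMBLikeBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.ca = ChannelAttention(dim)        # global channel statistics
        self.lka = LargeKernelAttention(dim)   # large spatial context
        self.fuse = nn.Conv2d(2 * dim, dim, 1)

    def forward(self, x):
        return x + self.fuse(torch.cat([self.ca(x), self.lka(x)], dim=1))
```

The two branches are cheap alternatives to full self-attention: channel attention captures global statistics while the dilated depth-wise stack covers a roughly 19x19 spatial neighbourhood.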
Recursive Generalization Transformer for Image Super-Resolution
Transformer architectures have exhibited remarkable performance in image
super-resolution (SR). Due to the quadratic computational complexity of
self-attention (SA) in Transformers, existing methods tend to adopt SA in a
local region to reduce overheads. However, the local design restricts the
global context exploitation, which is crucial for accurate image
reconstruction. In this work, we propose the Recursive Generalization
Transformer (RGT) for image SR, which can capture global spatial information
and is suitable for high-resolution images. Specifically, we propose the
recursive-generalization self-attention (RG-SA). It recursively aggregates
input features into representative feature maps, and then utilizes
cross-attention to extract global information. Meanwhile, the channel
dimensions of attention matrices (query, key, and value) are further scaled to
mitigate the redundancy in the channel domain. Furthermore, we combine the
RG-SA with local self-attention to enhance the exploitation of the global
context, and propose the hybrid adaptive integration (HAI) for module
integration. The HAI allows the direct and effective fusion between features at
different levels (local or global). Extensive experiments demonstrate that our
RGT outperforms recent state-of-the-art methods quantitatively and
qualitatively. Code is released at https://github.com/zhengchen1999/RGT.
Comment: Code is available at https://github.com/zhengchen1999/RGT
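The recursive-generalization idea can be illustrated with a short sketch: the feature map is recursively aggregated into a small representative map by a shared strided depth-wise convolution, and full-resolution queries cross-attend to it for global context. The recursion depth and the omission of the paper's channel rescaling are assumptions for brevity.

```python
import torch
import torch.nn as nn

class RecursiveGlobalAttention(nn.Module):
    def __init__(self, dim, heads=4, levels=2):
        super().__init__()
        self.levels = levels
        self.reduce = nn.Conv2d(dim, dim, 3, stride=2, padding=1, groups=dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                     # x: (B, C, H, W)
        B, C, H, W = x.shape
        rep = x
        for _ in range(self.levels):          # recursive aggregation
            rep = self.reduce(rep)            # halve spatial size each step
        q = x.flatten(2).transpose(1, 2)      # (B, H*W, C) full-res queries
        kv = rep.flatten(2).transpose(1, 2)   # compact representative tokens
        out, _ = self.attn(q, kv, kv)         # global cross-attention
        return x + out.transpose(1, 2).view(B, C, H, W)

y = RecursiveGlobalAttention(64)(torch.randn(1, 64, 32, 32))
```

Because the key/value set shrinks by a factor of four per level, the cross-attention cost grows only linearly with the number of query pixels, which is what makes the scheme practical at high resolutions.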
Cross Aggregation Transformer for Image Restoration
Recently, the Transformer architecture has been introduced into image restoration
to replace convolutional neural networks (CNNs), with surprising results.
Considering the high computational complexity of Transformer with global
attention, some methods use the local square window to limit the scope of
self-attention. However, these methods lack direct interaction among different
windows, which limits the establishment of long-range dependencies. To address
the above issue, we propose a new image restoration model, Cross Aggregation
Transformer (CAT). The core of our CAT is the Rectangle-Window Self-Attention
(Rwin-SA), which utilizes horizontal and vertical rectangle window attention in
different heads in parallel to expand the attention area and aggregate
features across different windows. We also introduce the Axial-Shift operation
for different window interactions. Furthermore, we propose the Locality
Complementary Module to complement the self-attention mechanism, which
incorporates the inductive bias of CNN (e.g., translation invariance and
locality) into Transformer, enabling global-local coupling. Extensive
experiments demonstrate that our CAT outperforms recent state-of-the-art
methods on several image restoration applications. The code and models are
available at https://github.com/zhengchen1999/CAT.
Comment: Accepted to NeurIPS 2022. Code is available at
https://github.com/zhengchen1999/CAT
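A minimal sketch of Rwin-SA-style rectangle-window attention: half of the channels attend inside horizontal strips, the other half inside vertical strips, so the two head groups cover complementary long-range context in parallel. The channel split and strip sizes are illustrative assumptions, and the Axial-Shift operation is omitted.

```python
import torch
import torch.nn as nn

def strip_attention(x, attn, strip_h, strip_w):
    # Partition (B, C, H, W) into strip_h x strip_w windows, attend
    # inside each window, then stitch the windows back together.
    B, C, H, W = x.shape
    x = x.view(B, C, H // strip_h, strip_h, W // strip_w, strip_w)
    x = x.permute(0, 2, 4, 3, 5, 1).reshape(-1, strip_h * strip_w, C)
    x, _ = attn(x, x, x)
    x = x.view(B, H // strip_h, W // strip_w, strip_h, strip_w, C)
    return x.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)

class RectangleWindowAttention(nn.Module):
    def __init__(self, dim, heads=2, strip=4, length=16):
        super().__init__()
        self.h_attn = nn.MultiheadAttention(dim // 2, heads, batch_first=True)
        self.v_attn = nn.MultiheadAttention(dim // 2, heads, batch_first=True)
        self.strip, self.length = strip, length

    def forward(self, x):                      # H, W divisible by strip sizes
        xh, xv = x.chunk(2, dim=1)             # split channels between groups
        xh = strip_attention(xh, self.h_attn, self.strip, self.length)
        xv = strip_attention(xv, self.v_attn, self.length, self.strip)
        return torch.cat([xh, xv], dim=1) + x  # residual

y = RectangleWindowAttention(64)(torch.randn(1, 64, 32, 32))
```

Elongating the windows in orthogonal directions is what lets features propagate across window boundaries without paying for full global attention.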
Cascaded Local Implicit Transformer for Arbitrary-Scale Super-Resolution
Implicit neural representation has recently shown a promising ability in
representing images with arbitrary resolutions. In this paper, we present a
Local Implicit Transformer (LIT), which integrates the attention mechanism and
frequency encoding technique into a local implicit image function. We design a
cross-scale local attention block to effectively aggregate local features. To
further improve representative power, we propose a Cascaded LIT (CLIT) that
exploits multi-scale features, along with a cumulative training strategy that
gradually increases the upsampling scales during training. We have conducted
extensive experiments to validate the effectiveness of these components and
analyze various training strategies. The qualitative and quantitative results
demonstrate that LIT and CLIT achieve favorable results and outperform the
prior works in arbitrary-scale super-resolution tasks.
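The core local implicit image function can be pictured with a minimal sketch: the RGB value at an arbitrary continuous coordinate is decoded from the nearest LR feature vector together with the relative offset and the query cell size. The plain MLP decoder below stands in for LIT's attention-based decoder and is an assumption for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalImplicitDecoder(nn.Module):
    def __init__(self, feat_dim, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 4, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3))                       # RGB output

    def forward(self, feat, coords, cell):
        # feat: (B, C, h, w) LR features; coords: (B, N, 2) in [-1, 1]
        # as (x, y); cell: (B, N, 2) query pixel size in the same frame.
        B, C, h, w = feat.shape
        sampled = F.grid_sample(feat, coords.unsqueeze(1), mode='nearest',
                                align_corners=False)    # (B, C, 1, N)
        sampled = sampled.squeeze(2).transpose(1, 2)    # (B, N, C)
        # Offset from the centre of the nearest LR cell to the query.
        ix = ((coords[..., 0] + 1) / 2 * w).floor().clamp(0, w - 1)
        iy = ((coords[..., 1] + 1) / 2 * h).floor().clamp(0, h - 1)
        centers = torch.stack([(ix + 0.5) / w * 2 - 1,
                               (iy + 0.5) / h * 2 - 1], dim=-1)
        rel = coords - centers
        return self.mlp(torch.cat([sampled, rel, cell], dim=-1))

feat = torch.randn(1, 64, 12, 12)                 # encoded LR image
coords = torch.rand(1, 100, 2) * 2 - 1            # arbitrary query points
cell = torch.full((1, 100, 2), 2 / 48)            # querying at 4x density
rgb = LocalImplicitDecoder(64)(feat, coords, cell)  # (1, 100, 3)
```

Because the decoder takes the cell size as input, the same network can be queried at any target resolution, which is what enables arbitrary-scale upsampling.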
SimpSON: Simplifying Photo Cleanup with Single-Click Distracting Object Segmentation Network
In photo editing, it is common practice to remove visual distractions to
improve the overall image quality and highlight the primary subject. However,
manually selecting and removing these small and dense distracting regions can
be a laborious and time-consuming task. In this paper, we propose an
interactive distractor selection method that is optimized to achieve the task
with just a single click. Our method surpasses the precision and recall
achieved by the traditional method of running panoptic segmentation and then
selecting the segments containing the clicks. We also showcase how a
transformer-based module can be used to identify more distracting regions
similar to the user's click position. Our experiments demonstrate that the
model can effectively and accurately segment unknown distracting objects
interactively and in groups. By significantly simplifying the photo cleaning
and retouching process, our proposed model provides inspiration for exploring
rare object segmentation and group selection with a single click.
Comment: CVPR 2023. Project link: https://simpson-cvpr23.github.io
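The group-selection idea can be illustrated with a hedged sketch of similarity-driven selection: embed every pixel, take the embedding under the user's click, and select all regions whose embeddings are close to it. The cosine-similarity threshold and function names are illustrative; SimpSON's actual grouping module is transformer-based.

```python
import torch
import torch.nn.functional as F

def group_select(pixel_emb, click_yx, thresh=0.8):
    # pixel_emb: (C, H, W) per-pixel embeddings from a backbone.
    C, H, W = pixel_emb.shape
    query = pixel_emb[:, click_yx[0], click_yx[1]]       # clicked feature
    emb = F.normalize(pixel_emb.flatten(1), dim=0)       # (C, H*W)
    q = F.normalize(query, dim=0)
    sim = (q.unsqueeze(1) * emb).sum(0).view(H, W)       # cosine sim map
    return sim > thresh                                  # boolean mask

mask = group_select(torch.randn(64, 128, 128), (40, 60))
```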
UltraSR: Spatial Encoding is a Missing Key for Implicit Image Function-based Arbitrary-Scale Super-Resolution
The recent success of NeRF and other related implicit neural representation
methods has opened a new path for continuous image representation, where pixel
values no longer need to be looked up from stored discrete 2D arrays but can be
inferred from neural network models on a continuous spatial domain. Although
the recent work LIIF has demonstrated that such a novel approach can achieve good
performance on the arbitrary-scale super-resolution task, its upscaled images
frequently show structural distortion due to faulty predictions of
high-frequency textures. In this work, we propose UltraSR, a simple yet
effective new network design based on implicit image functions in which spatial
coordinates and periodic encoding are deeply integrated with the implicit
neural representation. We show that spatial encoding is indeed a missing key
towards the next-stage high-accuracy implicit image function through extensive
experiments and ablation studies. Our UltraSR sets new state-of-the-art
performance on the DIV2K benchmark across all super-resolution scales compared
to previous state-of-the-art methods. UltraSR also achieves superior
performance on other standard benchmark datasets in which it outperforms prior
works in almost all experiments. Our code will be released at
https://github.com/SHI-Labs/UltraSR-Arbitrary-Scale-Super-Resolution
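A minimal sketch of periodic spatial encoding: continuous coordinates are lifted to sin/cos features across octave frequencies before being fused with latent features, which is what lets the implicit function represent high-frequency textures. The number of frequencies is an assumption; this mirrors NeRF-style positional encoding rather than UltraSR's exact formulation.

```python
import torch
import torch.nn as nn

class PeriodicEncoding(nn.Module):
    def __init__(self, num_freqs=10):
        super().__init__()
        # Octave frequency ladder: pi, 2*pi, 4*pi, ...
        self.register_buffer('freqs',
                             2.0 ** torch.arange(num_freqs) * torch.pi)

    def forward(self, coords):                    # (..., 2) in [-1, 1]
        proj = coords.unsqueeze(-1) * self.freqs  # (..., 2, F)
        enc = torch.cat([proj.sin(), proj.cos()], dim=-1)
        return enc.flatten(-2)                    # (..., 2 * 2 * F)

enc = PeriodicEncoding()
x = torch.rand(4, 2) * 2 - 1                      # query coordinates
features = enc(x)                                 # (4, 40) encoded inputs
```

A plain MLP is biased toward low-frequency functions of its raw inputs; the periodic lift gives it access to high-frequency basis functions of the coordinates, addressing exactly the structural-distortion failure described above.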
Data Upcycling Knowledge Distillation for Image Super-Resolution
Knowledge distillation (KD) emerges as a challenging yet promising technique
for compressing deep learning models, characterized by the transmission of
extensive learning representations from proficient and computationally
intensive teacher models to compact student models. However, only a handful of
studies have endeavored to compress the models for single image
super-resolution (SISR) through KD, with their effects on student model
enhancement remaining marginal. In this paper, we put forth an approach from
the perspective of efficient data utilization, namely Data Upcycling
Knowledge Distillation (DUKD), which guides the student model with the
teacher's prior knowledge via upcycled in-domain data derived from the
training inputs. This upcycling process is realized through two efficient
image zooming operations and invertible data augmentations, which introduce
label consistency regularization to KD for SISR and substantially boost the
student model's generalization. Due to its versatility, DUKD can be applied
across a broad spectrum of teacher-student architectures. Comprehensive
experiments across diverse benchmarks demonstrate that our proposed DUKD
method significantly outperforms prior art, exemplified by an increase of up
to 0.5 dB in PSNR over baseline methods, and an RCAN student model with 67%
fewer parameters performing on par with the RCAN teacher model.
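A hedged sketch of label-consistency distillation with an invertible augmentation: the student sees a flipped input, its prediction is flipped back, and is matched against the teacher's output on the original input. A horizontal flip stands in for DUKD's invertible augmentations; the zooming operations and loss weighting are omitted.

```python
import torch
import torch.nn.functional as F

def consistency_kd_loss(student, teacher, lr):
    # lr: (B, C, h, w) batch of low-resolution training inputs.
    with torch.no_grad():
        target = teacher(lr)                      # teacher SR output
    aug = torch.flip(lr, dims=[-1])               # invertible augmentation
    pred = torch.flip(student(aug), dims=[-1])    # invert it on the output
    return F.l1_loss(pred, target)

# Works with any SR networks mapping (B, C, h, w) -> (B, C, s*h, s*w);
# bicubic upsampling is used here only as a stand-in model.
model = lambda x: F.interpolate(x, scale_factor=2, mode='bicubic',
                                align_corners=False)
loss = consistency_kd_loss(model, model, torch.rand(2, 3, 16, 16))
```

Because the augmentation is invertible, the student's prediction on the augmented input can be mapped back into the teacher's output frame, turning ordinary augmentation into a consistency regularizer rather than extra labeled data.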