Transforming Image Super-Resolution: A ConvFormer-based Efficient Approach
Recent progress in single-image super-resolution (SISR) has achieved
remarkable performance, yet the computational costs of these methods remain a
challenge for deployment on resource-constrained devices. For transformer-based
methods in particular, the self-attention mechanism delivers strong performance
gains but incurs substantial computational cost. To tackle
this issue, we introduce the Convolutional Transformer layer (ConvFormer) and
the ConvFormer-based Super-Resolution network (CFSR), which offer an effective
and efficient solution for lightweight image super-resolution tasks. In detail,
CFSR leverages large kernel convolutions as the feature mixer in place of the
self-attention module, efficiently modeling long-range dependencies and
extensive receptive fields at a modest computational cost. Furthermore, we
propose an edge-preserving feed-forward network, abbreviated as EFN, to perform
local feature aggregation while preserving more high-frequency information.
Extensive experiments demonstrate that CFSR achieves a better trade-off between
computational cost and performance than existing lightweight SR methods.
Compared with state-of-the-art methods such as ShuffleMixer, the proposed CFSR
achieves a 0.39 dB gain on the Urban100 dataset for the x2 SR task while using
26% fewer parameters and 31% fewer FLOPs. Code and pre-trained models are
available at https://github.com/Aitical/CFSR.
Comment: submitting to TI
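To make the idea above concrete, here is a minimal PyTorch sketch of a large-kernel convolutional feature mixer used in place of self-attention; the module name, kernel size, and layer layout are illustrative assumptions, not the released CFSR code.

```python
import torch
import torch.nn as nn

class LargeKernelMixer(nn.Module):
    """Hypothetical feature mixer: a depthwise large-kernel convolution
    stands in for self-attention to model long-range dependencies."""
    def __init__(self, dim: int, kernel_size: int = 13):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        # Depthwise convolution keeps the cost roughly linear in the number of pixels.
        self.dwconv = nn.Conv2d(dim, dim, kernel_size,
                                padding=kernel_size // 2, groups=dim)
        self.pwconv = nn.Conv2d(dim, dim, 1)  # pointwise channel projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W)
        residual = x
        x = self.norm(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
        x = self.pwconv(self.dwconv(x))
        return x + residual

# Usage: mix features of a 64-channel tensor; spatial size is preserved.
feats = torch.randn(1, 64, 48, 48)
print(LargeKernelMixer(64)(feats).shape)  # torch.Size([1, 64, 48, 48])
```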
Fully 1×1 Convolutional Network for Lightweight Image Super-Resolution
Deep models have achieved significant progress on single image
super-resolution (SISR) tasks, in particular large models with large kernels
(3×3 or more). However, the heavy computational footprint of such models
prevents their deployment in real-time, resource-constrained environments.
Conversely, 1×1 convolutions bring substantial computational efficiency, but
struggle with aggregating local spatial representations, an essential
capability for SISR models. In response to this dichotomy, we propose to
harmonize the merits of both 3×3 and 1×1 kernels and exploit their great
potential for lightweight SISR tasks. Specifically, we propose a simple yet
effective fully 1×1 convolutional network, named Shift-Conv-based Network
(SCNet). By incorporating a parameter-free spatial-shift operation, it equips
the fully 1×1 convolutional network with powerful representation capability
while maintaining impressive computational efficiency. Extensive experiments
demonstrate that SCNet, despite its fully 1×1 convolutional structure,
consistently matches or even surpasses the performance of existing lightweight
SR models that employ regular convolutions. The code and pre-trained models can
be found at https://github.com/Aitical/SCNet.
Comment: Accepted by Machine Intelligence Research, DOI:
10.1007/s11633-024-1401-
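As a rough illustration of the spatial-shift idea, the following PyTorch sketch shifts four channel groups in different directions so that subsequent 1×1 convolutions can aggregate information from neighboring pixels; the group layout, shift step, and block structure are assumptions for illustration, not the released SCNet code.

```python
import torch
import torch.nn as nn

def spatial_shift(x: torch.Tensor, step: int = 1) -> torch.Tensor:
    """Parameter-free spatial shift: each channel group is displaced by one
    pixel in a different direction; borders are zero-padded."""
    b, c, h, w = x.shape
    out = torch.zeros_like(x)
    g = c // 4
    out[:, 0 * g:1 * g, :, :-step] = x[:, 0 * g:1 * g, :, step:]   # shift left
    out[:, 1 * g:2 * g, :, step:]  = x[:, 1 * g:2 * g, :, :-step]  # shift right
    out[:, 2 * g:3 * g, :-step, :] = x[:, 2 * g:3 * g, step:, :]   # shift up
    out[:, 3 * g:, step:, :]       = x[:, 3 * g:, :-step, :]       # shift down
    return out

class ShiftConvBlock(nn.Module):
    """Hypothetical Shift-Conv block: shift, then two 1x1 convolutions."""
    def __init__(self, dim: int):
        super().__init__()
        self.conv1 = nn.Conv2d(dim, dim, 1)
        self.conv2 = nn.Conv2d(dim, dim, 1)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.conv2(self.act(self.conv1(spatial_shift(x))))

# Usage: the block mixes spatial information using only 1x1 convolutions.
feats = torch.randn(1, 64, 48, 48)
print(ShiftConvBlock(64)(feats).shape)  # torch.Size([1, 64, 48, 48])
```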
Image Deblurring by Exploring In-depth Properties of Transformer
Image deblurring continues to achieve impressive performance with the
development of generative models. Nonetheless, it remains difficult to improve
the perceptual quality and the quantitative scores of the recovered image at
the same time. In this study, drawing inspiration
from the research of transformer properties, we introduce the pretrained
transformers to address this problem. In particular, we leverage deep features
extracted from a pretrained vision transformer (ViT) to encourage recovered
images to be sharp without sacrificing the performance measured by the
quantitative metrics. The pretrained transformer can capture the global
topological relations (i.e., self-similarity) of an image, and we observe that
these topological relations of a sharp image change when blur occurs. By
comparing the transformer features of the recovered image and the target one,
the pretrained transformer provides high-resolution, blur-sensitive semantic
information, which is critical for measuring the sharpness of the deblurred
image. On the basis of these advantages, we present two types of novel
perceptual losses to guide image deblurring. One regards the features as
vectors and computes the discrepancy, in Euclidean space, between
representations extracted from the recovered image and the target one. The
other type considers the features extracted from an image as a distribution and
compares the distribution discrepancy between the recovered image and the
target one. We demonstrate the effectiveness of these transformer properties in
improving the perceptual quality without sacrificing the quantitative scores
(PSNR) of the most competitive models, such as Uformer, Restormer, and NAFNet,
on defocus and motion deblurring tasks.
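A minimal sketch of the two loss types described above, assuming patch-token features have already been extracted from a frozen, pretrained ViT; the first variant compares features as vectors in Euclidean space, while the second matches only simple channel-wise statistics as a stand-in for the paper's distribution discrepancy.

```python
import torch
import torch.nn.functional as F

def vit_l2_loss(feat_pred: torch.Tensor, feat_target: torch.Tensor) -> torch.Tensor:
    """Type 1: treat ViT patch-token features as vectors and measure their
    discrepancy in Euclidean space. feat_*: (B, N_tokens, C)."""
    return F.mse_loss(feat_pred, feat_target)

def vit_distribution_loss(feat_pred: torch.Tensor, feat_target: torch.Tensor) -> torch.Tensor:
    """Type 2: treat the token features of each image as a distribution and
    compare first- and second-order statistics (a simple stand-in for a
    fuller distribution discrepancy)."""
    mean_p, mean_t = feat_pred.mean(dim=1), feat_target.mean(dim=1)
    std_p, std_t = feat_pred.std(dim=1), feat_target.std(dim=1)
    return F.l1_loss(mean_p, mean_t) + F.l1_loss(std_p, std_t)

# Usage with hypothetical features from a frozen, pretrained ViT
# (here replaced by random tensors for illustration).
B, N, C = 2, 196, 768
feat_pred, feat_target = torch.randn(B, N, C), torch.randn(B, N, C)
loss = vit_l2_loss(feat_pred, feat_target) + 0.1 * vit_distribution_loss(feat_pred, feat_target)
print(loss.item())
```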