Transforming Image Super-Resolution: A ConvFormer-based Efficient Approach
Recent progress in single-image super-resolution (SISR) has achieved
remarkable performance, yet the computational costs of these methods remain a
challenge for deployment on resource-constrained devices. For transformer-based
methods in particular, the self-attention mechanism delivers strong performance
gains but incurs substantial computational cost. To tackle
this issue, we introduce the Convolutional Transformer layer (ConvFormer) and
the ConvFormer-based Super-Resolution network (CFSR), which offer an effective
and efficient solution for lightweight image super-resolution tasks. In detail,
CFSR leverages large kernel convolutions as the feature mixer in place of the
self-attention module, efficiently modeling long-range dependencies and
extensive receptive fields at a modest computational cost. Furthermore, we
propose an edge-preserving feed-forward network, abbreviated as EFN, to perform
local feature aggregation while preserving more high-frequency information.
Extensive experiments demonstrate that CFSR achieves a better trade-off between
computational cost and performance than existing lightweight SR methods.
Compared with state-of-the-art methods such as ShuffleMixer, the proposed CFSR
achieves a 0.39 dB gain on the Urban100 dataset for the x2 SR task while using
26% fewer parameters and 31% fewer FLOPs. Code and pre-trained models are
available at https://github.com/Aitical/CFSR.
Comment: submitting to TI
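To make the idea above concrete, here is a minimal PyTorch sketch of a large-kernel convolutional feature mixer used in place of self-attention; the module name, kernel size, and layer layout are illustrative assumptions, not the released CFSR code.

```python
import torch
import torch.nn as nn

class LargeKernelMixer(nn.Module):
    """Hypothetical feature mixer: a depthwise large-kernel convolution
    stands in for self-attention to model long-range dependencies."""
    def __init__(self, dim: int, kernel_size: int = 13):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        # Depthwise convolution keeps the cost roughly linear in the number of pixels.
        self.dwconv = nn.Conv2d(dim, dim, kernel_size,
                                padding=kernel_size // 2, groups=dim)
        self.pwconv = nn.Conv2d(dim, dim, 1)  # pointwise channel projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W)
        residual = x
        x = self.norm(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
        x = self.pwconv(self.dwconv(x))
        return x + residual

# Usage: mix features of a 64-channel tensor; spatial size is preserved.
feats = torch.randn(1, 64, 48, 48)
print(LargeKernelMixer(64)(feats).shape)  # torch.Size([1, 64, 48, 48])
```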
Fully 1×1 Convolutional Network for Lightweight Image Super-Resolution
Deep models have achieved significant progress on single image
super-resolution (SISR) tasks, in particular large models with large kernels
(3×3 or more). However, the heavy computational footprint of such models
prevents their deployment in real-time, resource-constrained environments.
Conversely, 1×1 convolutions bring substantial computational efficiency, but
struggle with aggregating local spatial representations, an essential
capability for SISR models. In response to this dichotomy, we propose to
harmonize the merits of both 3×3 and 1×1 kernels and exploit their great
potential for lightweight SISR tasks. Specifically, we propose a simple yet
effective fully 1×1 convolutional network, named Shift-Conv-based Network
(SCNet). By incorporating a parameter-free spatial-shift operation, it equips
the fully 1×1 convolutional network with powerful representation capability
while maintaining impressive computational efficiency. Extensive experiments
demonstrate that SCNet, despite its fully 1×1 convolutional structure,
consistently matches or even surpasses the performance of existing lightweight
SR models that employ regular convolutions. The code and pre-trained models can
be found at https://github.com/Aitical/SCNet.
Comment: Accepted by Machine Intelligence Research, DOI:
10.1007/s11633-024-1401-
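As a rough illustration of the spatial-shift idea, the following PyTorch sketch shifts four channel groups in different directions so that subsequent 1×1 convolutions can aggregate information from neighboring pixels; the group layout, shift step, and block structure are assumptions for illustration, not the released SCNet code.

```python
import torch
import torch.nn as nn

def spatial_shift(x: torch.Tensor, step: int = 1) -> torch.Tensor:
    """Parameter-free spatial shift: each channel group is displaced by one
    pixel in a different direction; borders are zero-padded."""
    b, c, h, w = x.shape
    out = torch.zeros_like(x)
    g = c // 4
    out[:, 0 * g:1 * g, :, :-step] = x[:, 0 * g:1 * g, :, step:]   # shift left
    out[:, 1 * g:2 * g, :, step:]  = x[:, 1 * g:2 * g, :, :-step]  # shift right
    out[:, 2 * g:3 * g, :-step, :] = x[:, 2 * g:3 * g, step:, :]   # shift up
    out[:, 3 * g:, step:, :]       = x[:, 3 * g:, :-step, :]       # shift down
    return out

class ShiftConvBlock(nn.Module):
    """Hypothetical Shift-Conv block: shift, then two 1x1 convolutions."""
    def __init__(self, dim: int):
        super().__init__()
        self.conv1 = nn.Conv2d(dim, dim, 1)
        self.conv2 = nn.Conv2d(dim, dim, 1)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.conv2(self.act(self.conv1(spatial_shift(x))))

# Usage: the block mixes spatial information using only 1x1 convolutions.
feats = torch.randn(1, 64, 48, 48)
print(ShiftConvBlock(64)(feats).shape)  # torch.Size([1, 64, 48, 48])
```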
Image Deblurring by Exploring In-depth Properties of Transformer
Image deblurring continues to achieve impressive performance with the
development of generative models. Nonetheless, it remains difficult to improve
the perceptual quality and the quantitative scores of the recovered image at
the same time. In this study, drawing inspiration
from the research of transformer properties, we introduce the pretrained
transformers to address this problem. In particular, we leverage deep features
extracted from a pretrained vision transformer (ViT) to encourage recovered
images to be sharp without sacrificing the performance measured by the
quantitative metrics. The pretrained transformer can capture the global
topological relations (i.e., self-similarity) of an image, and we observe that
these topological relations of a sharp image change when blur occurs. By
comparing the transformer features of the recovered image and the target one,
the pretrained transformer provides high-resolution, blur-sensitive semantic
information, which is critical for measuring the sharpness of the deblurred
image. On the basis of these advantages, we present two types of novel
perceptual losses to guide image deblurring. One regards the features as
vectors and computes the discrepancy, in Euclidean space, between
representations extracted from the recovered image and the target one. The
other type considers the features extracted from an image as a distribution and
compares the distribution discrepancy between the recovered image and the
target one. We demonstrate the effectiveness of these transformer properties in
improving the perceptual quality without sacrificing the quantitative scores
(PSNR) of the most competitive models, such as Uformer, Restormer, and NAFNet,
on defocus and motion deblurring tasks.
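A minimal sketch of the two loss types described above, assuming patch-token features have already been extracted from a frozen, pretrained ViT; the first variant compares features as vectors in Euclidean space, while the second matches only simple channel-wise statistics as a stand-in for the paper's distribution discrepancy.

```python
import torch
import torch.nn.functional as F

def vit_l2_loss(feat_pred: torch.Tensor, feat_target: torch.Tensor) -> torch.Tensor:
    """Type 1: treat ViT patch-token features as vectors and measure their
    discrepancy in Euclidean space. feat_*: (B, N_tokens, C)."""
    return F.mse_loss(feat_pred, feat_target)

def vit_distribution_loss(feat_pred: torch.Tensor, feat_target: torch.Tensor) -> torch.Tensor:
    """Type 2: treat the token features of each image as a distribution and
    compare first- and second-order statistics (a simple stand-in for a
    fuller distribution discrepancy)."""
    mean_p, mean_t = feat_pred.mean(dim=1), feat_target.mean(dim=1)
    std_p, std_t = feat_pred.std(dim=1), feat_target.std(dim=1)
    return F.l1_loss(mean_p, mean_t) + F.l1_loss(std_p, std_t)

# Usage with hypothetical features from a frozen, pretrained ViT
# (here replaced by random tensors for illustration).
B, N, C = 2, 196, 768
feat_pred, feat_target = torch.randn(B, N, C), torch.randn(B, N, C)
loss = vit_l2_loss(feat_pred, feat_target) + 0.1 * vit_distribution_loss(feat_pred, feat_target)
print(loss.item())
```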