Scale-wise Convolution for Image Restoration
While scale-invariant modeling has substantially boosted the performance of
visual recognition tasks, it remains largely under-explored in deep-network-based
image restoration. Naively applying those scale-invariant techniques
(e.g. multi-scale testing, random-scale data augmentation) to image restoration
tasks usually leads to inferior performance. In this paper, we show that
properly modeling scale-invariance into neural networks can bring significant
benefits to image restoration performance. Inspired by spatial-wise
convolution for shift-invariance, "scale-wise convolution" is proposed to
convolve across multiple scales for scale-invariance. In our scale-wise
convolutional network (SCN), we first map the input image to the feature space
and then build a feature pyramid representation via bi-linear down-scaling
progressively. The feature pyramid is then passed to a residual network with
scale-wise convolutions. The proposed scale-wise convolution learns to
dynamically activate and aggregate features from different input scales in each
residual building block, in order to exploit contextual information on multiple
scales. In experiments, we compare the restoration accuracy and parameter
efficiency of our model against many different variants of multi-scale neural
networks. The proposed network with scale-wise convolution achieves superior
performance in multiple image restoration tasks including image
super-resolution, image denoising and image compression artifacts removal. Code
and models are available at: https://github.com/ychfan/scn_sr
Comment: AAAI 202
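The feature pyramid step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single-channel feature map stored as a nested list, a fixed down-scaling factor of 2 per level (the abstract does not state the factor), and half-pixel sample centers, under which bilinear down-scaling by 2 reduces to the mean of each 2x2 block. The names `downscale2x_bilinear` and `build_pyramid` are hypothetical.

```python
def downscale2x_bilinear(grid):
    """Halve a 2-D grid in both dimensions.

    Assumption: with half-pixel sample centers, bilinear sampling at a
    factor of 2 weights the four neighbours equally, so each output value
    is the mean of one 2x2 block. A trailing odd row or column is dropped.
    """
    height, width = len(grid), len(grid[0])
    return [
        [
            (grid[2 * r][2 * c] + grid[2 * r][2 * c + 1]
             + grid[2 * r + 1][2 * c] + grid[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(width // 2)
        ]
        for r in range(height // 2)
    ]


def build_pyramid(grid, levels):
    """Return [full resolution, 1/2, 1/4, ...], built progressively."""
    pyramid = [grid]
    for _ in range(levels - 1):
        # Each level is derived from the previous one, not from the input.
        pyramid.append(downscale2x_bilinear(pyramid[-1]))
    return pyramid


# Illustrative 4x4 feature map with values 0..15.
feature_map = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
pyramid = build_pyramid(feature_map, levels=3)
for level in pyramid:
    print(len(level), "x", len(level[0]))
```

In the network described above, each pyramid level would then be processed by the residual blocks with scale-wise convolutions; that learned part is outside the scope of this sketch.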
One-Two-One Network for Compression Artifacts Reduction in Remote Sensing
Compression artifacts reduction (CAR) is a challenging problem in the field of remote sensing. Most recent deep-learning-based methods have demonstrated superior performance over previous hand-crafted methods. In this paper, we propose an end-to-end one-two-one (OTO) network that combines different deep models, i.e., summation and difference models, to solve the CAR problem. In particular, the difference model, motivated by the Laplacian pyramid, is designed to obtain the high-frequency information, while the summation model aggregates the low-frequency information. We provide an in-depth investigation into our OTO architecture based on the Taylor expansion, which shows that these two kinds of information can be fused in a nonlinear scheme to gain more capacity for handling complicated image compression artifacts, especially the blocking effect in compression. Extensive experiments are conducted to demonstrate the superior performance of the OTO networks compared to the state of the art on remote sensing datasets and other benchmark datasets. The source code will be available here: https://github.com/bczhangbczhang/
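The Laplacian pyramid idea that motivates the difference model can be shown with a small sketch. It illustrates only the classical low/high frequency split, not the OTO network itself: it assumes a 1-D signal of even length, uses pairwise averaging as the low-pass step and sample repetition as the up-sampling step, and the name `split_frequencies` is hypothetical.

```python
def split_frequencies(signal):
    """Split a 1-D signal into low- and high-frequency parts.

    Assumptions: even-length input, pairwise mean as the low-pass filter,
    and nearest-neighbour repetition to return to the original length.
    """
    # Down-sample: one mean per non-overlapping pair of samples.
    half = [(signal[i] + signal[i + 1]) / 2.0
            for i in range(0, len(signal) - 1, 2)]
    # Up-sample back to full length by repeating each mean twice.
    low = [value for value in half for _ in range(2)]
    # The residual is the detail that the low-pass step removed.
    high = [s - l for s, l in zip(signal, low)]
    return low, high


signal = [1.0, 3.0, 2.0, 6.0]
low, high = split_frequencies(signal)
# Adding the two parts back together recovers the input.
reconstructed = [l + h for l, h in zip(low, high)]
print(low, high, reconstructed)
```

Here the two parts are recombined by plain addition; the abstract states that the OTO network instead fuses them in a nonlinear scheme.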
Improving Dynamic HDR Imaging with Fusion Transformer
Reconstructing a High Dynamic Range (HDR) image from several Low Dynamic Range (LDR) images with different exposures is a challenging task, especially in the presence of camera and object motion. Though existing models using convolutional neural networks (CNNs) have made great progress, challenges still exist, e.g., ghosting artifacts. Transformers, originating from the field of natural language processing, have shown success in computer vision tasks, due to their ability to address a large receptive field even within a single layer. In this paper, we propose a transformer model for HDR imaging. Our pipeline includes three steps: alignment, fusion, and reconstruction. The key component is the HDR transformer module. Through experiments and ablation studies, we demonstrate that our model outperforms the state of the art by large margins on several popular public datasets.
- …