Dense xUnit Networks
Deep net architectures have constantly evolved over the past few years,
leading to significant advancements in a wide array of computer vision tasks.
However, besides high accuracy, many applications also require a low
computational load and limited memory footprint. To date, efficiency has
typically been achieved either by architectural choices at the macro level
(e.g. using skip connections or pruning techniques) or modifications at the
level of the individual layers (e.g. using depth-wise convolutions or channel
shuffle operations). Interestingly, much less attention has been devoted to the
role of the activation functions in constructing efficient nets. Recently,
Kligvasser et al. showed that incorporating spatial connections within the
activation functions enables a significant boost in performance in image
restoration tasks, at any given budget of parameters. However, the
effectiveness of their xUnit module has only been tested on simple small
models, which are not characteristic of those used in high-level vision tasks.
In this paper, we adopt and improve the xUnit activation, show how it can be
incorporated into the DenseNet architecture, and illustrate its high
effectiveness for classification and image restoration tasks alike. While the
DenseNet architecture is extremely efficient to begin with, our dense xUnit net
(DxNet) can typically achieve the same performance with far fewer parameters.
For example, on ImageNet, our DxNet outperforms a ReLU-based DenseNet having
30% more parameters and achieves state-of-the-art results for this budget of
parameters. Furthermore, in denoising and super-resolution, DxNet significantly
improves upon all existing lightweight solutions, including the xUnit-based
nets of Kligvasser et al.
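For readers unfamiliar with the module, below is a minimal PyTorch sketch of the original xUnit idea (the exact layer ordering and sizes are illustrative, and the improved variant used in DxNet is not reproduced here): the pointwise nonlinearity is replaced by a spatial gate computed with a depth-wise convolution.

```python
import torch
import torch.nn as nn

class XUnit(nn.Module):
    """Minimal sketch of an xUnit-style spatial activation."""
    def __init__(self, channels, kernel_size=9):
        super().__init__()
        self.gate = nn.Sequential(
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            # the depth-wise conv gives the activation a spatial extent
            nn.Conv2d(channels, channels, kernel_size,
                      padding=kernel_size // 2, groups=channels),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        # Gaussian gate in (0, 1]: each feature is modulated by its
        # spatial neighborhood instead of a pointwise ReLU
        return x * torch.exp(-self.gate(x) ** 2)
```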
NTIRE 2020 Challenge on Image and Video Deblurring
Motion blur is one of the most common degradation artifacts in dynamic scene
photography. This paper reviews the NTIRE 2020 Challenge on Image and Video
Deblurring. In this challenge, we present the evaluation results from 3
competition tracks as well as the proposed solutions. Track 1 aims to develop
single-image deblurring methods focusing on restoration quality. On Track 2,
the image deblurring methods are executed on a mobile platform to find the
balance of the running speed and the restoration accuracy. Track 3 targets
developing video deblurring methods that exploit the temporal relation between
input frames. The three tracks had 163, 135, and 102 registered participants,
respectively, and 9, 4, and 7 teams competed in the final testing phase. The
winning methods demonstrate state-of-the-art performance on image and video
deblurring tasks.
Comment: To be published in CVPR 2020 Workshop (New Trends in Image Restoration and Enhancement)
CARAFE: Content-Aware ReAssembly of FEatures
Feature upsampling is a key operation in a number of modern convolutional
network architectures, e.g. feature pyramids. Its design is critical for dense
prediction tasks such as object detection and semantic/instance segmentation.
In this work, we propose Content-Aware ReAssembly of FEatures (CARAFE), a
universal, lightweight and highly effective operator to fulfill this goal.
CARAFE has several appealing properties: (1) Large field of view. Unlike
previous works (e.g. bilinear interpolation) that only exploit a sub-pixel
neighborhood, CARAFE can aggregate contextual information within a large
receptive field. (2) Content-aware handling. Instead of using a fixed kernel
for all samples (e.g. deconvolution), CARAFE enables instance-specific
content-aware handling, which generates adaptive kernels on-the-fly. (3)
Lightweight and fast to compute. CARAFE introduces little computational
overhead and can be readily integrated into modern network architectures. We
conduct comprehensive evaluations on standard benchmarks in object detection,
instance/semantic segmentation, and inpainting. CARAFE shows consistent and
substantial gains across all tasks (1.2%, 1.3%, 1.8%, and 1.1 dB,
respectively) with negligible computational overhead. It has great potential
to serve as a strong building block for future research. Code and models are
available at https://github.com/open-mmlab/mmdetection.
Comment: ICCV 2019 Camera Ready (Oral)
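To make the two-step design concrete, here is a naive, hedged PyTorch sketch of the operator: a kernel-prediction branch produces one normalized k x k reassembly kernel per upsampled location, and the features are reassembled as a weighted sum over source neighborhoods. The compressor width and kernel sizes are illustrative rather than the paper's exact settings, and the unfold-based reassembly trades memory for readability.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CARAFESketch(nn.Module):
    def __init__(self, channels, scale=2, k_up=5, k_enc=3, c_mid=64):
        super().__init__()
        self.scale, self.k = scale, k_up
        self.compress = nn.Conv2d(channels, c_mid, 1)  # channel compressor
        # predicts one k_up x k_up kernel per output (upsampled) location
        self.encode = nn.Conv2d(c_mid, scale ** 2 * k_up ** 2,
                                k_enc, padding=k_enc // 2)

    def forward(self, x):
        b, c, h, w = x.shape
        s, k = self.scale, self.k
        # 1) content-aware kernel prediction, softmax-normalized per location
        ker = F.pixel_shuffle(self.encode(self.compress(x)), s)  # (b, k*k, s*h, s*w)
        ker = F.softmax(ker, dim=1)
        # 2) reassembly: weighted sum over each source k x k neighborhood
        unf = F.unfold(x, k, padding=k // 2).view(b, c, k * k, h, w)
        unf = unf.repeat_interleave(s, dim=3).repeat_interleave(s, dim=4)
        return (unf * ker.unsqueeze(1)).sum(dim=2)  # (b, c, s*h, s*w)
```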
NTIRE 2020 Challenge on Real Image Denoising: Dataset, Methods and Results
This paper reviews the NTIRE 2020 challenge on real image denoising with
focus on the newly introduced dataset, the proposed methods and their results.
The challenge is a new version of the previous NTIRE 2019 challenge on real
image denoising that was based on the SIDD benchmark. This challenge is based
on newly collected validation and testing image datasets and is hence named
SIDD+. This challenge has two tracks for quantitatively evaluating image
denoising performance in (1) the Bayer-pattern rawRGB and (2) the standard RGB
(sRGB) color spaces. Each track had ~250 registered participants. A total of 22
teams, proposing 24 methods, competed in the final phase of the challenge. The
proposed methods by the participating teams represent the current
state-of-the-art performance in image denoising targeting real noisy images.
The newly collected SIDD+ datasets are publicly available at:
https://bit.ly/siddplus_data
Single Image Super-Resolution via Residual Neuron Attention Networks
Deep Convolutional Neural Networks (DCNNs) have achieved impressive
performance in Single Image Super-Resolution (SISR). To further improve the
performance, existing CNN-based methods generally focus on designing deeper
architecture of the network. However, we argue that blindly increasing a
network's depth is not the most sensible approach. In this paper, we propose a novel
end-to-end Residual Neuron Attention Networks (RNAN) for more efficient and
effective SISR. Structurally, our RNAN is a sequential integration of the
well-designed Global Context-enhanced Residual Groups (GCRGs), which extracts
super-resolved features from coarse to fine. Our GCRG is designed with two
novelties. First, a Residual Neuron Attention (RNA) mechanism is proposed in
each block of the GCRG to reveal the relevance of neurons for better feature
representation. Second, a Global Context (GC) block is embedded at the end of
each GCRG to effectively model the global contextual information. Experimental
results demonstrate that our RNAN achieves results comparable to
state-of-the-art methods in terms of both quantitative metrics and visual
quality, but with a simpler network architecture.
Comment: 6 pages, 4 figures, Accepted by IEEE ICIP 2020
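The abstract does not detail the internals of the GC block; the sketch below follows the common GCNet-style formulation (a softmax-pooled global context vector followed by a bottleneck transform), which is one plausible reading and should be treated as an assumption rather than RNAN's exact design.

```python
import torch
import torch.nn as nn

class GlobalContextBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.attn = nn.Conv2d(channels, 1, 1)  # spatial attention logits
        hidden = max(channels // reduction, 1)
        self.transform = nn.Sequential(
            nn.Conv2d(channels, hidden, 1),
            nn.LayerNorm([hidden, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 1),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        # softmax-pooled global context vector, one per image
        weights = self.attn(x).view(b, 1, h * w).softmax(dim=-1)
        context = torch.bmm(x.view(b, c, h * w), weights.transpose(1, 2))
        context = context.view(b, c, 1, 1)
        # bottleneck transform, broadcast-added back onto the features
        return x + self.transform(context)
```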
Spatial Group-wise Enhance: Improving Semantic Feature Learning in Convolutional Networks
Convolutional Neural Networks (CNNs) generate feature representations of
complex objects by collecting hierarchical semantic sub-features from
different parts. These sub-features are usually distributed in grouped form in
the feature vector of each layer, representing various semantic entities.
However, the activation of these sub-features is often spatially affected by
similar patterns and noisy backgrounds, resulting in erroneous localization and
identification. We propose a Spatial Group-wise Enhance (SGE) module that can
adjust the importance of each sub-feature by generating an attention factor for
each spatial location in each semantic group, so that every individual group
can autonomously enhance its learnt expression and suppress possible noise. The
attention factors are only guided by the similarities between the global and
local feature descriptors inside each group, thus the design of SGE module is
extremely lightweight, with almost no extra parameters and calculations.
Despite being trained with only category-level supervision, the SGE component is
extremely effective in highlighting multiple active areas with various
high-order semantics (such as the dog's eyes, nose, etc.). When integrated with
popular CNN backbones, SGE can significantly boost the performance of image
recognition tasks. Specifically, based on the ResNet50 backbone, SGE achieves
a 1.2% Top-1 accuracy improvement on the ImageNet benchmark and a 1.0-2.0%
AP gain on the COCO benchmark across a wide range of detectors
(Faster/Mask/Cascade R-CNN and RetinaNet). Code and pretrained models are
available at https://github.com/implus/PytorchInsight.
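A condensed PyTorch sketch of the mechanism as described (the linked repository holds the reference implementation): each group's global descriptor, obtained by average pooling, is compared with every spatial location, and the normalized similarity map gates the group's features.

```python
import torch
import torch.nn as nn

class SpatialGroupEnhance(nn.Module):
    def __init__(self, groups=64):
        super().__init__()
        self.groups = groups
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.weight = nn.Parameter(torch.zeros(1, groups, 1, 1))
        self.bias = nn.Parameter(torch.ones(1, groups, 1, 1))

    def forward(self, x):
        b, c, h, w = x.shape
        g = self.groups
        x = x.view(b * g, c // g, h, w)
        # similarity between each position and the group's global descriptor
        sim = (x * self.pool(x)).sum(dim=1, keepdim=True)  # (b*g, 1, h, w)
        # normalize the similarity map within each group
        t = sim.view(b * g, -1)
        t = (t - t.mean(dim=1, keepdim=True)) / (t.std(dim=1, keepdim=True) + 1e-5)
        t = t.view(b, g, h, w) * self.weight + self.bias
        # sigmoid attention factor scales every location in the group
        x = x * torch.sigmoid(t.view(b * g, 1, h, w))
        return x.view(b, c, h, w)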
Adapting Image Super-Resolution State-of-the-arts and Learning Multi-model Ensemble for Video Super-Resolution
Recently, image super-resolution has been widely studied and achieved
significant progress by leveraging the power of deep convolutional neural
networks. However, there has been limited advancement in video super-resolution
(VSR) due to the complex temporal patterns in videos. In this paper, we
investigate how to adapt state-of-the-art methods of image super-resolution for
video super-resolution. The proposed adaptation method is straightforward. The
information among successive frames is well exploited, while the overhead on
the original image super-resolution method is negligible. Furthermore, we
propose a learning-based method to ensemble the outputs from multiple
super-resolution models. Our methods show superior performance and rank second
in Track 1 of the NTIRE 2019 Video Super-Resolution Challenge.
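The abstract leaves the ensemble network unspecified. Purely as an illustration, a learning-based ensemble could predict per-pixel softmax weights over the candidate outputs; everything in this sketch, including the LearnedEnsemble name and layer sizes, is hypothetical.

```python
import torch
import torch.nn as nn

class LearnedEnsemble(nn.Module):
    """Hypothetical per-pixel fusion of M candidate SR outputs."""
    def __init__(self, num_models, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 * num_models, hidden, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, num_models, 3, padding=1),
        )

    def forward(self, outputs):  # list of (b, 3, H, W) candidate frames
        stack = torch.stack(outputs, dim=1)         # (b, M, 3, H, W)
        b, m, c, h, w = stack.shape
        wts = self.net(stack.view(b, m * c, h, w))  # (b, M, H, W)
        wts = wts.softmax(dim=1).unsqueeze(2)       # (b, M, 1, H, W)
        return (stack * wts).sum(dim=1)             # blended output
```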
Learning a Wavelet-like Auto-Encoder to Accelerate Deep Neural Networks
Accelerating deep neural networks (DNNs) has been attracting increasing
attention as it can benefit a wide range of applications, e.g., enabling mobile
systems with limited computing resources to own powerful visual recognition
ability. A practical strategy toward this goal usually relies on a two-stage
process: operating on the trained DNNs (e.g., approximating the convolutional
filters with tensor decomposition) and fine-tuning the amended network, leading
to difficulty in balancing the trade-off between acceleration and maintaining
recognition performance. In this work, aiming at a general and comprehensive
way for neural network acceleration, we develop a Wavelet-like Auto-Encoder
(WAE) that decomposes the original input image into two low-resolution channels
(sub-images) and incorporate the WAE into the classification neural networks
for joint training. The two decomposed channels, in particular, are encoded to
carry the low-frequency information (e.g., image profiles) and the
high-frequency information (e.g., image details or noise), respectively, and enable reconstructing the
original input image through the decoding process. Then, we feed the
low-frequency channel into a standard classification network such as VGG or
ResNet and employ a very lightweight network to fuse with the high-frequency
channel to obtain the classification result. Compared to existing DNN
acceleration solutions, our framework has the following advantages: i) it is
compatible with any existing convolutional neural network for classification
without amending its structure; ii) the WAE provides an interpretable way to
preserve the main components of the input image for classification.
Comment: Accepted at AAAI 2018
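A toy sketch of the WAE wiring, under the simplifying assumption of single-layer encoders and decoder (the paper's actual networks and training losses are more elaborate): the image is encoded into two half-resolution sub-images that jointly reconstruct it, and only the low-frequency one would feed the heavy classifier.

```python
import torch
import torch.nn as nn

class WaveletLikeAE(nn.Module):
    """Sketch: encode an image into two half-resolution sub-images
    (low- and high-frequency) that can reconstruct the input."""
    def __init__(self):
        super().__init__()
        self.enc_low = nn.Conv2d(3, 3, 4, stride=2, padding=1)   # image profiles
        self.enc_high = nn.Conv2d(3, 3, 4, stride=2, padding=1)  # details / noise
        self.dec = nn.ConvTranspose2d(6, 3, 4, stride=2, padding=1)

    def forward(self, img):
        low, high = self.enc_low(img), self.enc_high(img)
        recon = self.dec(torch.cat([low, high], dim=1))
        return low, high, recon
```

Running the backbone on the half-resolution low-frequency channel cuts its convolutional FLOPs roughly fourfold, which is where the acceleration comes from.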
HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers
High-resolution representations (HR) are essential for dense prediction tasks
such as segmentation, detection, and pose estimation. Learning HR
representations is typically ignored in previous Neural Architecture Search
(NAS) methods that focus on image classification. This work proposes a novel
NAS method, called HR-NAS, which is able to find efficient and accurate
networks for different tasks, by effectively encoding multiscale contextual
information while maintaining high-resolution representations. In HR-NAS, we
renovate the NAS search space as well as its searching strategy. To better
encode multiscale image contexts in the search space of HR-NAS, we first
carefully design a lightweight transformer, whose computational complexity can
be dynamically changed with respect to different objective functions and
computation budgets. To maintain high-resolution representations of the learned
networks, HR-NAS adopts a multi-branch architecture that provides convolutional
encoding of multiple feature resolutions, inspired by HRNet. Last, we propose
an efficient fine-grained search strategy to train HR-NAS, which effectively
explores the search space, and finds optimal architectures given various tasks
and computation resources. HR-NAS is capable of achieving state-of-the-art
trade-offs between performance and FLOPs for three dense prediction tasks and
an image classification task, given only small computational budgets. For
example, HR-NAS surpasses SqueezeNAS, which is specially designed for semantic
segmentation, while improving efficiency by 45.9%. Code is available at
https://github.com/dingmyu/HR-NAS.
Comment: Accepted by CVPR 2021 (Oral)
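The multi-branch design is only outlined in the abstract; a toy two-branch fusion step in the HRNet spirit might look like the following sketch, where the channel counts and fusion choices are assumptions and the low-resolution branch is taken to be at half the high-resolution branch's size.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchFusion(nn.Module):
    """Toy sketch: keep a high-resolution stream and a low-resolution
    stream, and exchange information between them."""
    def __init__(self, c_hi, c_lo):
        super().__init__()
        self.hi_to_lo = nn.Conv2d(c_hi, c_lo, 3, stride=2, padding=1)  # downsample
        self.lo_to_hi = nn.Conv2d(c_lo, c_hi, 1)                       # align channels

    def forward(self, hi, lo):
        # each branch keeps its resolution but absorbs the other's features
        lo_out = lo + self.hi_to_lo(hi)
        hi_out = hi + F.interpolate(self.lo_to_hi(lo), size=hi.shape[-2:],
                                    mode="bilinear", align_corners=False)
        return hi_out, lo_out
```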
Zoom-In-to-Check: Boosting Video Interpolation via Instance-level Discrimination
We propose a lightweight video frame interpolation algorithm. Our key
innovation is an instance-level supervision that allows information to be
learned from the high-resolution version of similar objects. Our experiment
shows that the proposed method can generate state-of-the-art results across
different datasets with a fraction of the computational resources (time and memory) of
competing methods. Given two image frames, a cascade network creates an
intermediate frame with 1) a flow-warping module that computes coarse
bi-directional optical flow and creates an interpolated image via flow-based
warping, followed by 2) an image synthesis module to make fine-scale
corrections. In the learning stage, object detection proposals are generated on
the interpolated image. Lower-resolution objects are zoomed into, and an
adversarial loss trained on high-resolution objects guides the system toward
instance-level refinement, correcting details of object shape and boundaries.
Comment: CVPR 2019 camera-ready, supplementary video:
https://youtu.be/q-_wIRq26D
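The flow-warping step of such a cascade can be sketched with grid_sample; the helper below is an assumption-laden illustration rather than the paper's implementation.

```python
import torch
import torch.nn.functional as F

def backward_warp(img, flow):
    """Warp img (b, c, h, w) with a per-pixel flow (b, 2, h, w), flow[:, 0] = dx."""
    b, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.arange(h, device=img.device),
                            torch.arange(w, device=img.device), indexing="ij")
    grid = torch.stack((xs, ys)).float()  # (2, h, w), x coordinate first
    coords = grid.unsqueeze(0) + flow     # follow the flow field
    # normalize coordinates to [-1, 1] as grid_sample expects
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid_n = torch.stack((coords_x, coords_y), dim=-1)  # (b, h, w, 2)
    return F.grid_sample(img, grid_n, align_corners=True)
```

To synthesize the middle frame, each input frame would be warped halfway along the estimated bi-directional flow, the two warps blended, and the result passed to the synthesis module for fine-scale correction.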