Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers

Authors: Lin Zhe; Lu Xin; Wang James Z.; Ye Jianbo
Publication date: 02/02/2018

Model pruning has become a useful technique that improves the computational efficiency of deep learning, making it possible to deploy solutions in resource-limited scenarios. A widely used practice in related work assumes that a smaller-norm parameter or feature plays a less informative role at inference time. In this paper, we propose a channel pruning technique for accelerating the computations of deep convolutional neural networks (CNNs) that does not critically rely on this assumption. Instead, it focuses on direct simplification of the channel-to-channel computation graph of a CNN, without the need to perform the computationally difficult and not-always-useful task of making the high-dimensional tensors of a CNN structurally sparse. Our approach has two stages: first, adopt an end-to-end stochastic training method that eventually forces the outputs of some channels to be constant; second, prune those constant channels from the original neural network by adjusting the biases of their impacting layers, such that the resulting compact model can be quickly fine-tuned. Our approach is mathematically appealing from an optimization perspective and easy to reproduce. We evaluate our approach on several image learning benchmarks and demonstrate its interesting aspects and competitive performance.
Comment: accepted to ICLR 2018, 11 pages

arXiv.org e-Print Archive
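
The second stage described in the abstract above (removing a constant channel and compensating through the biases of the layers it feeds) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain fully connected layer rather than a convolution, and the helper names `linear` and `prune_constant_input` are hypothetical. The point it shows is that when an input channel always equals a constant, its contribution can be folded into the next layer's bias without changing the output.

```python
def linear(weights, bias, x):
    """Compute y = W x + b for a small dense layer (lists of floats)."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def prune_constant_input(weights, bias, index, constant):
    """Drop input channel `index`, assumed to always equal `constant`.

    The channel's contribution W[:, index] * constant is folded into the
    bias, so the pruned layer reproduces the original outputs exactly.
    """
    new_bias = [b + row[index] * constant for row, b in zip(weights, bias)]
    new_weights = [row[:index] + row[index + 1:] for row in weights]
    return new_weights, new_bias


# Toy layer with three input channels; channel 1 is constant (assumed value).
weights = [[1.0, 2.0, 3.0], [0.5, -1.0, 4.0]]
bias = [0.1, -0.2]
constant = 0.7

full_output = linear(weights, bias, [2.0, constant, -1.0])

pruned_weights, pruned_bias = prune_constant_input(weights, bias, 1, constant)
pruned_output = linear(pruned_weights, pruned_bias, [2.0, -1.0])
```

For real convolution layers the same idea needs extra care (for example, zero padding makes a "constant" channel non-constant near image borders), which this sketch deliberately ignores.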

Channel Pruning via Optimal Thresholding

Authors: Fwu Jong-Kae; Yang Qing; Ye Yun; You Ganmei; Zhu Xia; Zhu Yuan
Publication date: 10/09/2020

Structured pruning, especially channel pruning, is widely used for its reduced computational cost and its compatibility with off-the-shelf hardware devices. Among existing works, weights are typically removed using a predefined global threshold, or a threshold computed from a predefined metric. Designs based on a predefined global threshold ignore the variation among layers and weight distributions; as a result, they often yield sub-optimal performance caused by over-pruning or under-pruning. In this paper, we present a simple yet effective method, termed Optimal Thresholding (OT), to prune channels with layer-dependent thresholds that optimally separate important from negligible channels. By using OT, most negligible or unimportant channels are pruned to achieve high sparsity while minimizing performance degradation. Since most important weights are preserved, the pruned model can be further fine-tuned and converges quickly, within very few iterations. Our method demonstrates superior performance, especially when compared to state-of-the-art designs at high levels of sparsity. On CIFAR-100, a DenseNet-121 pruned with OT and fine-tuned achieves 75.99% accuracy with only 1.46e8 FLOPs and 0.71M parameters.
Comment: ICONIP 202

arXiv.org e-Print Archive

AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures

Authors: Angelova Anelia; Piergiovanni AJ; Ryoo Michael S.; Tan Mingxing
Publication date: 27/05/2020

Learning to represent videos is a very challenging task both algorithmically and computationally. Standard video CNN architectures have been designed by directly extending architectures devised for image understanding to include the time dimension, using modules such as 3D convolutions, or by using a two-stream design to capture both appearance and motion in videos. We interpret a video CNN as a collection of multi-stream convolutional blocks connected to each other, and propose the approach of automatically finding neural architectures with better connectivity and spatio-temporal interactions for video understanding. This is done by evolving a population of overly-connected architectures guided by connection weight learning. Architectures combining representations that abstract different input types (i.e., RGB and optical flow) at multiple temporal resolutions are searched for, allowing different types or sources of information to interact with each other. Our method, referred to as AssembleNet, outperforms prior approaches on public video datasets, in some cases by a great margin. We obtain 58.6% mAP on Charades and 34.27% accuracy on Moments-in-Time.

arXiv.org e-Print Archive

Hierarchical Label Inference for Video Classification

Authors: Mori Greg; Nauata Nelson; Smith Jonathan
Publication date: 21/01/2018

Videos are a rich source of high-dimensional structured data, with a wide range of interacting components at varying levels of granularity. In order to improve understanding of unconstrained internet videos, it is important to consider the role of labels at separate levels of abstraction. In this paper, we consider the use of the Bidirectional Inference Neural Network (BINN) for performing graph-based inference in label space for the task of video classification. We take advantage of the inherent hierarchy between labels at increasing granularity. The BINN is evaluated on the first and second releases of the YouTube-8M large-scale multilabel video dataset. Our results demonstrate the effectiveness of BINN, achieving significant improvements over baseline models.

arXiv.org e-Print Archive

Improving Landmark Recognition using Saliency detection and Feature classification

Authors: Bhowmick Sagnik; Indu S.; Jayanthi N.; Kumar Akash
Publication date: 30/11/2018

Image landmark recognition has been one of the most sought-after classification challenges in the field of vision and perception. After many years of generic classification of buildings and monuments from images, attention is now shifting to fine-grained problems: recognizing the category of each building or monument. We propose an ensemble network for the classification of Indian landmark images. Our method gives robust classification by ensembling the predictions from a Graph-Based Visual Saliency (GBVS) network along with supervised feature-based classification algorithms such as kNN and Random Forest. The final architecture is an adaptive learning of all the mentioned networks. The proposed network produces a reliable score to eliminate false category cases. Our model was evaluated on a new dataset, which involves challenges such as landmark clutter, variable scaling, partial occlusion, etc.
Comment: Pre-print of the paper to be published in Springer, accepted in the proceedings of the 2nd Workshop on Digital Heritage at the 11th Indian Conference on Computer Vision, Graphics and Image Processing

arXiv.org e-Print Archive

LatentGNN: Learning Efficient Non-local Relations for Visual Recognition

Authors: He Xuming; Yan Shipeng; Zhang Songyang
Publication date: 28/05/2019

Capturing long-range dependencies in feature representations is crucial for many visual recognition tasks. Despite recent successes of deep convolutional networks, it remains challenging to model non-local context relations between visual features. A promising strategy is to model the feature context by a fully-connected graph neural network (GNN), which augments traditional convolutional features with an estimated non-local context representation. However, most GNN-based approaches require computing a dense graph affinity matrix and hence have difficulty scaling up to tackle complex real-world visual problems. In this work, we propose an efficient yet flexible non-local relation representation based on a novel class of graph neural networks. Our key idea is to introduce a latent space to reduce the complexity of the graph, which allows us to use a low-rank representation for the graph affinity matrix and to achieve linear complexity in computation. Extensive experimental evaluations on three major visual recognition tasks show that our method outperforms prior works by a large margin while maintaining a low computation cost.
Comment: ICML 201

arXiv.org e-Print Archive
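
The low-rank idea in the LatentGNN abstract above can be sketched as follows. This is an illustrative toy, not the paper's formulation: it assumes the N x N affinity matrix factors as `psi^T psi` for a hypothetical d x N matrix `psi` (with d much smaller than N), and it only demonstrates that routing features through the d latent nodes gives the same result as multiplying by the dense affinity matrix, while never building the N x N matrix.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


# Hypothetical sizes: N = 4 visible nodes, d = 2 latent nodes, C = 2 channels.
psi = [[0.2, 0.5, 0.1, 0.2],
       [0.6, 0.1, 0.2, 0.1]]          # d x N, visible-to-latent weights
features = [[1.0, 0.0],
            [0.0, 1.0],
            [2.0, 1.0],
            [1.0, 3.0]]               # N x C node features

# Dense route: build the N x N affinity matrix, then propagate (O(N^2) work).
dense_affinity = matmul(transpose(psi), psi)
dense_output = matmul(dense_affinity, features)

# Factored route: visible -> latent -> visible (O(N * d) work per channel).
latent = matmul(psi, features)                     # d x C
factored_output = matmul(transpose(psi), latent)   # N x C
```

Because the factored route costs on the order of N * d operations per channel, it grows linearly in the number of nodes when d is fixed, which is the property the abstract highlights.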

AutoSlim: Towards One-Shot Architecture Search for Channel Numbers

Authors: Huang Thomas; Yu Jiahui
Publication date: 31/05/2019

We study how to set channel numbers in a neural network to achieve better accuracy under constrained resources (e.g., FLOPs, latency, memory footprint or model size). A simple, one-shot solution, named AutoSlim, is presented. Instead of training many network samples and searching with reinforcement learning, we train a single slimmable network to approximate the network accuracy of different channel configurations. We then iteratively evaluate the trained slimmable model and greedily slim the layer with minimal accuracy drop. With this single pass, we can obtain optimized channel configurations under different resource constraints. We present experiments with MobileNet v1, MobileNet v2, ResNet-50 and RL-searched MNasNet on ImageNet classification. We show significant improvements over their default channel configurations. We also achieve better accuracy than recent channel pruning methods and neural architecture search methods. Notably, by setting optimized channel numbers, our AutoSlim-MobileNet-v2 at 305M FLOPs achieves 74.2% top-1 accuracy, 2.4% better than default MobileNet-v2 (301M FLOPs), and even 0.2% better than RL-searched MNasNet (317M FLOPs). Our AutoSlim-ResNet-50 at 570M FLOPs, without depthwise convolutions, achieves 1.3% better accuracy than MobileNet-v1 (569M FLOPs). Code and models will be available at: https://github.com/JiahuiYu/slimmable_networks
Comment: tech report

arXiv.org e-Print Archive
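
The greedy slimming loop described in the AutoSlim abstract above can be sketched roughly as below. This is a toy under stated assumptions, not the released code: `accuracy_fn` is a hypothetical stand-in for evaluating the trained slimmable network at a given channel configuration, `cost_fn` is a hypothetical FLOPs-like proxy, and the importance weights are made up for illustration.

```python
import math


def greedy_slim(channels, accuracy_fn, cost_fn, budget, step=1, min_channels=1):
    """Shrink one layer at a time until the cost fits the budget.

    At each iteration, every layer is tentatively slimmed by `step` channels
    and the candidate with the highest estimated accuracy (that is, the
    smallest accuracy drop) is kept.
    """
    channels = list(channels)
    while cost_fn(channels) > budget:
        best = None
        for i in range(len(channels)):
            if channels[i] - step < min_channels:
                continue  # this layer cannot be slimmed any further
            candidate = channels[:i] + [channels[i] - step] + channels[i + 1:]
            accuracy = accuracy_fn(candidate)
            if best is None or accuracy > best[0]:
                best = (accuracy, candidate)
        if best is None:
            break  # no layer can be slimmed; budget is unreachable
        channels = best[1]
    return channels


# Hypothetical per-layer importance weights (illustration only).
IMPORTANCE = [3.0, 1.0, 2.0]


def toy_accuracy(channels):
    """Stand-in for evaluating the slimmable network at this configuration."""
    return sum(w * math.sqrt(c) for w, c in zip(IMPORTANCE, channels))


def toy_cost(channels):
    """FLOPs-like proxy: products of adjacent layer widths."""
    return sum(a * b for a, b in zip(channels, channels[1:]))


slimmed = greedy_slim([8, 8, 8], toy_accuracy, toy_cost, budget=64)
```

In practice the accuracy estimate would come from running the slimmable model on held-out data, and the cost function would match whichever resource constraint is being targeted (FLOPs, latency, memory footprint or model size).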

Elastic Neural Networks for Classification

Authors: Bai Yue; Bhattacharyya Shuvra S.; Huttunen Heikki; Zhou Yi
Publication date: 30/05/2019

In this work we propose a framework for improving the performance of any deep neural network that may suffer from vanishing gradients. To address the vanishing gradient issue, we insert an intermediate output branch after each layer in the computational graph and use the corresponding prediction loss to feed the gradient to the early layers. The framework, which we name Elastic network, is tested with several well-known networks on the CIFAR10 and CIFAR100 datasets, and the experimental results show that it improves the accuracy of both shallow networks (e.g., MobileNet) and deep convolutional neural networks (e.g., DenseNet). We also identify the types of networks where the framework does not improve performance and discuss the reasons. Finally, as a side product, the computational complexity of the resulting networks can be adjusted in an elastic manner by selecting the output branch according to the current computational budget.
Comment: 2019 IEEE International Conference on Artificial Intelligence Circuits and Systems

arXiv.org e-Print Archive

Visually Impaired Aid using Convolutional Neural Networks, Transfer Learning, and Particle Competition and Cooperation

Authors: Breve Fabricio; Fischer Carlos Norberto
Publication date: 09/05/2020

Navigation and mobility are some of the major problems faced by visually impaired people in their daily lives. Advances in computer vision have led to the proposal of some navigation systems. However, most of them require expensive and/or heavy hardware. In this paper we propose the use of convolutional neural networks (CNN), transfer learning, and semi-supervised learning (SSL) to build a framework to aid the visually impaired. It has low computational costs and, therefore, may be implemented on current smartphones, without relying on any additional equipment. The smartphone camera can be used to automatically take pictures of the path ahead. These pictures are then immediately classified, providing almost instantaneous feedback to the user. We also propose a dataset to train the classifiers, including indoor and outdoor situations with different types of light, floor, and obstacles. Many different CNN architectures are evaluated as feature extractors and classifiers, by fine-tuning weights pre-trained on a much larger dataset. The graph-based SSL method known as particle competition and cooperation is also used for classification, allowing feedback from the user to be incorporated without retraining the underlying network. Classification accuracies of 92% and 80% are achieved on the proposed dataset in the best supervised and SSL scenarios, respectively.
Comment: BREVE, Fabricio Aparecido; FISCHER, Carlos Norberto. Visually Impaired Aid using Convolutional Neural Networks, Transfer Learning, and Particle Competition and Cooperation In: 2020 International Joint Conference on Neural Networks (IJCNN 2020), 2020, Glasgow, UK. Proceedings of 2020 International Joint Conference on Neural Networks (IJCNN 2020), 2020. (accepted for publication)

arXiv.org e-Print Archive

Exploring Randomly Wired Neural Networks for Image Recognition

Authors: Girshick Ross; He Kaiming; Kirillov Alexander; Xie Saining
Publication date: 08/04/2019

Neural networks for image recognition have evolved through extensive manual design from simple chain-like models to structures with multiple wiring paths. The success of ResNets and DenseNets is due in large part to their innovative wiring plans. Now, neural architecture search (NAS) studies are exploring the joint optimization of wiring and operation types; however, the space of possible wirings is constrained and still driven by manual design despite being searched. In this paper, we explore a more diverse set of connectivity patterns through the lens of randomly wired neural networks. To do this, we first define the concept of a stochastic network generator that encapsulates the entire network generation process. Encapsulation provides a unified view of NAS and randomly wired networks. Then, we use three classical random graph models to generate randomly wired graphs for networks. The results are surprising: several variants of these random generators yield network instances that have competitive accuracy on the ImageNet benchmark. These results suggest that new efforts focusing on designing better network generators may lead to new breakthroughs by exploring less constrained search spaces with more room for novel design.
Comment: Technical report

arXiv.org e-Print Archive
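
As a rough illustration of the "stochastic network generator" idea in the abstract above, the sketch below samples a random wiring from one classical random graph model, Erdős–Rényi, chosen here as an example (the abstract does not name the three models). Orienting each edge from the lower to the higher node index to obtain a directed acyclic graph is an assumption of this sketch, and the function names are hypothetical.

```python
import random


def erdos_renyi_dag(num_nodes, edge_prob, seed=0):
    """Sample an Erdos-Renyi graph and orient it as a DAG.

    Each unordered pair of nodes is connected with probability `edge_prob`.
    Edges always point from the lower index to the higher index, so the
    result cannot contain a cycle.
    """
    rng = random.Random(seed)
    edges = []
    for src in range(num_nodes):
        for dst in range(src + 1, num_nodes):
            if rng.random() < edge_prob:
                edges.append((src, dst))
    return edges


def input_and_output_nodes(num_nodes, edges):
    """Nodes with no incoming edge act as inputs; no outgoing edge, outputs."""
    has_incoming = {dst for _, dst in edges}
    has_outgoing = {src for src, _ in edges}
    inputs = [n for n in range(num_nodes) if n not in has_incoming]
    outputs = [n for n in range(num_nodes) if n not in has_outgoing]
    return inputs, outputs


edges = erdos_renyi_dag(num_nodes=8, edge_prob=0.4, seed=42)
inputs, outputs = input_and_output_nodes(8, edges)
```

To turn such a graph into a network, each node would be mapped to an operation (for example, a convolution block) that aggregates the outputs of its predecessors; that mapping is outside the scope of this sketch.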