Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers
Model pruning has become a useful technique that improves the computational
efficiency of deep learning, making it possible to deploy solutions in
resource-limited scenarios. A widely-used practice in relevant work assumes
that a smaller-norm parameter or feature plays a less informative role at the
inference time. In this paper, we propose a channel pruning technique for
accelerating the computations of deep convolutional neural networks (CNNs) that
does not critically rely on this assumption. Instead, it focuses on direct
simplification of the channel-to-channel computation graph of a CNN without the
need to perform the computationally difficult and not-always-useful task of
making the high-dimensional tensors of a CNN structured sparse. Our approach takes
two stages: first to adopt an end-to-end stochastic training method that
eventually forces the outputs of some channels to be constant, and then to
prune those constant channels from the original neural network by adjusting the
biases of their impacting layers such that the resulting compact model can be
quickly fine-tuned. Our approach is mathematically appealing from an
optimization perspective and easy to reproduce. We evaluated our approach
on several image learning benchmarks and demonstrate its interesting
aspects and competitive performance.
Comment: accepted to ICLR 2018, 11 pages
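The second stage described above (removing constant channels and adjusting the biases of the layers they feed) can be illustrated with a minimal sketch. It assumes a fully-connected layer rather than a convolution, and the names `fold_constant_channels` and `linear` are hypothetical helpers, not taken from the paper.

```python
def fold_constant_channels(weights, biases, constants):
    """Drop constant input channels and fold their contribution into the bias.

    weights:   weights[o][i] is the weight from input channel i to output o
    biases:    one bias per output
    constants: dict mapping an input-channel index to its constant value
    """
    keep = [i for i in range(len(weights[0])) if i not in constants]
    new_weights = [[row[i] for i in keep] for row in weights]
    # Each constant channel contributes weight * constant to every output.
    new_biases = [
        b + sum(row[i] * c for i, c in constants.items())
        for row, b in zip(weights, biases)
    ]
    return new_weights, new_biases, keep


def linear(weights, biases, x):
    """Plain affine layer: y = W x + b."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


W = [[1.0, 2.0, 3.0], [0.5, -1.0, 4.0]]
b = [0.1, -0.2]
constants = {1: 0.5}  # assumption: channel 1 always outputs 0.5

W2, b2, keep = fold_constant_channels(W, b, constants)
x_full = [2.0, 0.5, -1.0]
x_pruned = [x_full[i] for i in keep]
full_out = linear(W, b, x_full)
pruned_out = linear(W2, b2, x_pruned)
```

In this simplified setting the pruned layer reproduces the original outputs exactly. With padded convolutions the match may not be exact at the borders, which is one reason a fine-tuning step afterwards is plausible.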
Channel Pruning via Optimal Thresholding
Structured pruning, especially channel pruning, is widely used for its reduced
computational cost and its compatibility with off-the-shelf hardware devices.
Among existing works, weights are typically removed using a predefined global
threshold, or a threshold computed from a predefined metric. Designs based on a
predefined global threshold ignore the variation among layers and weight
distributions, and may therefore result in sub-optimal
performance caused by over-pruning or under-pruning. In this paper, we present
a simple yet effective method, termed Optimal Thresholding (OT), to prune
channels with layer dependent thresholds that optimally separate important from
negligible channels. By using OT, most negligible or unimportant channels are
pruned to achieve high sparsity while minimizing performance degradation. Since
most important weights are preserved, the pruned model can be further
fine-tuned and quickly converge with very few iterations. Our method
demonstrates superior performance, especially when compared to the
state-of-the-art designs at high levels of sparsity. On CIFAR-100, a
DenseNet-121 pruned with OT and fine-tuned achieves 75.99% accuracy with only
1.46e8 FLOPs and 0.71M parameters.
Comment: ICONIP 202
AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures
Learning to represent videos is a very challenging task both algorithmically
and computationally. Standard video CNN architectures have been designed by
directly extending architectures devised for image understanding to include the
time dimension, using modules such as 3D convolutions, or by using two-stream
design to capture both appearance and motion in videos. We interpret a video
CNN as a collection of multi-stream convolutional blocks connected to each
other, and propose the approach of automatically finding neural architectures
with better connectivity and spatio-temporal interactions for video
understanding. This is done by evolving a population of overly-connected
architectures guided by connection weight learning. Architectures combining
representations that abstract different input types (i.e., RGB and optical
flow) at multiple temporal resolutions are searched for, allowing different
types or sources of information to interact with each other. Our method,
referred to as AssembleNet, outperforms prior approaches on public video
datasets, in some cases by a great margin. We obtain 58.6% mAP on Charades and
34.27% accuracy on Moments-in-Time.
Hierarchical Label Inference for Video Classification
Videos are a rich source of high-dimensional structured data, with a wide
range of interacting components at varying levels of granularity. In order to
improve understanding of unconstrained internet videos, it is important to
consider the role of labels at separate levels of abstraction. In this paper,
we consider the use of the Bidirectional Inference Neural Network (BINN) for
performing graph-based inference in label space for the task of video
classification. We take advantage of the inherent hierarchy between labels at
increasing granularity. The BINN is evaluated on the first and second release
of the YouTube-8M large scale multilabel video dataset. Our results demonstrate
the effectiveness of BINN, achieving significant improvements against baseline
models.
Improving Landmark Recognition using Saliency detection and Feature classification
Image Landmark Recognition has been one of the most sought-after
classification challenges in the field of vision and perception. After so many
years of generic classification of buildings and monuments from images, people
are now focussing upon fine-grained problems - recognizing the category of each
building or monument. We propose an ensemble network for the
classification of Indian Landmark Images. Our method gives robust
classification by ensembling the predictions from a Graph-Based Visual Saliency
(GBVS) network along with supervised feature-based classification algorithms
such as kNN and Random Forest. The final architecture is an adaptive learning
of all the mentioned networks. The proposed network produces a reliable score
to eliminate false category cases. Evaluation of our model was done on a new
dataset, which involves challenges such as landmark clutter, variable scaling,
partial occlusion, etc.
Comment: Pre-print of the paper to be published in Springer, accepted in the
proceedings of the 2nd Workshop on Digital Heritage at the 11th Indian
Conference on Computer Vision, Graphics and Image Processing
LatentGNN: Learning Efficient Non-local Relations for Visual Recognition
Capturing long-range dependencies in feature representations is crucial for
many visual recognition tasks. Despite recent successes of deep convolutional
networks, it remains challenging to model non-local context relations between
visual features. A promising strategy is to model the feature context by a
fully-connected graph neural network (GNN), which augments traditional
convolutional features with an estimated non-local context representation.
However, most GNN-based approaches require computing a dense graph affinity
matrix and hence have difficulty in scaling up to tackle complex real-world
visual problems. In this work, we propose an efficient and yet flexible
non-local relation representation based on a novel class of graph neural
networks. Our key idea is to introduce a latent space to reduce the complexity
of graph, which allows us to use a low-rank representation for the graph
affinity matrix and to achieve a linear complexity in computation. Extensive
experimental evaluations on three major visual recognition tasks show that our
method outperforms prior work by a large margin while maintaining a low
computation cost.
Comment: ICML 201
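The complexity argument in the LatentGNN abstract can be sketched in a few lines. The sketch assumes the affinity matrix factors as A = P Pᵀ, with P mapping N visible nodes to d latent nodes; this symmetric factorization and the toy numbers are illustrative assumptions, not the paper's exact formulation.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


# N = 4 visible nodes, d = 2 latent nodes, c = 3 feature channels (toy sizes).
P = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0], [0.2, 0.8]]                      # N x d
X = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [3.0, 1.0, 0.0], [1.0, 1.0, 1.0]]  # N x c

# Dense route: materialises the N x N affinity, cost grows as N^2.
dense = matmul(matmul(P, transpose(P)), X)

# Factored route: goes through the latent space, cost grows as N * d.
factored = matmul(P, matmul(transpose(P), X))
```

Both routes give the same result by associativity, but the factored one never builds the N x N matrix, which is where the linear complexity in N comes from when d is fixed.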
AutoSlim: Towards One-Shot Architecture Search for Channel Numbers
We study how to set channel numbers in a neural network to achieve better
accuracy under constrained resources (e.g., FLOPs, latency, memory footprint or
model size). A simple and one-shot solution, named AutoSlim, is presented.
Instead of training many network samples and searching with reinforcement
learning, we train a single slimmable network to approximate the network
accuracy of different channel configurations. We then iteratively evaluate the
trained slimmable model and greedily slim the layer with minimal accuracy drop.
By this single pass, we can obtain the optimized channel configurations under
different resource constraints. We present experiments with MobileNet v1,
MobileNet v2, ResNet-50 and RL-searched MNasNet on ImageNet classification. We
show significant improvements over their default channel configurations. We
also achieve better accuracy than recent channel pruning methods and neural
architecture search methods.
Notably, by setting optimized channel numbers, our AutoSlim-MobileNet-v2 at
305M FLOPs achieves 74.2% top-1 accuracy, 2.4% better than default MobileNet-v2
(301M FLOPs), and even 0.2% better than RL-searched MNasNet (317M FLOPs). Our
AutoSlim-ResNet-50 at 570M FLOPs, without depthwise convolutions, achieves 1.3%
better accuracy than MobileNet-v1 (569M FLOPs). Code and models will be
available at: https://github.com/JiahuiYu/slimmable_networks
Comment: tech report
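The greedy slimming loop described in the AutoSlim abstract can be sketched as follows. The functions `estimate_accuracy` and `cost` are hypothetical stand-ins: in the abstract, accuracy comes from evaluating the trained slimmable network, whereas here a toy linear proxy is assumed.

```python
def greedy_slim(channels, estimate_accuracy, cost, budget, step=1, min_channels=1):
    """Repeatedly shrink the layer whose slimming hurts estimated accuracy least."""
    channels = list(channels)
    while cost(channels) > budget:
        best = None
        for i, c in enumerate(channels):
            if c - step < min_channels:
                continue  # this layer cannot be slimmed further
            trial = channels[:i] + [c - step] + channels[i + 1:]
            acc = estimate_accuracy(trial)
            if best is None or acc > best[0]:
                best = (acc, trial)
        if best is None:
            break  # no layer can be slimmed; budget is unreachable
        channels = best[1]
    return channels


# Toy stand-ins (assumptions): accuracy is a weighted sum of channel counts,
# cost is the sum of products of consecutive layer widths.
def toy_accuracy(ch):
    return 3 * ch[0] + 1 * ch[1] + 2 * ch[2]


def toy_cost(ch):
    return sum(a * b for a, b in zip(ch, ch[1:]))


slimmed = greedy_slim([8, 8, 8], toy_accuracy, toy_cost, budget=60)
```

With these toy functions the middle layer is always the cheapest to slim, so the loop narrows it until the cost fits the budget.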
Elastic Neural Networks for Classification
In this work we propose a framework for improving the performance of any deep
neural network that may suffer from vanishing gradients. To address the
vanishing gradient issue, we study a framework, where we insert an intermediate
output branch after each layer in the computational graph and use the
corresponding prediction loss for feeding the gradient to the early layers. The
framework - which we name Elastic network - is tested with several well-known
networks on CIFAR10 and CIFAR100 datasets, and the experimental results show
that the proposed framework improves the accuracy on both shallow networks
(e.g., MobileNet) and deep convolutional neural networks (e.g., DenseNet). We
also identify the types of networks where the framework does not improve the
performance and discuss the reasons. Finally, as a side product, the
computational complexity of the resulting networks can be adjusted in an
elastic manner by selecting the output branch according to current
computational budget.
Comment: 2019 IEEE International Conference on Artificial Intelligence
Circuits and Systems
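The budget-dependent choice of output branch mentioned at the end of the Elastic network abstract can be sketched as below. The function name `select_branch`, the cost values, and the assumption that branch cost grows with depth are all hypothetical.

```python
def select_branch(branch_costs, budget):
    """Return the index of the deepest branch whose cost fits the budget.

    branch_costs[i] is the cumulative cost of running the network up to
    intermediate output i; costs are assumed to be non-decreasing with depth.
    Returns None if even the first branch is too expensive.
    """
    chosen = None
    for i, cost in enumerate(branch_costs):
        if cost <= budget:
            chosen = i
    return chosen


# Hypothetical cumulative costs (e.g. MFLOPs) for four intermediate outputs.
costs = [10, 25, 60, 140]
mid_budget_choice = select_branch(costs, budget=70)
tiny_budget_choice = select_branch(costs, budget=5)
```

Here a budget of 70 selects the third output (index 2), and a budget of 5 selects none.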
Visually Impaired Aid using Convolutional Neural Networks, Transfer Learning, and Particle Competition and Cooperation
Navigation and mobility are some of the major problems faced by visually
impaired people in their daily lives. Advances in computer vision led to the
proposal of some navigation systems. However, most of them require expensive
and/or heavy hardware. In this paper we propose the use of convolutional neural
networks (CNN), transfer learning, and semi-supervised learning (SSL) to build
a framework to aid the visually impaired. It has low computational costs
and, therefore, may be implemented on current smartphones, without relying on
any additional equipment. The smartphone camera can be used to automatically
take pictures of the path ahead. Then, they will be immediately classified,
providing almost instantaneous feedback to the user. We also propose a dataset
to train the classifiers, including indoor and outdoor situations with
different types of light, floor, and obstacles. Many different CNN
architectures are evaluated as feature extractors and classifiers, by
fine-tuning weights pre-trained on a much larger dataset. The graph-based SSL
method, known as particle competition and cooperation, is also used for
classification, allowing feedback from the user to be incorporated without
retraining the underlying network. Classification accuracies of 92% and 80% are
achieved on the proposed dataset in the best supervised and SSL scenarios,
respectively.
Comment: BREVE, Fabricio Aparecido; FISCHER, Carlos Norberto. Visually
Impaired Aid using Convolutional Neural Networks, Transfer Learning, and
Particle Competition and Cooperation. In: 2020 International Joint Conference
on Neural Networks (IJCNN 2020), 2020, Glasgow, UK. Proceedings of 2020
International Joint Conference on Neural Networks (IJCNN 2020), 2020.
(accepted for publication)
Exploring Randomly Wired Neural Networks for Image Recognition
Neural networks for image recognition have evolved through extensive manual
design from simple chain-like models to structures with multiple wiring paths.
The success of ResNets and DenseNets is due in large part to their innovative
wiring plans. Now, neural architecture search (NAS) studies are exploring the
joint optimization of wiring and operation types; however, the space of
possible wirings is constrained and still driven by manual design despite being
searched. In this paper, we explore a more diverse set of connectivity patterns
through the lens of randomly wired neural networks. To do this, we first define
the concept of a stochastic network generator that encapsulates the entire
network generation process. Encapsulation provides a unified view of NAS and
randomly wired networks. Then, we use three classical random graph models to
generate randomly wired graphs for networks. The results are surprising:
several variants of these random generators yield network instances that have
competitive accuracy on the ImageNet benchmark. These results suggest that new
efforts focusing on designing better network generators may lead to new
breakthroughs by exploring less constrained search spaces with more room for
novel design.
Comment: Technical report
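A random wiring generator of the kind the abstract describes can be sketched as follows. The abstract does not name its three random graph models, so the Erdős–Rényi model used here is an assumption, as is the trick of orienting every edge from the lower to the higher node index to keep the graph acyclic. The function names are hypothetical.

```python
import random


def random_dag(n, p, seed=0):
    """Sample an Erdős–Rényi-style graph on n nodes and orient it as a DAG.

    Each pair (i, j) with i < j gets an edge i -> j with probability p.
    Orienting edges by node index guarantees there are no cycles.
    """
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]


def sources_and_sinks(n, edges):
    """Nodes without incoming edges act as inputs, nodes without outgoing as outputs."""
    has_in = {j for _, j in edges}
    has_out = {i for i, _ in edges}
    sources = [v for v in range(n) if v not in has_in]
    sinks = [v for v in range(n) if v not in has_out]
    return sources, sinks


edges = random_dag(n=8, p=0.4, seed=0)
sources, sinks = sources_and_sinks(8, edges)
```

Each node would then be replaced by an operation (for example a convolution) that aggregates its incoming edges, with the sources fed by the network input and the sinks merged into the output.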