Sequentially Aggregated Convolutional Networks
Modern deep networks generally implement a certain form of shortcut
connections to alleviate optimization difficulties. However, we observe that
such network topology alters the nature of deep networks. In many ways, these
networks behave similarly to aggregated wide networks. We thus exploit the
aggregation nature of shortcut connections at a finer architectural level and
place them within wide convolutional layers. We end up with a sequentially
aggregated convolutional (SeqConv) layer that combines the benefits of both
wide and deep representations by aggregating features of various depths in
sequence. The proposed SeqConv serves as a drop-in replacement of regular wide
convolutional layers and thus could be handily integrated into any backbone
network. We apply SeqConv to widely adopted backbones including ResNet and
ResNeXt, and conduct experiments for image classification on public benchmark
datasets. Our ResNet based network with a model size of ResNet-50 easily
surpasses the performance of the 2.35× larger ResNet-152, while our
ResNeXt based model sets a new state-of-the-art accuracy on ImageNet
classification for networks with similar model complexity. The code and
pre-trained models of our work are publicly available at
https://github.com/GroupOfAlchemists/SeqConv.
Comment: To appear in ICCV 2019 workshop
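The SeqConv idea above, aggregating features of various depths in sequence within one wide layer, can be sketched without any framework. The toy scalar "units" below are assumptions for illustration, standing in for the narrow convolutions of an actual SeqConv layer: each unit consumes the input plus every earlier unit's output, and the layer emits all unit outputs together.

```python
# Minimal sketch of sequential aggregation (toy stand-in, not the paper's code):
# each narrow unit sees the input aggregated with all earlier units' outputs.

def seq_aggregated_layer(x, units):
    collected = []                      # features produced by earlier units
    for unit in units:
        unit_out = unit(x + collected)  # aggregate input with earlier depths
        collected = collected + unit_out
    return collected                    # features of various depths, in sequence

# toy "units": scaled sums standing in for narrow convolutions
unit_a = lambda feats: [sum(feats) * 0.5]
unit_b = lambda feats: [sum(feats) * 0.25]

out = seq_aggregated_layer([1.0, 2.0], [unit_a, unit_b])   # [1.5, 1.125]
```

Because each unit's input grows with the outputs already produced, the layer combines shallow and deep features in one pass, which is the sense in which it behaves like an aggregated wide network.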
Controllable Top-down Feature Transformer
We study the intrinsic transformation of feature maps across convolutional
network layers with explicit top-down control. To this end, we develop top-down
feature transformer (TFT), under controllable parameters, that is able to
account for the hidden layer transformation while maintaining the overall
consistency across layers. The learned generators capture the underlying
feature transformation processes that are independent of particular training
images. Our proposed TFT framework brings insights to, and helps the
understanding of, an important problem: studying the CNN internal feature
representation and transformation under top-down processes. In the case of
spatial transformations, we demonstrate the significant advantage of TFT over
existing data-driven approaches in building data-independent transformations.
We also show that it can be adopted in other applications such as data
augmentation and image style transfer.
Using accumulation to optimize deep residual neural nets
Residual Neural Networks [1] won first place in all five main tracks of the
ImageNet and COCO 2015 competitions. This kind of network involves the creation
of pluggable modules such that the output contains a residual from the input.
The residual in that paper is the identity function. We propose to include
residuals from all lower layers, suitably normalized, to create the residual.
This way, all previous layers contribute equally to the output of a layer. We
show that our approach is an improvement on [1] for the CIFAR-10 dataset.
Comment: 7 pages, 6 figures, 1 table
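The accumulation scheme above can be sketched numerically. The equal 1/l scaling and the scalar toy "layers" below are assumptions for illustration; the paper normalizes suitably so that all previous layers contribute equally to a layer's output.

```python
# Toy sketch of accumulated residuals: instead of adding only the identity,
# each block adds the normalized sum of ALL earlier outputs.

def accumulated_residual_forward(x, layers):
    history = [x]                                  # input plus all outputs so far
    for f in layers:
        residual = sum(history) / len(history)     # equal contribution per layer
        history.append(f(history[-1]) + residual)  # not just the identity shortcut
    return history[-1]

layers = [lambda v: 2 * v, lambda v: v + 1]
y = accumulated_residual_forward(1.0, layers)      # 6.0 on this toy example
```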
A Survey of the Recent Architectures of Deep Convolutional Neural Networks
Deep Convolutional Neural Network (CNN) is a special type of Neural Networks,
which has shown exemplary performance on several competitions related to
Computer Vision and Image Processing. Some of the exciting application areas of
CNN include Image Classification and Segmentation, Object Detection, Video
Processing, Natural Language Processing, and Speech Recognition. The powerful
learning ability of deep CNN is primarily due to the use of multiple feature
extraction stages that can automatically learn representations from the data.
The availability of large amounts of data and improvements in hardware
technology have accelerated the research in CNNs, and recently interesting deep
CNN architectures have been reported. Several inspiring ideas to bring
advancements in CNNs have been explored, such as the use of different
activation and loss functions, parameter optimization, regularization, and
architectural innovations. However, the significant improvement in the
representational capacity of the deep CNN is achieved through architectural
innovations. Notably, the ideas of exploiting spatial and channel information,
depth and width of architecture, and multi-path information processing have
gained substantial attention. Similarly, the idea of using a block of layers as
a structural unit is also gaining popularity. This survey thus focuses on the
intrinsic taxonomy present in the recently reported deep CNN architectures and,
consequently, classifies the recent innovations in CNN architectures into seven
different categories. These seven categories are based on spatial exploitation,
depth, multi-path, width, feature-map exploitation, channel boosting, and
attention. Additionally, the elementary understanding of CNN components,
current challenges, and applications of CNN are also provided.
Comment: Number of Pages: 70, Number of Figures: 11, Number of Tables: 11.
Artif Intell Rev (2020)
Sharing Residual Units Through Collective Tensor Factorization in Deep Neural Networks
Residual units are widely used for alleviating optimization difficulties when
building deep neural networks. However, the performance gain does not well
compensate for the increase in model size, indicating low parameter efficiency in
these residual units. In this work, we first revisit the residual function in
several variations of residual units and demonstrate that these residual
functions can actually be explained with a unified framework based on
generalized block term decomposition. Then, based on the new explanation, we
propose a new architecture, Collective Residual Unit (CRU), which enhances the
parameter efficiency of deep neural networks through collective tensor
factorization. CRU enables knowledge sharing across different residual units
using shared factors. Experimental results show that our proposed CRU Network
demonstrates outstanding parameter efficiency, achieving comparable
classification performance to ResNet-200 with the model size of ResNet-50. By
building a deeper network using CRU, we can achieve state-of-the-art single
model classification accuracy on ImageNet-1k and Places365-Standard benchmark
datasets. (Code and trained models are available on GitHub.)
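The parameter-efficiency gain from sharing factors across units can be illustrated by counting. The sizes and rank below are made-up, and the factorization shown (W_i ≈ U · S_i · V with U and V shared) is a simplified matrix stand-in for the paper's generalized block term decomposition of convolution weights.

```python
# Parameter-counting sketch of collective factor sharing (illustrative sizes).

def params_unshared(n_units, d_in, d_out):
    # every residual unit stores its own full weight matrix
    return n_units * d_in * d_out

def params_shared(n_units, d_in, d_out, rank):
    shared = d_in * rank + rank * d_out    # U and V, stored once for all units
    cores = n_units * rank * rank          # small unit-specific core S_i
    return shared + cores

base = params_unshared(16, 256, 256)       # 1,048,576 parameters
cru = params_shared(16, 256, 256, 32)      # 32,768 parameters (32x fewer)
```

The unit-specific knowledge lives in the small cores, while the large factors are learned collectively, which is the sharing mechanism the abstract describes.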
Deep Part Induction from Articulated Object Pairs
Object functionality is often expressed through part articulation -- as when
the two rigid parts of a scissor pivot against each other to perform the
cutting function. Such articulations are often similar across objects within
the same functional category. In this paper, we explore how the observation of
different articulation states provides evidence for part structure and motion
of 3D objects. Our method takes as input a pair of unsegmented shapes
representing two different articulation states of two functionally related
objects, and induces their common parts along with their underlying rigid
motion. This is a challenging setting, as we assume no prior shape structure,
no prior shape category information, no consistent shape orientation, the
articulation states may belong to objects of different geometry, plus we allow
inputs to be noisy and partial scans, or point clouds lifted from RGB images.
Our method learns a neural network architecture with three modules that
respectively propose correspondences, estimate 3D deformation flows, and
perform segmentation. To achieve optimal performance, our architecture
alternates between correspondence, deformation flow, and segmentation
prediction iteratively in an ICP-like fashion. Our results demonstrate that our
method significantly outperforms state-of-the-art techniques in the task of
discovering articulated parts of objects. In addition, our part induction is
object-class agnostic and successfully generalizes to new and unseen objects.
Local Relation Networks for Image Recognition
The convolution layer has been the dominant feature extractor in computer
vision for years. However, the spatial aggregation in convolution is basically
a pattern matching process that applies fixed filters which are inefficient at
modeling visual elements with varying spatial distributions. This paper
presents a new image feature extractor, called the local relation layer, that
adaptively determines aggregation weights based on the compositional
relationship of local pixel pairs. With this relational approach, it can
composite visual elements into higher-level entities in a more efficient manner
that benefits semantic inference. A network built with local relation layers,
called the Local Relation Network (LR-Net), is found to provide greater
modeling capacity than its counterpart built with regular convolution on
large-scale recognition tasks such as ImageNet classification.
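The contrast with fixed filters can be made concrete in a stripped-down 1-D, single-channel form. The identity query/key/value maps below are assumptions; the real layer learns these projections and includes a geometry term. The point is that the aggregation weights are computed per position from local pairwise relations rather than being fixed filter weights.

```python
import math

# 1-D toy sketch of a local relation layer: aggregation weights come from a
# softmax over local query-key relations instead of a fixed convolution filter.

def local_relation_1d(x, window=3):
    half = window // 2
    out = []
    for i, q in enumerate(x):
        # keys/values from the local neighbourhood (clipped at the borders)
        nbr = x[max(0, i - half): i + half + 1]
        scores = [q * k for k in nbr]                # pairwise relation scores
        m = max(scores)                              # stabilized softmax
        exps = [math.exp(s - m) for s in scores]
        weights = [e / sum(exps) for e in exps]      # adaptive, per-position
        out.append(sum(w * v for w, v in zip(weights, nbr)))
    return out

y = local_relation_1d([1.0, 2.0, 3.0])
```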
Smooth Inter-layer Propagation of Stabilized Neural Networks for Classification
Recent work has studied the reasons for the remarkable performance of deep
neural networks in image classification. We examine batch normalization on the
one hand and the dynamical systems view of residual networks on the other hand.
Our goal is in understanding the notions of stability and smoothness of the
inter-layer propagation of ResNets so as to explain when they contribute to
significantly enhanced performance. We postulate that such stability is of
importance for the trained ResNet to transfer.
Comment: Revised Abstract
PatchShuffle Regularization
This paper focuses on regularizing the training of the convolutional neural
network (CNN). We propose a new regularization approach named "PatchShuffle"
that can be adopted in any classification-oriented CNN models. It is easy to
implement: in each mini-batch, images or feature maps are randomly chosen to
undergo a transformation such that pixels within each local patch are shuffled.
Through generating images and feature maps with interior orderless patches,
PatchShuffle creates rich local variations, reduces the risk of network
overfitting, and can be viewed as a beneficial supplement to various kinds of
training regularization techniques, such as weight decay, model ensemble and
dropout. Experiments on four representative classification datasets show that
PatchShuffle improves the generalization ability of CNNs, especially when
data is scarce. Moreover, we empirically illustrate that CNN models trained
with PatchShuffle are more robust to noise and local changes in an image.
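The transformation itself is simple enough to sketch on a single-channel "image" held as a list of rows. The abstract only specifies that pixels within each local patch are shuffled; the non-overlapping 2×2 tiling and the per-call application below are assumptions made for illustration.

```python
import random

# Minimal PatchShuffle sketch: permute pixels only WITHIN each local patch,
# so global structure is kept while local ordering is destroyed.

def patch_shuffle(img, patch=2, rng=random):
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]              # leave the input untouched
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            coords = [(r, c)
                      for r in range(top, min(top + patch, h))
                      for c in range(left, min(left + patch, w))]
            vals = [img[r][c] for r, c in coords]
            rng.shuffle(vals)                  # shuffle inside this patch only
            for (r, c), v in zip(coords, vals):
                out[r][c] = v
    return out

img = [[1, 2], [3, 4]]
shuffled = patch_shuffle(img)                  # same pixels, locally reordered
```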
Dynamic Routing Networks
The deployment of deep neural networks in real-world applications is mostly
restricted by their high inference costs. Extensive efforts have been made to
improve the accuracy with expert-designed or algorithm-searched architectures.
However, the incremental improvement is typically achieved with increasingly
more expensive models that only a small portion of input instances really need.
Inference with a static architecture that processes all input instances via the
same transformation would thus incur unnecessary computational costs.
Therefore, customizing the model capacity in an instance-aware manner is much
needed for higher inference efficiency. In this paper, we propose Dynamic
Routing Networks (DRNets), which support efficient instance-aware inference by
routing the input instance to only necessary transformation branches selected
from a candidate set of branches for each connection between transformation
nodes. The branch selection is dynamically determined via the corresponding
branch importance weights, which are first generated from lightweight
hypernetworks (RouterNets) and then recalibrated with Gumbel-Softmax before the
selection. Extensive experiments show that DRNets can reduce a substantial
amount of parameter size and FLOPs during inference with prediction performance
comparable to state-of-the-art architectures.
Comment: 10 pages, 3 figures, 3 tables
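The branch-selection step above can be sketched in isolation. The hypernetwork (RouterNet) is omitted here, so the raw branch-importance weights are passed in directly, which is an assumption for illustration; the sketch only shows weights being perturbed with Gumbel noise, recalibrated with a softmax, and the strongest branch being kept.

```python
import math
import random

# Toy sketch of Gumbel-Softmax recalibration and branch selection.

def gumbel_softmax(logits, tau=1.0, rng=random):
    # sample Gumbel(0, 1) noise and add it to the logits
    gumbels = [-math.log(-math.log(max(rng.random(), 1e-12))) for _ in logits]
    y = [(l + g) / tau for l, g in zip(logits, gumbels)]
    m = max(y)                              # stabilized softmax
    exps = [math.exp(v - m) for v in y]
    total = sum(exps)
    return [e / total for e in exps]

def select_branch(importance, tau=1.0, rng=random):
    probs = gumbel_softmax(importance, tau, rng)
    return max(range(len(probs)), key=probs.__getitem__)

chosen = select_branch([2.0, 0.1, -1.0])   # index of the selected branch
```

At inference, only the selected branch needs to be executed per connection, which is where the FLOP savings come from.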