Training Very Deep Networks via Residual Learning with Stochastic Input Shortcut Connections
Many works have posited the benefit of depth in deep networks. However,
one of the problems encountered in the training of very deep networks is feature
reuse; that is, features are 'diluted' as they are forward propagated through
the model. Hence, later network layers receive less informative signals about the
input data, consequently making training less effective. In this work, we address
the problem of feature reuse by taking inspiration from an earlier work which
employed residual learning for alleviating the problem of feature reuse. We propose
a modification of residual learning for training very deep networks to realize
improved generalization performance; for this, we allow stochastic shortcut connections
of identity mappings from the input to hidden layers. We perform extensive
experiments using the USPS and MNIST datasets. On the USPS dataset, we
achieve an error rate of 2.69% without employing any form of data augmentation
(or manipulation). On the MNIST dataset, we reach a comparable state-of-the-art
error rate of 0.52%. Notably, these results are achieved without employing
any explicit regularization technique.
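The core idea above, a hidden layer whose output is augmented by an identity shortcut from the network input that is kept or dropped at random during training, can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the function and parameter names (`layer_with_input_shortcut`, `p_keep`) are ours, and the test-time rescaling by the keep probability is the usual expectation trick assumed here.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def layer_with_input_shortcut(h, x, W, p_keep=0.5, training=True):
    """One hidden layer with a stochastic identity shortcut from the input x.

    During training the shortcut is kept with probability p_keep (a Bernoulli
    gate); at test time the shortcut is deterministic and scaled by p_keep so
    the expected signal matches training. Illustrative sketch only.
    """
    out = relu(h @ W)
    if training:
        gate = rng.binomial(1, p_keep)  # 0 or 1: drop or keep the shortcut
        return out + gate * x
    return out + p_keep * x

# Toy usage: propagate an input through two such layers.
x = rng.standard_normal(8)
h = x.copy()
for _ in range(2):
    W = rng.standard_normal((8, 8)) * 0.1
    h = layer_with_input_shortcut(h, x, W, training=False)
print(h.shape)
```

Because the shortcut always carries the raw input rather than the previous layer's activation, even very deep stacks receive an undiluted copy of the input signal, which is the feature-reuse remedy the abstract describes.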
Deep Pyramidal Residual Networks
Deep convolutional neural networks (DCNNs) have shown remarkable performance
in image classification tasks in recent years. Generally, deep neural network
architectures are stacks consisting of a large number of convolutional layers,
and they perform downsampling along the spatial dimension via pooling to reduce
memory usage. Concurrently, the feature map dimension (i.e., the number of
channels) is sharply increased at downsampling locations, which is essential to
ensure effective performance because it increases the diversity of high-level
attributes. This also applies to residual networks and is very closely related
to their performance. In this research, instead of sharply increasing the
feature map dimension at units that perform downsampling, we gradually increase
the feature map dimension at all units to involve as many locations as
possible. This design, which is discussed in depth together with our new
insights, has proven to be an effective means of improving generalization
ability. Furthermore, we propose a novel residual unit capable of further
improving the classification accuracy with our new network architecture.
Experiments on benchmark CIFAR-10, CIFAR-100, and ImageNet datasets have shown
that our network architecture has superior generalization ability compared to
the original residual networks. Code is available at
https://github.com/jhkim89/PyramidNet
Comment: Accepted to CVPR 2017
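The widening scheme described above, growing the channel count a little at every residual unit instead of doubling it only at downsampling stages, amounts to a simple linear schedule. The sketch below assumes an additive schedule with a total widening budget spread evenly over the units; the names (`base`, `alpha`, `n_units`) are illustrative, not taken from the released code.

```python
def pyramidal_widths(base=16, alpha=48, n_units=12):
    """Additive pyramidal channel schedule.

    Each residual unit widens the feature-map dimension by alpha / n_units,
    so the width grows gradually from `base` to `base + alpha` across the
    stack, rather than jumping sharply at downsampling locations.
    Illustrative sketch under assumed parameter names.
    """
    widths = []
    width = float(base)
    for _ in range(n_units):
        width += alpha / n_units
        widths.append(int(width))  # channel counts must be integers
    return widths

print(pyramidal_widths())
```

Contrast this with a conventional residual network, whose width list for the same depth would be a step function (e.g. 16, 16, 16, 32, 32, 32, 64, 64, 64): the pyramidal schedule touches every unit, which is the "involve as many locations as possible" design the abstract argues for.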
Differentiable Sparsification for Deep Neural Networks
Deep neural networks have relieved a great deal of burden on human experts in
relation to feature engineering. However, comparable efforts are instead
required to determine effective architectures. In addition, as the sizes of
networks have grown overly large, a considerable amount of resources is also
invested in reducing the sizes. The sparsification of an over-complete model
addresses these problems as it removes redundant components and connections. In
this study, we propose a fully differentiable sparsification method for deep
neural networks which allows parameters to be zero during training via
stochastic gradient descent. Thus, the proposed method can learn the sparsified
structure and weights of a network in an end-to-end manner. The method is
directly applicable to various modern deep neural networks and requires
minimal modification to existing models. To the best of our knowledge, this is the
first fully [sub-]differentiable sparsification method that zeroes out
parameters. It provides a foundation for future structure learning and model
compression methods.
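One common way to let gradient descent drive parameters to exactly zero while keeping the map (sub)differentiable is a soft-threshold reparameterization. The sketch below is a generic instance of that idea, not the paper's exact formulation; the threshold parameterization via softplus and the names (`sparsify`, `w`, `s`) are assumptions for illustration.

```python
import numpy as np

def sparsify(w, s):
    """Soft-threshold reparameterization of a weight vector.

    The effective weight is exactly zero whenever |w| falls below a learned
    positive threshold softplus(s), yet the mapping remains sub-differentiable
    everywhere, so both w and s can be trained end-to-end with SGD.
    Generic sketch, not the paper's exact form.
    """
    thresh = np.log1p(np.exp(s))  # softplus keeps the threshold positive
    return np.sign(w) * np.maximum(np.abs(w) - thresh, 0.0)

# Small weights are pruned to exact zeros; large ones shrink but survive.
w = np.array([0.05, -0.8, 0.3, -0.02])
print(sparsify(w, s=-1.0))
```

The key property, and what distinguishes this from plain magnitude pruning, is that zeros arise during training rather than in a separate post-hoc pruning step, so the surviving weights adapt to the sparsified structure.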
Dual Skipping Networks
Inspired by the recent neuroscience studies on the left-right asymmetry of
the human brain in processing low and high spatial frequency information, this
paper introduces a dual skipping network which carries out coarse-to-fine
object categorization. Such a network has two branches to simultaneously deal
with both coarse and fine-grained classification tasks. Specifically, we
propose a layer-skipping mechanism that learns a gating network to predict
which layers to skip in the testing stage. This layer-skipping mechanism endows
the network with good flexibility and capability in practice. Evaluations are
conducted on several widely used coarse-to-fine object categorization
benchmarks, and promising results are achieved by our proposed network model.Comment: CVPR 2018 (poster); fix typ