EraseReLU: A Simple Way to Ease the Training of Deep Convolution Neural Networks
For most state-of-the-art architectures, the Rectified Linear Unit (ReLU) has become a
standard component accompanying each layer. Although ReLU can ease network training to an
extent, its blocking of negative values may suppress the propagation of useful information
and lead to difficulty in optimizing very deep Convolutional Neural Networks (CNNs).
Moreover, stacked layers with nonlinear activations struggle to approximate the intrinsic
linear transformations between feature representations.
In this paper, we investigate the effect of erasing the ReLUs of certain layers and apply
it to various representative architectures following deterministic rules. Doing so eases
optimization and improves generalization performance for very deep CNN models. We find two
key factors essential to the performance improvement: 1) the location inside the basic
module where the ReLU is erased; and 2) the proportion of basic modules in which the ReLU
is erased. We show that
erasing the last ReLU layer of all basic modules in a network usually yields
improved performance. In experiments, our approach successfully improves the
performance of various representative architectures, and we report the improved
results on SVHN, CIFAR-10/100, and ImageNet. Moreover, we achieve competitive
single-model performance on CIFAR-100, with a 16.53% error rate, compared to the state of
the art.
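As a concrete illustration, the sketch below (PyTorch; the module layout is an assumption, not the authors' code) shows a ResNet-style basic module whose last ReLU is replaced by the identity map, which is one way to read the erase-the-last-ReLU rule.

    import torch.nn as nn

    class BasicBlock(nn.Module):
        """ResNet-style basic module; erase_last_relu drops the final activation."""
        def __init__(self, channels, erase_last_relu=True):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(channels)
            self.relu = nn.ReLU(inplace=True)
            # "Erasing" the last ReLU means replacing it with the identity map.
            self.last_act = nn.Identity() if erase_last_relu else nn.ReLU(inplace=True)

        def forward(self, x):
            out = self.relu(self.bn1(self.conv1(x)))
            out = self.bn2(self.conv2(out))
            return self.last_act(out + x)  # residual addition, then the (possibly erased) activation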
MBS: Macroblock Scaling for CNN Model Reduction
In this paper we propose the macroblock scaling (MBS) algorithm, which can be
applied to various CNN architectures to reduce their model size. MBS adaptively
reduces each CNN macroblock depending on its information redundancy measured by
our proposed effective flops. Empirical studies conducted with ImageNet and
CIFAR-10 attest that MBS can reduce the model size of some already compact CNN
models, e.g., MobileNetV2 (25.03% further reduction) and ShuffleNet (20.74%),
and even ultra-deep ones such as ResNet-101 (51.67%) and ResNet-1202 (72.71%)
with negligible accuracy degradation. MBS also performs better reduction at a
much lower cost than the state-of-the-art optimization-based methods do. MBS's
simplicity and efficiency, its flexibility to work with any CNN model, and its
scalability to work with models of any depth make it an attractive choice for
CNN model size reduction.
Comment: 8 pages (Accepted by CVPR'19)
Deep Predictive Coding Network with Local Recurrent Processing for Object Recognition
Inspired by "predictive coding", a theory in neuroscience, we develop a
bi-directional and dynamic neural network with local recurrent processing,
namely predictive coding network (PCN). Unlike feedforward-only convolutional
neural networks, PCN includes both feedback connections, which carry top-down
predictions, and feedforward connections, which carry bottom-up errors of
prediction. Feedback and feedforward connections enable adjacent layers to
interact locally and recurrently to refine representations towards minimization
of layer-wise prediction errors. When unfolded over time, the recurrent
processing gives rise to an increasingly deep hierarchy of non-linear
transformations, allowing a shallow network to dynamically extend itself into an
arbitrarily deep network. We train and test PCN for image classification with
SVHN, CIFAR and ImageNet datasets. Despite notably fewer layers and parameters,
PCN achieves competitive performance compared to classical and state-of-the-art
models. Further analysis shows that the internal representations in PCN
converge over time and yield increasingly better accuracy in object
recognition. Errors of top-down prediction also reveal visual saliency or
bottom-up attention.
Comment: 12 pages, 3 figures
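A minimal sketch of the local recurrent processing between two adjacent layers is given below (PyTorch); the update rule, step count and convolution shapes are illustrative assumptions rather than the paper's exact formulation.

    import torch
    import torch.nn as nn

    class PCLayerPair(nn.Module):
        """Two adjacent layers with feedforward (error) and feedback (prediction) paths."""
        def __init__(self, c_low, c_high, steps=4, rate=0.1):
            super().__init__()
            self.ff = nn.Conv2d(c_low, c_high, 3, padding=1)   # bottom-up path
            self.fb = nn.Conv2d(c_high, c_low, 3, padding=1)   # top-down path
            self.steps, self.rate = steps, rate

        def forward(self, x_low):
            r_high = torch.relu(self.ff(x_low))                # initial higher-layer representation
            for _ in range(self.steps):
                pred = self.fb(r_high)                         # top-down prediction of the lower layer
                err = x_low - pred                             # layer-wise prediction error
                r_high = torch.relu(r_high + self.rate * self.ff(err))  # refine with bottom-up error
            return r_high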
A Comprehensive guide to Bayesian Convolutional Neural Network with Variational Inference
Artificial Neural Networks are connectionist systems that perform a given task by learning
from examples, without prior knowledge about the task. This is done by finding an optimal
point estimate for the weights in every node. Generally, networks that use point estimates
as weights perform well with large datasets, but they fail to express uncertainty in
regions with little or no data, leading to overconfident decisions.
In this paper, a Bayesian Convolutional Neural Network (BayesCNN) using Variational
Inference is proposed, which introduces probability distributions over the weights.
Furthermore, the proposed BayesCNN architecture is applied to tasks like Image
Classification, Image Super-Resolution and Generative Adversarial Networks. The results are
compared to point-estimate based architectures on the MNIST, CIFAR-10 and CIFAR-100
datasets for the Image Classification task, on the BSD300 dataset for the Image
Super-Resolution task, and again on the CIFAR-10 dataset for the Generative Adversarial
Network task.
BayesCNN is based on Bayes by Backprop, which derives a variational approximation to the
true posterior. We therefore introduce the idea of applying two convolutional operations,
one for the mean and one for the variance. Our proposed method not only achieves
performance equivalent to frequentist inference in identical architectures but also
incorporates a measure of uncertainty and regularisation. It further eliminates the use of
dropout in the model. Moreover, we predict how certain the model's predictions are based on
the epistemic and aleatoric uncertainties, and empirically show how the uncertainty can
decrease, allowing the decisions made by the network to become more deterministic as the
training accuracy increases. Finally, we propose ways to prune the Bayesian architecture
and to make it more computationally and time efficient.
Comment: arXiv admin note: text overlap with arXiv:1506.02158, arXiv:1703.04977 by other authors
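To make the two-convolution idea concrete, here is a minimal sketch (PyTorch) of a Bayesian convolution in the spirit of Bayes by Backprop with local reparameterisation: one convolution computes the activation mean, a second computes its variance, and the output is sampled from the resulting Gaussian. Initialisation and shapes are assumptions for illustration, and the KL term needed for variational training is omitted.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BayesConv2d(nn.Module):
        """Convolution with a Gaussian weight posterior, sampled via two convolutions."""
        def __init__(self, in_ch, out_ch, k=3, padding=1):
            super().__init__()
            self.w_mu = nn.Parameter(0.1 * torch.randn(out_ch, in_ch, k, k))    # posterior means
            self.w_rho = nn.Parameter(torch.full((out_ch, in_ch, k, k), -5.0))  # parameterises std-dev
            self.padding = padding

        def forward(self, x):
            w_sigma = F.softplus(self.w_rho)                               # positive std-dev
            act_mu = F.conv2d(x, self.w_mu, padding=self.padding)          # convolution for the mean
            act_var = F.conv2d(x * x, w_sigma ** 2, padding=self.padding)  # convolution for the variance
            eps = torch.randn_like(act_mu)
            return act_mu + act_var.clamp_min(1e-8).sqrt() * eps           # reparameterised sample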
C3AE: Exploring the Limits of Compact Model for Age Estimation
Age estimation is a classic learning problem in computer vision. Many larger
and deeper CNNs have been proposed with promising performance, such as AlexNet,
VggNet, GoogLeNet and ResNet. However, these models are not practical for the
embedded/mobile devices. Recently, MobileNets and ShuffleNets have been
proposed to reduce the number of parameters, yielding lightweight models.
However, their representation has been weakened because of the adoption of
depth-wise separable convolution. In this work, we investigate the limits of compact models
for small-scale images and propose an extremely Compact yet efficient Cascade Context-based
Age Estimation model (C3AE). This model has only 1/9 and 1/2000 of the parameters of
MobileNets/ShuffleNets and VggNet, respectively, while achieving competitive performance.
In particular, we re-define the age estimation problem via a two-point representation,
which is implemented by a cascade model. Moreover, to fully utilize facial context
information, a multi-branch CNN is proposed to aggregate multi-scale context. Experiments
are carried out on three age estimation datasets. State-of-the-art performance among
compact models is achieved by a relatively large margin.
Comment: accepted by CVPR 2019
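A minimal sketch of a two-point age representation is shown below (Python/NumPy); the bin width, number of bins and expectation-based decoding are illustrative assumptions, not the paper's exact cascade formulation.

    import numpy as np

    def encode_two_point(age, bin_width=10, n_bins=12):
        """Represent an age as weights on its two nearest bin centres."""
        dist = np.zeros(n_bins)
        lo = min(int(age // bin_width), n_bins - 1)
        hi = min(lo + 1, n_bins - 1)
        if lo == hi:
            dist[lo] = 1.0
        else:
            frac = (age - lo * bin_width) / bin_width
            dist[lo], dist[hi] = 1.0 - frac, frac
        return dist

    def decode_two_point(dist, bin_width=10):
        """Recover the age as the expectation over bin centres."""
        centres = np.arange(len(dist)) * bin_width
        return float(np.dot(dist, centres))

    assert abs(decode_two_point(encode_two_point(37.0)) - 37.0) < 1e-6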
First-order Adversarial Vulnerability of Neural Networks and Input Dimension
Over the past few years, neural networks have been shown to be vulnerable to adversarial
images: targeted but imperceptible image perturbations lead to drastically different
predictions. We show that adversarial vulnerability
increases with the gradients of the training objective when viewed as a
function of the inputs. Surprisingly, vulnerability does not depend on network
topology: for many standard network architectures, we prove that at
initialization, the norm of these gradients grows as the square root
of the input dimension, leaving the networks increasingly vulnerable with
growing image size. We empirically show that this dimension dependence persists
after either usual or robust training, but gets attenuated with higher
regularization.
Comment: Paper previously called: "Adversarial Vulnerability of Neural Networks Increases with Input Dimension". 9 pages main text and references, 11 pages appendix, 14 figures
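As a concrete handle on this notion of first-order vulnerability, the sketch below (PyTorch) computes the norm of the loss gradient with respect to the input for a batch of examples; the choice of model, loss and norm order is an assumption for illustration.

    import torch
    import torch.nn.functional as F

    def input_grad_norm(model, x, y, p=1):
        """Per-example norm of the gradient of the training loss w.r.t. the input."""
        x = x.clone().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        grad, = torch.autograd.grad(loss, x)
        return grad.flatten(1).norm(p=p, dim=1)   # one value per example in the batch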
Demand Forecasting from Spatiotemporal Data with Graph Networks and Temporal-Guided Embedding
Short-term demand forecasting models commonly combine convolutional and
recurrent layers to extract complex spatiotemporal patterns in data. Long-term
histories are also used to consider periodicity and seasonality patterns as
time series data. In this study, we propose an efficient architecture,
Temporal-Guided Network (TGNet), which utilizes graph networks and
temporal-guided embedding. Instead of convolutional layers, graph networks extract features
that are invariant to permutations of adjacent regions.
Temporal-guided embedding explicitly learns temporal contexts from training
data and is substituted for the input of long-term histories from days/weeks
ago. TGNet learns an autoregressive model, conditioned on temporal contexts of
forecasting targets from the temporal-guided embedding. Finally, our model achieves
performance competitive with other baselines on three real-world spatiotemporal demand
datasets, while using about 20 times fewer trainable parameters than a state-of-the-art
baseline. We also show that the temporal-guided embedding learns temporal contexts as
intended and that TGNet forecasts robustly even in atypical event situations.
Comment: NeurIPS 2018 Workshop on Modeling and Decision-Making in the Spatiotemporal Domain
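A minimal sketch of a temporal-guided embedding is given below (PyTorch): learned vectors indexed by the temporal context of the forecasting target (hour of day, day of week, holiday flag) stand in for long-term history inputs. The context variables and fusion by summation are assumptions for illustration.

    import torch
    import torch.nn as nn

    class TemporalGuidedEmbedding(nn.Module):
        """Learned embeddings of the forecasting target's temporal context."""
        def __init__(self, emb_dim=8):
            super().__init__()
            self.hour = nn.Embedding(24, emb_dim)     # hour of day
            self.dow = nn.Embedding(7, emb_dim)       # day of week
            self.holiday = nn.Embedding(2, emb_dim)   # holiday indicator

        def forward(self, hour_idx, dow_idx, holiday_idx):
            # The summed context vector is later concatenated with spatial features.
            return self.hour(hour_idx) + self.dow(dow_idx) + self.holiday(holiday_idx)

    ctx = TemporalGuidedEmbedding()(torch.tensor([8]), torch.tensor([2]), torch.tensor([0]))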
Towards Optimal Structured CNN Pruning via Generative Adversarial Learning
Structured pruning of filters or neurons has received increased focus for
compressing convolutional neural networks. Most existing methods rely on
multi-stage, layer-wise optimizations that iteratively prune and retrain, which may not be
optimal and can be computationally intensive. Besides, these methods are designed to prune
a specific structure, such as filters or blocks, without jointly pruning heterogeneous
structures. In this
paper, we propose an effective structured pruning approach that jointly prunes
filters as well as other structures in an end-to-end manner. To accomplish
this, we first introduce a soft mask that scales the outputs of these structures, and
define a new objective function with sparsity regularization that aligns the output of the
baseline network with that of the masked network. We then effectively solve the
optimization problem by generative adversarial learning (GAL), which learns a sparse soft
mask in a label-free, end-to-end manner. By forcing more scaling factors in the soft mask
to zero, the fast iterative shrinkage-thresholding algorithm (FISTA) can be leveraged to
quickly and reliably
remove the corresponding structures. Extensive experiments demonstrate the
effectiveness of GAL on different datasets, including MNIST, CIFAR-10 and
ImageNet ILSVRC 2012. For example, on ImageNet ILSVRC 2012, the pruned
ResNet-50 achieves 10.88% Top-5 error with a 3.7x speedup. This significantly outperforms
state-of-the-art methods.
Comment: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
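The sketch below (PyTorch) illustrates the two ingredients in isolation: a soft mask that scales each filter's output, and the soft-thresholding (proximal) step that FISTA applies to drive mask entries to exactly zero. Mask placement and the threshold value are assumptions, and the adversarial alignment objective is omitted.

    import torch
    import torch.nn as nn

    class MaskedFilters(nn.Module):
        """Scales each output channel of a structure by a learnable soft-mask entry."""
        def __init__(self, n_filters):
            super().__init__()
            self.mask = nn.Parameter(torch.ones(n_filters))

        def forward(self, feature_map):              # feature_map: (N, C, H, W)
            return feature_map * self.mask.view(1, -1, 1, 1)

    def soft_threshold(mask, lam=1e-3):
        """Proximal step for the l1 sparsity term; small entries shrink to exactly zero."""
        with torch.no_grad():
            mask.copy_(torch.sign(mask) * torch.clamp(mask.abs() - lam, min=0.0))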
CR-Fill: Generative Image Inpainting with Auxiliary Contextual Reconstruction
Recent deep generative inpainting methods use attention layers to allow the
generator to explicitly borrow feature patches from the known region to
complete a missing region. Due to the lack of supervision signals for the
correspondence between missing regions and known regions, it may fail to find
proper reference features, which often leads to artifacts in the results. Also,
it computes pair-wise similarity across the entire feature map during inference, bringing
significant computational overhead. To address these issues, we
propose to teach such patch-borrowing behavior to an attention-free generator
by joint training of an auxiliary contextual reconstruction task, which
encourages the generated output to be plausible even when reconstructed by
surrounding regions. The auxiliary branch can be seen as a learnable loss function, named
the contextual reconstruction (CR) loss, in which the query-reference feature similarity
and a reference-based reconstructor are jointly optimized with the inpainting generator.
The auxiliary branch (i.e. the CR loss) is required only during training, and only the
inpainting generator is needed during inference. Experimental results demonstrate that the
proposed inpainting model compares favourably against the state-of-the-art in
terms of quantitative and visual performance.
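As a rough illustration of the contextual reconstruction idea, the sketch below (PyTorch) re-expresses features inside the missing region as a similarity-weighted combination of known-region features and compares the result with the generator's output features; it is a simplified stand-in, not the paper's exact CR loss or reconstructor.

    import torch
    import torch.nn.functional as F

    def contextual_reconstruction_loss(feat, out_feat, mask):
        """feat/out_feat: (N, C, H, W) features; mask: (N, 1, H, W), 1 inside the hole."""
        f = feat.flatten(2)                                   # (N, C, HW)
        o = out_feat.flatten(2)                               # (N, C, HW)
        m = mask.flatten(2)                                   # (N, 1, HW)
        f_norm = F.normalize(f, dim=1)
        sim = torch.bmm(f_norm.transpose(1, 2), f_norm)       # (N, HW, HW) cosine similarity
        sim = sim.masked_fill(m > 0.5, -1e4)                  # references restricted to known positions
        attn = sim.softmax(dim=-1)
        recon = torch.bmm(f, attn.transpose(1, 2))            # rebuild every position from known ones
        return (torch.abs(recon - o) * m).sum() / m.sum().clamp_min(1.0)  # penalise only inside the hole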
Deep Learning Based Spatial User Mapping on Extra Large MIMO Arrays
In an extra-large scale MIMO (XL-MIMO) system, the antenna arrays have a
large physical size that goes beyond the dimensions in traditional MIMO
systems. Because of this large dimensionality, the optimization of an XL-MIMO
system leads to solutions with prohibitive complexity when relying on
conventional optimization tools. In this paper, we propose a design based on
machine learning for the downlink of a multi-user setting with linear
pre-processing, where the goal is to select a limited mapping area per user,
i.e. a small portion of the array that concentrates the beamforming energy toward the
user. We refer to this selection as spatial user mapping (SUM). Our solution
relies on learning using deep convolutional neural networks with a distributed
architecture that is built to manage the large system dimension. This
architecture contains one network per user where all the networks work in
parallel and exploit specific non-stationary properties of the channels along
the array. Our results show that, once the parallel networks are trained, they
provide the optimal SUM solution in the vast majority of instances, resulting in a
negligible sum-rate loss compared to a system using the optimal SUM solution, while
providing an insightful approach for rethinking these kinds of problems, which have no
closed-form solution.
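A minimal sketch of one per-user network in such a distributed design is given below (PyTorch): a small 1-D CNN maps a user's channel snapshot along the array to a choice among candidate mapping areas. The input representation (real/imaginary parts per antenna) and the number of candidate areas are assumptions for illustration.

    import torch
    import torch.nn as nn

    class SUMNet(nn.Module):
        """One per-user network; all user networks run in parallel."""
        def __init__(self, n_areas=16):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv1d(2, 32, kernel_size=7, padding=3), nn.ReLU(),   # 2 input channels: real/imag per antenna
                nn.Conv1d(32, 32, kernel_size=7, padding=3), nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),
            )
            self.classifier = nn.Linear(32, n_areas)

        def forward(self, h):                                    # h: (batch, 2, n_antennas)
            return self.classifier(self.features(h).flatten(1))  # logits over candidate mapping areas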