692 research outputs found
On the Importance of Normalisation Layers in Deep Learning with Piecewise Linear Activation Units
Deep feedforward neural networks with piecewise linear activations are
currently producing the state-of-the-art results in several public datasets.
The combination of deep learning models and piecewise linear activation
functions allows for the estimation of exponentially complex functions with the
use of a large number of subnetworks specialized in the classification of
similar input examples. During the training process, these subnetworks avoid
overfitting with an implicit regularization scheme based on the fact that they
must share their parameters with other subnetworks. Using this framework, we
have made an empirical observation that can improve even more the performance
of such models. We notice that these models assume a balanced initial
distribution of data points with respect to the domain of the piecewise linear
activation function. If that assumption is violated, then the piecewise linear
activation units can degenerate into purely linear activation units, which can
result in a significant reduction of their capacity to learn complex functions.
Furthermore, as the number of model layers increases, this unbalanced initial
distribution makes the model ill-conditioned. Therefore, we propose the
introduction of batch normalisation units into deep feedforward neural networks
with piecewise linear activations, which drives a more balanced use of these
activation units, where each region of the activation function is trained with
a relatively large proportion of training samples. Also, this batch
normalisation promotes the pre-conditioning of very deep learning models. We
show that by introducing maxout and batch normalisation units to the network in
network model results in a model that produces classification results that are
better than or comparable to the current state of the art in CIFAR-10,
CIFAR-100, MNIST, and SVHN datasets
Adaptive Normalized Risk-Averting Training For Deep Neural Networks
This paper proposes a set of new error criteria and learning approaches,
Adaptive Normalized Risk-Averting Training (ANRAT), to attack the non-convex
optimization problem in training deep neural networks (DNNs). Theoretically, we
demonstrate its effectiveness on global and local convexity lower-bounded by
the standard -norm error. By analyzing the gradient on the convexity index
, we explain the reason why to learn adaptively using
gradient descent works. In practice, we show how this method improves training
of deep neural networks to solve visual recognition tasks on the MNIST and
CIFAR-10 datasets. Without using pretraining or other tricks, we obtain results
comparable or superior to those reported in recent literature on the same tasks
using standard ConvNets + MSE/cross entropy. Performance on deep/shallow
multilayer perceptrons and Denoised Auto-encoders is also explored. ANRAT can
be combined with other quasi-Newton training methods, innovative network
variants, regularization techniques and other specific tricks in DNNs. Other
than unsupervised pretraining, it provides a new perspective to address the
non-convex optimization problem in DNNs.Comment: AAAI 2016, 0.39%~0.4% ER on MNIST with single 32-32-256-10 ConvNets,
code available at https://github.com/cauchyturing/ANRA
Hyper-parameter optimization of Deep Convolutional Networks for object recognition
Recently sequential model based optimization (SMBO) has emerged as a
promising hyper-parameter optimization strategy in machine learning. In this
work, we investigate SMBO to identify architecture hyper-parameters of deep
convolution networks (DCNs) object recognition. We propose a simple SMBO
strategy that starts from a set of random initial DCN architectures to generate
new architectures, which on training perform well on a given dataset. Using the
proposed SMBO strategy we are able to identify a number of DCN architectures
that produce results that are comparable to state-of-the-art results on object
recognition benchmarks.Comment: 4 pages, 1 figure, 3 tables, Submitted to ICIP 201
- …