Training deep neural networks with low precision multiplications
Multipliers are the most space and power-hungry arithmetic operators of the
digital implementation of deep neural networks. We train a set of
state-of-the-art neural networks (Maxout networks) on three benchmark datasets:
MNIST, CIFAR-10 and SVHN. They are trained with three distinct formats:
floating point, fixed point and dynamic fixed point. For each of those datasets
and for each of those formats, we assess the impact of the precision of the
multiplications on the final error after training. We find that very low
precision is sufficient not just for running trained networks but also for
training them. For example, it is possible to train Maxout networks with 10
bits multiplications.Comment: 10 pages, 5 figures, Accepted as a workshop contribution at ICLR 201
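To make the fixed-point setting concrete, below is a minimal sketch (not the authors' code) of quantising both operands of a matrix multiplication to a signed fixed-point grid before multiplying. The helper names and the 10-bit width with 7 fractional bits are illustrative assumptions; the dynamic fixed point format discussed in the paper additionally lets groups of values share a scaling factor that is adjusted during training.

```python
# Illustrative sketch only: quantise operands to a fixed-point grid, then multiply.
import numpy as np

def to_fixed_point(x, total_bits=10, frac_bits=7):
    """Round x onto a signed fixed-point grid with `frac_bits` fractional bits."""
    scale = 2.0 ** frac_bits
    max_int = 2 ** (total_bits - 1) - 1              # largest representable integer
    q = np.clip(np.round(x * scale), -max_int - 1, max_int)
    return q / scale

def low_precision_matmul(a, w, total_bits=10, frac_bits=7):
    """Quantise both operands before multiplying; accumulation stays in float."""
    return to_fixed_point(a, total_bits, frac_bits) @ to_fixed_point(w, total_bits, frac_bits)

x = np.random.randn(4, 8).astype(np.float32)
W = 0.1 * np.random.randn(8, 3).astype(np.float32)
print(low_precision_matmul(x, W).shape)              # (4, 3)
```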
From Maxout to Channel-Out: Encoding Information on Sparse Pathways
Motivated by an important insight from neural science, we propose a new
framework for understanding the success of the recently proposed "maxout"
networks. The framework is based on encoding information on sparse pathways and
recognizing the correct pathway at inference time. Elaborating further on this
insight, we propose a novel deep network architecture, called "channel-out"
network, which takes much better advantage of sparse pathway encoding. In
channel-out networks, pathways are not only formed a posteriori, but they are
also actively selected according to the inference outputs from the lower
layers. From a mathematical perspective, channel-out networks can represent a
wider class of piece-wise continuous functions, thereby endowing the network
with more expressive power than that of maxout networks. We test our
channel-out networks on several well-known image classification benchmarks,
setting new state-of-the-art performance on CIFAR-100 and STL-10, which
represent some of the "harder" image classification benchmarks.
Comment: 10 pages including the appendix, 9 figures
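As a rough illustration of the difference between the two units, the sketch below contrasts maxout's per-group maximum with a channel-out-style gating that keeps only the winning channel of each group at full width; the group size and the simple argmax selection rule are illustrative assumptions rather than the paper's exact formulation.

```python
# Illustrative sketch: maxout collapses each group, channel-out gates it.
import numpy as np

def maxout(z, k):
    """z: (batch, k*m) pre-activations -> (batch, m), taking the max of each group."""
    b, d = z.shape
    return z.reshape(b, d // k, k).max(axis=2)

def channel_out(z, k):
    """Keep only the winning channel in each group of k and zero out the rest,
    so the pathway selected by the lower layer is preserved at full width."""
    b, d = z.shape
    groups = z.reshape(b, d // k, k)
    winners = groups.argmax(axis=2)[..., None]
    mask = np.zeros_like(groups)
    np.put_along_axis(mask, winners, 1.0, axis=2)
    return (groups * mask).reshape(b, d)

z = np.random.randn(2, 12)
print(maxout(z, 4).shape)       # (2, 3)
print(channel_out(z, 4).shape)  # (2, 12)
```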
On the Importance of Normalisation Layers in Deep Learning with Piecewise Linear Activation Units
Deep feedforward neural networks with piecewise linear activations are
currently producing the state-of-the-art results in several public datasets.
The combination of deep learning models and piecewise linear activation
functions allows for the estimation of exponentially complex functions with the
use of a large number of subnetworks specialized in the classification of
similar input examples. During the training process, these subnetworks avoid
overfitting with an implicit regularization scheme based on the fact that they
must share their parameters with other subnetworks. Using this framework, we
have made an empirical observation that can further improve the performance
of such models. We notice that these models assume a balanced initial
distribution of data points with respect to the domain of the piecewise linear
activation function. If that assumption is violated, then the piecewise linear
activation units can degenerate into purely linear activation units, which can
result in a significant reduction of their capacity to learn complex functions.
Furthermore, as the number of model layers increases, this unbalanced initial
distribution makes the model ill-conditioned. Therefore, we propose the
introduction of batch normalisation units into deep feedforward neural networks
with piecewise linear activations, which drives a more balanced use of these
activation units, where each region of the activation function is trained with
a relatively large proportion of training samples. Also, this batch
normalisation promotes the pre-conditioning of very deep learning models. We
show that introducing maxout and batch normalisation units into the network in
network model yields classification results that are better than or comparable
to the current state of the art on the CIFAR-10, CIFAR-100, MNIST, and SVHN
datasets.
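The effect described here can be seen in a small numerical sketch: when the pre-activations feeding a maxout group are shifted so that one linear piece always wins, the unit degenerates into a linear one, and per-batch normalisation rebalances which piece is selected. The group layout and constants below are assumptions for illustration, not the paper's network-in-network configuration.

```python
# Illustrative sketch: batch normalisation rebalances which maxout piece wins.
import numpy as np

def batch_norm(z, gamma=1.0, beta=0.0, eps=1e-5):
    """Standardise each feature over the mini-batch, then scale and shift."""
    mu = z.mean(axis=0, keepdims=True)
    var = z.var(axis=0, keepdims=True)
    return gamma * (z - mu) / np.sqrt(var + eps) + beta

def winner_fraction(z, k=2):
    """Fraction of examples for which the first piece of each maxout group wins."""
    b, d = z.shape
    groups = z.reshape(b, d // k, k)
    return (groups.argmax(axis=2) == 0).mean()

z = np.random.randn(256, 8)
z[:, ::2] += 5.0                        # first piece of every group dominates
print(winner_fraction(z))               # ~1.0: each unit behaves like a linear unit
print(winner_fraction(batch_norm(z)))   # ~0.5: both pieces receive training signal
```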
Learned-Norm Pooling for Deep Feedforward and Recurrent Neural Networks
In this paper we propose and investigate a novel nonlinear unit, called the
L_p unit, for deep neural networks. The proposed unit receives signals from
several projections of a subset of units in the layer below and computes a
normalized L_p norm. We notice two interesting interpretations of the L_p
unit. First, the proposed unit can be understood as a generalization of a
number of conventional pooling operators such as average, root-mean-square and
max pooling widely used in, for instance, convolutional neural networks (CNN),
HMAX models and neocognitrons. Furthermore, the L_p unit is, to a certain
degree, similar to the recently proposed maxout unit (Goodfellow et al., 2013)
which achieved the state-of-the-art object recognition results on a number of
benchmark datasets. Secondly, we provide a geometrical interpretation of the
activation function based on which we argue that the L_p unit is more
efficient at representing complex, nonlinear separating boundaries. Each L_p
unit defines a superelliptic boundary, with its exact shape defined by the
order p. We claim that this makes it possible to model arbitrarily shaped,
curved boundaries more efficiently by combining a few L_p units of different
orders. This insight justifies the need for learning different orders for each
L_p unit in the model. We empirically evaluate the proposed L_p units on a
number of datasets and show that multilayer perceptrons (MLP) consisting of
the L_p units achieve state-of-the-art results on a number of benchmark
datasets. Furthermore, we evaluate the proposed unit on the recently proposed
deep recurrent neural networks (RNN).
Comment: ECML/PKDD 2014
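A minimal sketch of such a unit (with an assumed group size and hand-picked orders p) computes the normalised L_p norm over each group of pre-activations: p = 1 gives an average of absolute values, p = 2 a root-mean-square, and large p approaches max pooling, matching the pooling operators the unit generalizes.

```python
# Illustrative sketch of L_p units applied to groups of pre-activations.
import numpy as np

def lp_units(z, p, k):
    """z: (batch, k*m) pre-activations, p: (m,) orders -> (batch, m) normalised L_p norms."""
    b, d = z.shape
    groups = np.abs(z.reshape(b, d // k, k))
    return (groups ** p[None, :, None]).mean(axis=2) ** (1.0 / p[None, :])

z = np.random.randn(3, 8)
p = np.array([1.0, 2.0, 4.0, 50.0])   # p=1 ~ mean of |z|, p=2 ~ RMS, large p ~ max
print(lp_units(z, p, k=2).shape)      # (3, 4)
```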