
    Training deep neural networks with low precision multiplications

    Multipliers are the most space- and power-hungry arithmetic operators in digital implementations of deep neural networks. We train a set of state-of-the-art neural networks (Maxout networks) on three benchmark datasets: MNIST, CIFAR-10 and SVHN. They are trained with three distinct formats: floating point, fixed point and dynamic fixed point. For each of those datasets and for each of those formats, we assess the impact of the precision of the multiplications on the final error after training. We find that very low precision is sufficient not just for running trained networks but also for training them. For example, it is possible to train Maxout networks with 10-bit multiplications. Comment: 10 pages, 5 figures, Accepted as a workshop contribution at ICLR 201
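
To make the fixed-point idea concrete, here is a minimal sketch of rounding both operands of a multiplication to a signed fixed-point format. The function names, the saturation behaviour, and the rule for choosing the number of fractional bits are illustrative assumptions, not the procedure used in the paper; in particular, `choose_frac_bits` is only a simplified stand-in for a dynamic fixed-point scheme in which the scaling factor follows the range of the values.

```python
import math

def quantize_fixed(x, total_bits, frac_bits):
    """Round x to a signed fixed-point value (hypothetical helper)."""
    scale = 2 ** frac_bits
    lo = -(2 ** (total_bits - 1))        # most negative integer code
    hi = 2 ** (total_bits - 1) - 1       # most positive integer code
    q = max(lo, min(hi, round(x * scale)))  # saturate on overflow (assumption)
    return q / scale

def low_precision_mul(a, b, total_bits=10, frac_bits=6):
    """Multiply two numbers after rounding each operand to fixed point."""
    return (quantize_fixed(a, total_bits, frac_bits)
            * quantize_fixed(b, total_bits, frac_bits))

def choose_frac_bits(values, total_bits):
    """Pick fractional bits so the largest magnitude still fits.

    Simplified illustration of a range-dependent scaling factor;
    not the update rule from the paper.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return total_bits - 1
    int_bits = max(0, math.floor(math.log2(max_abs)) + 1)
    return total_bits - 1 - int_bits  # one bit is reserved for the sign
```

With 10 total bits and 6 fractional bits, `quantize_fixed(0.3, 10, 6)` rounds 0.3 to 19/64, and values beyond the representable range are clipped rather than wrapped.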

    From Maxout to Channel-Out: Encoding Information on Sparse Pathways

    Motivated by an important insight from neural science, we propose a new framework for understanding the success of the recently proposed "maxout" networks. The framework is based on encoding information on sparse pathways and recognizing the correct pathway at inference time. Elaborating further on this insight, we propose a novel deep network architecture, called the "channel-out" network, which takes much better advantage of sparse pathway encoding. In channel-out networks, pathways are not only formed a posteriori, but are also actively selected according to the inference outputs from the lower layers. From a mathematical perspective, channel-out networks can represent a wider class of piece-wise continuous functions, thereby endowing the network with more expressive power than that of maxout networks. We test our channel-out networks on several well-known image classification benchmarks, setting new state-of-the-art performance on CIFAR-100 and STL-10, which represent some of the "harder" image classification benchmarks. Comment: 10 pages including the appendix, 9 figure
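
The contrast between the two unit types can be sketched on a single group of candidate activations. This is a rough illustration only: it assumes the channel-out selection is a plain argmax over the group, which is one possible selection function, and the function names are hypothetical rather than taken from the paper.

```python
def maxout(group):
    """Maxout: the group collapses to its largest activation."""
    return max(group)

def channel_out(group):
    """Channel-out sketch: keep the selected channel, zero the others.

    Assumes argmax selection. The output keeps one slot per channel,
    so the position of the active channel (the chosen pathway) is
    passed on to the next layer.
    """
    k = max(range(len(group)), key=lambda i: group[i])
    return [v if i == k else 0.0 for i, v in enumerate(group)]
```

For the group `[0.2, 1.5, -0.3]`, `maxout` returns the scalar `1.5`, while `channel_out` returns `[0.0, 1.5, 0.0]`, which preserves which pathway was selected.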

    On the Importance of Normalisation Layers in Deep Learning with Piecewise Linear Activation Units

    Deep feedforward neural networks with piecewise linear activations are currently producing state-of-the-art results on several public datasets. The combination of deep learning models and piecewise linear activation functions allows for the estimation of exponentially complex functions with the use of a large number of subnetworks specialized in the classification of similar input examples. During the training process, these subnetworks avoid overfitting through an implicit regularization scheme based on the fact that they must share their parameters with other subnetworks. Using this framework, we have made an empirical observation that can further improve the performance of such models. We notice that these models assume a balanced initial distribution of data points with respect to the domain of the piecewise linear activation function. If that assumption is violated, then the piecewise linear activation units can degenerate into purely linear activation units, which can result in a significant reduction of their capacity to learn complex functions. Furthermore, as the number of model layers increases, this unbalanced initial distribution makes the model ill-conditioned. Therefore, we propose the introduction of batch normalisation units into deep feedforward neural networks with piecewise linear activations, which drives a more balanced use of these activation units, where each region of the activation function is trained with a relatively large proportion of training samples. This batch normalisation also promotes the pre-conditioning of very deep learning models. We show that introducing maxout and batch normalisation units into the network-in-network model results in a model that produces classification results that are better than or comparable to the current state of the art on the CIFAR-10, CIFAR-100, MNIST, and SVHN datasets.
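
The degeneration argument can be illustrated with a toy example. The sketch below assumes a ReLU-style unit with a single breakpoint at zero and omits the learned scale and shift of a full batch normalisation layer; the helper names are hypothetical. If every pre-activation in a batch falls on the same side of the breakpoint, the unit behaves linearly on that batch, and normalising the batch spreads the samples across both regions.

```python
import math

def batch_normalise(batch, eps=1e-5):
    """Shift and scale a batch to zero mean and (near) unit variance."""
    mean = sum(batch) / len(batch)
    var = sum((v - mean) ** 2 for v in batch) / len(batch)
    return [(v - mean) / math.sqrt(var + eps) for v in batch]

def active_fraction(batch):
    """Fraction of samples on the positive side of the breakpoint at 0."""
    return sum(1 for v in batch if v > 0) / len(batch)

pre_activations = [5.0, 6.0, 7.0, 8.0]       # all in one linear region
normalised = batch_normalise(pre_activations)
```

Here `active_fraction(pre_activations)` is 1.0, so the unit acts as a purely linear map on this batch, while `active_fraction(normalised)` is 0.5, so both linear pieces receive training samples.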

    Learned-Norm Pooling for Deep Feedforward and Recurrent Neural Networks

    In this paper we propose and investigate a novel nonlinear unit, called $L_p$ unit, for deep neural networks. The proposed $L_p$ unit receives signals from several projections of a subset of units in the layer below and computes a normalized $L_p$ norm. We notice two interesting interpretations of the $L_p$ unit. First, the proposed unit can be understood as a generalization of a number of conventional pooling operators such as average, root-mean-square and max pooling widely used in, for instance, convolutional neural networks (CNN), HMAX models and neocognitrons. Furthermore, the $L_p$ unit is, to a certain degree, similar to the recently proposed maxout unit (Goodfellow et al., 2013) which achieved the state-of-the-art object recognition results on a number of benchmark datasets. Secondly, we provide a geometrical interpretation of the activation function based on which we argue that the $L_p$ unit is more efficient at representing complex, nonlinear separating boundaries. Each $L_p$ unit defines a superelliptic boundary, with its exact shape defined by the order $p$. We claim that this makes it possible to model arbitrarily shaped, curved boundaries more efficiently by combining a few $L_p$ units of different orders. This insight justifies the need for learning different orders for each unit in the model. We empirically evaluate the proposed $L_p$ units on a number of datasets and show that multilayer perceptrons (MLP) consisting of the $L_p$ units achieve the state-of-the-art results on a number of benchmark datasets. Furthermore, we evaluate the proposed $L_p$ unit on the recently proposed deep recurrent neural networks (RNN). Comment: ECML/PKDD 201
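
The pooling interpretation can be checked numerically with a small sketch of a normalized $L_p$ norm over a list of projections. The function name and the exact form (mean of absolute values raised to the power p, then the p-th root) are assumptions for illustration; the paper's unit may include further details, such as how the projections are centred and how the order p is learned.

```python
def lp_unit(projections, p):
    """Normalized L_p norm of a list of projections (illustrative form).

    p = 1 gives the mean absolute value, p = 2 the root-mean-square,
    and large p approaches the maximum absolute value.
    """
    n = len(projections)
    return (sum(abs(z) ** p for z in projections) / n) ** (1.0 / p)
```

For the projections `[1, -2, 3]`, the order `p = 1` yields the average magnitude 2.0, and increasing `p` moves the output toward the largest magnitude, 3, which matches the reading of the unit as a family of pooling operators indexed by its order.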