Feedforward and Recurrent Neural Networks Backward Propagation and Hessian in Matrix Form
In this paper we focus on the linear algebra theory behind feedforward (FNN)
and recurrent (RNN) neural networks. We review backward propagation, including
backward propagation through time (BPTT). Also, we obtain a new exact
expression for the Hessian, which represents second order effects. We show that
for t time steps the weight gradient can be expressed as a rank-t matrix, while
the weight Hessian can be expressed as a sum of Kronecker products of rank-1
and W^T A W matrices, for some matrix A and weight matrix W. Also, we show that
for a mini-batch of size n, the weight update can be expressed as a rank-nt
matrix. Finally, we briefly comment on the eigenvalues of the Hessian
matrix. Comment: 23 pages, 4 figures
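A minimal NumPy sketch of the low-rank structure described in this abstract (toy sizes, a simple tanh recurrence, and a squared-error loss are all assumptions; this is not the paper's code): the BPTT gradient of the recurrent weight matrix accumulates one outer product per time step, so after t steps its rank is at most t.

```python
import numpy as np

# Sketch: the BPTT gradient of the recurrent weights W is a sum of one outer
# product per time step, so its rank is at most the number of time steps t.
rng = np.random.default_rng(0)
d, t = 8, 3                               # hidden size, time steps (t < d)
W = rng.normal(size=(d, d)) * 0.1
x = rng.normal(size=(t, d))               # inputs, injected additively
h = np.zeros((t + 1, d))
h[0] = rng.normal(size=d)                 # nonzero initial state

for k in range(1, t + 1):                 # forward: h_k = tanh(W h_{k-1} + x_k)
    h[k] = np.tanh(W @ h[k - 1] + x[k - 1])

target = rng.normal(size=d)
delta = h[t] - target                     # dL/dh_t for L = 0.5 ||h_t - target||^2

grad_W = np.zeros_like(W)
for k in range(t, 0, -1):                 # backward through time
    delta = delta * (1.0 - h[k] ** 2)     # through the tanh nonlinearity
    grad_W += np.outer(delta, h[k - 1])   # rank-1 contribution of step k
    delta = W.T @ delta                   # pass the error to the previous step

print(np.linalg.matrix_rank(grad_W))      # at most t = 3, even though W is 8 x 8
```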
The Outer Product Structure of Neural Network Derivatives
In this paper, we show that feedforward and recurrent neural networks exhibit
an outer product derivative structure but that convolutional neural networks do
not. This structure makes it possible to use higher-order information without
needing approximations or infeasibly large amounts of memory, and it may also
provide insights into the geometry of neural network optima. The ability to
easily access these derivatives also suggests a new, geometric approach to
regularization. We then discuss how this structure could be used to improve
training methods, increase network robustness and generalizability, and inform
network compression methods
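A minimal sketch of the outer product structure for a single dense layer (the layer size, identity activation, and squared-error loss are assumed for illustration): the per-example weight gradient is the outer product of the backpropagated error with the incoming activations.

```python
import numpy as np

# Sketch: for a dense layer z = W a, the weight gradient is the outer product
# of the output error with the input activations -- a rank-1 matrix per example.
rng = np.random.default_rng(1)
a = rng.normal(size=5)                   # activations entering the layer
W = rng.normal(size=(3, 5))
y = rng.normal(size=3)                   # regression target

z = W @ a
delta = z - y                            # dL/dz for L = 0.5 ||z - y||^2
grad_W = np.outer(delta, a)              # dL/dW = delta a^T

# Spot-check one entry against a finite difference.
eps = 1e-6
W_pert = W.copy()
W_pert[1, 2] += eps
fd = (0.5 * np.sum((W_pert @ a - y) ** 2) - 0.5 * np.sum((z - y) ** 2)) / eps
print(np.isclose(grad_W[1, 2], fd, atol=1e-4))   # True
```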
Hessian-free Optimization for Learning Deep Multidimensional Recurrent Neural Networks
Multidimensional recurrent neural networks (MDRNNs) have shown a remarkable
performance in the area of speech and handwriting recognition. The performance
of an MDRNN is improved by further increasing its depth, and the difficulty of
learning the deeper network is overcome by using Hessian-free (HF)
optimization. Given that connectionist temporal classification (CTC) is
utilized as an objective of learning an MDRNN for sequence labeling, the
non-convexity of CTC poses a problem when applying HF to the network. As a
solution, a convex approximation of CTC is formulated and its relationship with
the EM algorithm and the Fisher information matrix is discussed. An MDRNN up to
a depth of 15 layers is successfully trained using HF, resulting in an improved
performance for sequence labeling. Comment: to appear at NIPS 2015
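The Hessian-free idea itself can be sketched on a toy convex problem (logistic regression here is an assumption standing in for the MDRNN/CTC objective): the Hessian is never formed; curvature-vector products, here approximated by finite differences of the gradient, drive conjugate gradients to obtain the update direction.

```python
import numpy as np

# Sketch of Hessian-free optimization on a toy convex problem (logistic
# regression, assumed data): solve H d = -g with conjugate gradients, using
# only Hessian-vector products (here via finite differences of the gradient).
rng = np.random.default_rng(2)
X = rng.normal(size=(64, 5))
y = (rng.random(64) < 0.5).astype(float)
w = np.zeros(5)

def grad(w):
    p = 1.0 / (1.0 + np.exp(-X @ w))      # predicted probabilities
    return X.T @ (p - y) / len(y)

def hvp(w, v, eps=1e-5):                  # Hessian-vector product, matrix-free
    return (grad(w + eps * v) - grad(w - eps * v)) / (2 * eps)

def cg(w, b, iters=20):                   # solve H d = b without forming H
    d, r = np.zeros_like(b), b.copy()
    s = r.copy()
    for _ in range(iters):
        Hs = hvp(w, s)
        alpha = (r @ r) / (s @ Hs)
        d += alpha * s
        r_new = r - alpha * Hs
        if np.linalg.norm(r_new) < 1e-10:
            break
        s = r_new + ((r_new @ r_new) / (r @ r)) * s
        r = r_new
    return d

for _ in range(5):                        # outer Newton-like updates
    w += cg(w, -grad(w))
print(np.linalg.norm(grad(w)))            # near zero after a few HF steps
```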
A Critical Review of Recurrent Neural Networks for Sequence Learning
Countless learning tasks require dealing with sequential data. Image
captioning, speech synthesis, and music generation all require that a model
produce outputs that are sequences. In other domains, such as time series
prediction, video analysis, and musical information retrieval, a model must
learn from inputs that are sequences. Interactive tasks, such as translating
natural language, engaging in dialogue, and controlling a robot, often demand
both capabilities. Recurrent neural networks (RNNs) are connectionist models
that capture the dynamics of sequences via cycles in the network of nodes.
Unlike standard feedforward neural networks, recurrent networks retain a state
that can represent information from an arbitrarily long context window.
Although recurrent neural networks have traditionally been difficult to train,
and often contain millions of parameters, recent advances in network
architectures, optimization techniques, and parallel computation have enabled
successful large-scale learning with them. In recent years, systems based on
long short-term memory (LSTM) and bidirectional recurrent neural network (BRNN)
architectures have
demonstrated ground-breaking performance on tasks as varied as image
captioning, language translation, and handwriting recognition. In this survey,
we review and synthesize the research that over the past three decades first
yielded and then made practical these powerful learning models. When
appropriate, we reconcile conflicting notation and nomenclature. Our goal is to
provide a self-contained explication of the state of the art together with a
historical perspective and references to primary research
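A minimal sketch of the recurrence the survey reviews (sizes and weights are arbitrary toy values): the hidden state is updated through the network's cycle at every step, so the final state can reflect an arbitrarily long input context.

```python
import numpy as np

# Sketch of a vanilla recurrent step: the state h is fed back through the
# cycle W_hh at every step, so the final state summarizes the whole sequence.
rng = np.random.default_rng(3)
W_hh = rng.normal(size=(4, 4)) * 0.5      # recurrent (cycle-forming) weights
W_xh = rng.normal(size=(4, 2)) * 0.5      # input weights

def rnn(xs):
    h = np.zeros(4)                       # the retained state
    for x in xs:                          # one step per sequence element
        h = np.tanh(W_hh @ h + W_xh @ x)  # new state depends on the old state
    return h

print(rnn(rng.normal(size=(10, 2))))      # final state after a length-10 input
```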
A Theory of Local Learning, the Learning Channel, and the Optimality of Backpropagation
In a physical neural system, where storage and processing are intimately
intertwined, the rules for adjusting the synaptic weights can only depend on
variables that are available locally, such as the activity of the pre- and
post-synaptic neurons, resulting in local learning rules. A systematic
framework for studying the space of local learning rules is obtained by first
specifying the nature of the local variables, and then the functional form that
ties them together into each learning rule. Such a framework also enables the
systematic discovery of new learning rules and exploration of relationships
between learning rules and group symmetries. We study polynomial local learning
rules stratified by their degree and analyze their behavior and capabilities in
both linear and non-linear units and networks. Stacking local learning rules in
deep feedforward networks leads to deep local learning. While deep local
learning can learn interesting representations, it cannot learn complex
input-output functions, even when targets are available for the top layer.
Learning complex input-output functions requires local deep learning where
target information is communicated to the deep layers through a backward
learning channel. The nature of the communicated information about the targets
and the structure of the learning channel partition the space of learning
algorithms. We estimate the learning channel capacity associated with several
algorithms and show that backpropagation outperforms them by simultaneously
maximizing the information rate and minimizing the computational cost, even in
recurrent networks. The theory clarifies the concept of Hebbian learning,
establishes the power and limitations of local learning rules, introduces the
learning channel which enables a formal analysis of the optimality of
backpropagation, and explains the sparsity of the space of learning rules
discovered so far
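A concrete example of a local learning rule in the sense used above (Oja's Hebbian variant, chosen here purely as an illustration and not taken from the paper): the weight update depends only on the presynaptic input x and the postsynaptic output y, with no backward error channel, and it extracts the principal component of the input statistics.

```python
import numpy as np

# Sketch of a local learning rule (Oja's Hebbian variant): the update for w
# uses only the presynaptic activity x and postsynaptic activity y -- no
# backward error signal -- and converges to the principal component of the input.
rng = np.random.default_rng(4)
C = np.diag([3.0, 1.0, 0.5])                   # input covariance
X = rng.multivariate_normal(np.zeros(3), C, size=20000)
w = rng.normal(size=3)

eta = 0.01
for x in X:
    y = w @ x                                  # postsynaptic activity
    w += eta * y * (x - y * w)                 # local Hebbian update (Oja's rule)

print(w)   # roughly aligned with the top eigenvector of C, i.e. (+-1, 0, 0)
```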
A Survey on Methods and Theories of Quantized Neural Networks
Deep neural networks are the state-of-the-art methods for many real-world
tasks, such as computer vision, natural language processing and speech
recognition. For all their popularity, deep neural networks are also criticized
for consuming a lot of memory and draining battery life of devices during
training and inference. This makes it hard to deploy these models on mobile or
embedded devices which have tight resource constraints. Quantization is
recognized as one of the most effective approaches to satisfy the extreme
memory requirements that deep neural network models demand. Instead of adopting
32-bit floating point format to represent weights, quantized representations
store weights using more compact formats such as integers or even binary
numbers. Despite a possible degradation in predictive performance, quantization
provides a potential solution to greatly reduce the model size and the energy
consumption. In this survey, we give a thorough review of different aspects of
quantized neural networks. Current challenges and trends of quantized neural
networks are also discussed. Comment: 17 pages, 8 figures
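A minimal sketch of the kind of compact representation the survey discusses (a generic symmetric 8-bit scheme with assumed shapes, not a specific method from the survey): weights are stored as int8 codes plus one floating-point scale and dequantized for computation.

```python
import numpy as np

# Sketch of symmetric 8-bit weight quantization: store int8 codes plus one
# float scale instead of 32-bit floats, and dequantize for computation.
rng = np.random.default_rng(5)
w = rng.normal(size=(256, 256)).astype(np.float32)   # "fp32" weight matrix

scale = np.abs(w).max() / 127.0                      # map to the range [-127, 127]
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_hat = q.astype(np.float32) * scale                 # dequantized weights

print(q.nbytes / w.nbytes)                           # 0.25: 4x smaller storage
print(np.abs(w - w_hat).max())                       # worst-case rounding error
```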
Parallel Complexity of Forward and Backward Propagation
We show that the forward and backward propagation can be formulated as a
solution of lower and upper triangular systems of equations. For standard
feedforward (FNNs) and recurrent neural networks (RNNs) the triangular systems
are always block bi-diagonal, while for a general computation graph (directed
acyclic graph) they can have a more complex triangular sparsity pattern. We
discuss direct and iterative parallel algorithms that can be used for their
solution and interpreted as different ways of performing model parallelism.
Also, we show that for FNNs and RNNs with n layers and t time steps the
backward propagation can be performed in parallel in O(log n) and
O(log n log t) steps, respectively. Finally, we outline the generalization of
this technique using Jacobians that potentially allows us to handle arbitrary
layers. Comment: 18 pages
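A minimal sketch of the triangular-system view for a small linear network (linear layers and toy sizes are assumptions): stacking the layer activations turns the forward pass into one block bi-diagonal lower-triangular solve, and the usual layer-by-layer evaluation is exactly forward substitution on that system.

```python
import numpy as np

# Sketch: the forward pass of a 3-layer linear network as one block bi-diagonal
# lower-triangular system; layer-by-layer evaluation is forward substitution.
rng = np.random.default_rng(6)
d, L = 4, 3
Ws = [rng.normal(size=(d, d)) for _ in range(L)]
x = rng.normal(size=d)

a, seq = x, []                            # sequential forward pass a_k = W_k a_{k-1}
for W in Ws:
    a = W @ a
    seq.append(a)

A = np.eye(L * d)                         # block system (I - N) [a_1;...;a_L] = b
for k in range(1, L):
    A[k * d:(k + 1) * d, (k - 1) * d:k * d] = -Ws[k]   # sub-diagonal blocks -W_{k+1}
b = np.zeros(L * d)
b[:d] = Ws[0] @ x                         # only the first block row sees the input

print(np.allclose(np.linalg.solve(A, b), np.concatenate(seq)))   # True
```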
Mean Field Theory of Activation Functions in Deep Neural Networks
We present a Statistical Mechanics (SM) model of deep neural networks,
connecting the energy-based and the feedforward network (FFN) approaches. We
infer that an FFN can be understood as performing three basic steps: encoding,
representation validation and propagation. From the mean-field solution of the
model, we obtain a set of natural activations -- such as Sigmoid and ReLU --
together with the state-of-the-art Swish; this represents the expected
information propagating through the network and tends to ReLU in the limit of
zero noise. We study the spectrum of the Hessian on an associated
classification task, showing that Swish allows for more consistent performance
over a wider range of network architectures. Comment: Presented at the ICML
2019 Workshop on Theoretical Physics for Deep Learning
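A quick numerical sketch of the limiting behaviour mentioned above, using the common parameterization Swish(x) = x * sigmoid(beta * x) (the grid and beta values are arbitrary): as beta grows the activation approaches ReLU.

```python
import numpy as np

# Sketch: Swish(x) = x * sigmoid(beta * x) approaches ReLU(x) = max(x, 0)
# as beta grows, matching the zero-noise limit described above.
x = np.linspace(-4, 4, 801)
relu = np.maximum(x, 0.0)
for beta in (1.0, 3.0, 10.0):
    swish = x / (1.0 + np.exp(-beta * x))        # x * sigmoid(beta * x)
    print(beta, np.max(np.abs(swish - relu)))    # the gap shrinks as beta grows
```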
Biological credit assignment through dynamic inversion of feedforward networks
Learning depends on changes in synaptic connections deep inside the brain. In
multilayer networks, these changes are triggered by error signals fed back from
the output, generally through a stepwise inversion of the feedforward
processing steps. The gold standard for this process -- backpropagation --
works well in artificial neural networks, but is biologically implausible.
Several recent proposals have emerged to address this problem, but many of
these biologically-plausible schemes are based on learning an independent set
of feedback connections. This complicates the assignment of errors to each
synapse by making it dependent upon a second learning problem, and by fitting
inversions rather than guaranteeing them. Here, we show that feedforward
network transformations can be effectively inverted through dynamics. We derive
this dynamic inversion from the perspective of feedback control, where the
forward transformation is reused and dynamically interacts with fixed or random
feedback to propagate error signals during the backward pass. Importantly, this
scheme does not rely upon a second learning problem for feedback because
accurate inversion is guaranteed through the network dynamics. We map these
dynamics onto generic feedforward networks, and show that the resulting
algorithm performs well on several supervised and unsupervised datasets.
Finally, we discuss potential links between dynamic inversion and second-order
optimization. Overall, our work introduces an alternative perspective on credit
assignment in the brain, and proposes a special role for temporal dynamics and
feedback control during learning. Comment: 34th Conference on Neural
Information Processing Systems (NeurIPS 2020), Vancouver, Canada
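A minimal sketch of inversion through dynamics for a single linear map (the map, its conditioning, and the step size are assumed toy choices, not the paper's networks): the forward weights are reused in a feedback loop whose fixed point is the preimage of the target output, so no separate inverse or learned feedback is required.

```python
import numpy as np

# Sketch: invert a forward map W through dynamics that reuse W itself.
# h <- h + eta * W^T (y - W h) has its fixed point at W h = y, so the loop
# recovers the preimage of y without forming an explicit inverse.
rng = np.random.default_rng(7)
U, _ = np.linalg.qr(rng.normal(size=(5, 5)))
V, _ = np.linalg.qr(rng.normal(size=(5, 5)))
W = U @ np.diag(np.linspace(0.5, 2.0, 5)) @ V.T   # forward map, known conditioning
y = rng.normal(size=5)                            # output whose preimage we want

h, eta = np.zeros(5), 0.25
for _ in range(500):
    h += eta * W.T @ (y - W @ h)                  # feedback-control dynamics

print(np.allclose(W @ h, y, atol=1e-6))           # True: dynamics inverted W
```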
Activation Relaxation: A Local Dynamical Approximation to Backpropagation in the Brain
The backpropagation of error algorithm (backprop) has been instrumental in
the recent success of deep learning. However, a key question remains as to
whether backprop can be formulated in a manner suitable for implementation in
neural circuitry. The primary challenge is to ensure that any candidate
formulation uses only local information, rather than relying on global signals
as in standard backprop. Recently several algorithms for approximating backprop
using only local signals have been proposed. However, these algorithms
typically impose other requirements which challenge biological plausibility:
for example, requiring complex and precise connectivity schemes, or multiple
sequential backwards phases with information being stored across phases. Here,
we propose a novel algorithm, Activation Relaxation (AR), which is motivated by
constructing the backpropagation gradient as the equilibrium point of a
dynamical system. Our algorithm converges rapidly and robustly to the correct
backpropagation gradients, requires only a single type of computational unit,
utilises only a single parallel backwards relaxation phase, and can operate on
arbitrary computation graphs. We illustrate these properties by training deep
neural networks on visual classification tasks, and describe simplifications to
the algorithm which remove further obstacles to neurobiological implementation
(for example, the weight-transport problem, and the use of nonlinear
derivatives), while preserving performance. Comment: initial upload; revised
version (updated abstract, related work) 28-09-20; 05/10/20: revised for ICLR
submission
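A minimal sketch of the fixed-point view for a toy two-layer tanh network (all sizes and the squared-error loss are assumptions): relaxing each layer's activity variable toward the local Jacobian-transposed signal from the layer above converges to exactly the backpropagation gradients.

```python
import numpy as np

# Sketch: the backprop activation gradients are the equilibrium of a simple
# relaxation x_l <- x_l + eta * (-x_l + J_l^T x_{l+1}), where J_l is the local
# forward Jacobian of layer l+1 with respect to its input.
rng = np.random.default_rng(8)
W1 = rng.normal(size=(4, 6)) * 0.5
W2 = rng.normal(size=(3, 4)) * 0.5
a0 = rng.normal(size=6)
a1 = np.tanh(W1 @ a0)                     # forward pass
a2 = np.tanh(W2 @ a1)
target = rng.normal(size=3)

J1 = np.diag(1 - a1 ** 2) @ W1            # da1/da0
J2 = np.diag(1 - a2 ** 2) @ W2            # da2/da1

x2 = a2 - target                          # dL/da2 for L = 0.5 ||a2 - target||^2
x1, x0 = np.zeros(4), np.zeros(6)         # relax toward dL/da1 and dL/da0
eta = 0.2
for _ in range(200):                      # single parallel relaxation phase
    x1 += eta * (-x1 + J2.T @ x2)
    x0 += eta * (-x0 + J1.T @ x1)

print(np.allclose(x0, J1.T @ J2.T @ x2, atol=1e-6))   # matches backprop gradients
```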