141 research outputs found
Interpreting multi-stable behaviour in input-driven recurrent neural networks
Recurrent neural networks (RNNs) are computational models inspired by the brain. Although RNNs stand out as state-of-the-art machine learning models for challenging tasks such as speech recognition, handwriting recognition, and language translation, they are plagued by the so-called vanishing/exploding gradient issue, which hinders training RNNs to learn long-term dependencies in sequential data. Moreover, these models suffer from a problem of interpretability, known as the "black-box issue" of RNNs. We attempt to open the black box by developing a mechanistic interpretation of errors occurring during computation. We do this from a dynamical systems theory perspective, specifically building on the notion of Excitable Network Attractors. Our methodology is effective at least for tasks where a number of attractors and a switching pattern between them must be learned. RNNs can be seen as massively large nonlinear dynamical systems driven by external inputs. When RNNs are investigated analytically, the literature often neglects the input-driven property or drops it in favour of tight constraints on the input driving the dynamics, which do not match the reality of RNN applications. To bridge this gap, we frame RNN dynamics driven by generic input sequences in the context of nonautonomous dynamical systems theory. This leads us to enquire deeply into a fundamental principle established for RNNs known as the echo state property (ESP). In particular, we argue that input-driven RNNs can be reliable computational models even without satisfying the classical ESP formulation. We prove an input-driven fixed point theorem and exploit it to (i) demonstrate the existence and uniqueness of a globally attracting solution for strongly (in amplitude) input-driven RNNs, (ii) deduce the existence of multiple responses to certain input signals, which can be reliably exploited for computational purposes, and (iii) study the stability of attracting solutions with respect to input sequences. Finally, we highlight the active role of the input in determining qualitative changes in the RNN dynamics, e.g. the number of stable responses, in contrast to the commonly studied qualitative changes due to variations of model parameters.
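A minimal numerical illustration of the input-driven perspective discussed above (a hedged sketch, not the paper's code): drive a randomly initialized tanh RNN with the same input sequence from two different initial states and check whether the trajectories converge, which is the intuition behind the echo state property; with a strong-amplitude input the contraction is typically much faster. The network size, scaling factors, and function names below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100                                    # number of recurrent units (illustrative)
W = rng.normal(0, 1 / np.sqrt(N), (N, N))  # recurrent weights
w_in = rng.normal(0, 1.0, N)               # input weights

def run(u, x0):
    """Iterate x_{t+1} = tanh(W x_t + w_in * u_t) and return the trajectory."""
    xs = [x0]
    for u_t in u:
        xs.append(np.tanh(W @ xs[-1] + w_in * u_t))
    return np.array(xs)

T = 200
for amplitude in (0.1, 2.0):               # weak vs. strong (in amplitude) input drive
    u = amplitude * rng.normal(size=T)      # same input sequence for both runs
    x_a = run(u, rng.normal(size=N))        # two different initial conditions
    x_b = run(u, rng.normal(size=N))
    gap = np.linalg.norm(x_a - x_b, axis=1)
    print(f"amplitude {amplitude}: initial gap {gap[0]:.2f}, final gap {gap[-1]:.2e}")
```

If the final gap shrinks to zero regardless of the initial conditions, the network's response is uniquely determined by the input history, which is the behaviour the classical ESP formulation asks for.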
Beyond exploding and vanishing gradients: analysing RNN training using attractors and smoothness
The exploding and vanishing gradient problem has been the major conceptual principle behind most architecture and training improvements in recurrent neural networks (RNNs) during the last decade. In this paper, we argue that this principle, while powerful, might need some refinement to explain recent developments. We refine the concept of exploding gradients by reformulating the problem in terms of the cost function smoothness, which gives insight into higher-order derivatives and the existence of regions with many close local minima. We also clarify the distinction between vanishing gradients and the need for the RNN to learn attractors to fully use its expressive power. Through the lens of these refinements, we shed new light on recent developments in the RNN field, namely stable RNNs and unitary (or orthogonal) RNNs.
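To make the exploding/vanishing gradient notion concrete, here is a hedged sketch (not from the paper) that measures how the norm of the Jacobian product dh_T/dh_0 of a simple tanh RNN scales with the spectral radius of the recurrent matrix; the dimensions and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 64, 50

def jacobian_product_norm(spectral_radius):
    """Norm of prod_t diag(1 - h_t^2) W, i.e. dh_T/dh_0 for a tanh RNN."""
    W = rng.normal(0, 1 / np.sqrt(N), (N, N))
    W *= spectral_radius / np.abs(np.linalg.eigvals(W)).max()
    h = np.zeros(N)
    J = np.eye(N)
    for _ in range(T):
        h = np.tanh(W @ h + rng.normal(0, 0.5, N))  # random input drive
        J = np.diag(1 - h**2) @ W @ J               # chain rule through one time step
    return np.linalg.norm(J)

for rho in (0.5, 1.0, 1.5):
    print(f"spectral radius {rho}: ||dh_T/dh_0|| ~ {jacobian_product_norm(rho):.3e}")
```

Gradient norms that collapse for small spectral radius and blow up for large spectral radius are the basic phenomenon the smoothness-based refinement above sets out to sharpen.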
Bifurcations and loss jumps in RNN training
Recurrent neural networks (RNNs) are popular machine learning tools for modeling and forecasting sequential data and for inferring dynamical systems (DS) from observed time series. Concepts from DS theory (DST) have variously been used to further our understanding both of how trained RNNs solve complex tasks and of the training process itself. Bifurcations are particularly important phenomena in DS, including RNNs, that refer to topological (qualitative) changes in a system's dynamical behavior as one or more of its parameters are varied. Knowing the bifurcation structure of an RNN thus allows one to deduce many of its computational and dynamical properties, such as its sensitivity to parameter variations or its behavior during training. In particular, bifurcations may account for sudden loss jumps observed in RNN training that could severely impede the training process. Here we first mathematically prove, for a particular class of ReLU-based RNNs, that certain bifurcations are indeed associated with loss gradients tending toward infinity or zero. We then introduce a novel heuristic algorithm for detecting all fixed points and k-cycles in ReLU-based RNNs, together with their existence and stability regions, and hence bifurcation manifolds in parameter space. In contrast to previous numerical algorithms for finding fixed points and common continuation methods, our algorithm provides exact results and returns fixed points and cycles up to high orders with surprisingly good scaling behavior. We exemplify the algorithm on the analysis of the training process of RNNs and find that the recently introduced technique of generalized teacher forcing completely avoids certain types of bifurcations during training. Thus, besides facilitating the DST analysis of trained RNNs, our algorithm provides a powerful instrument for analyzing the training process itself.
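The fixed-point search described above exploits the piecewise-linear structure of ReLU RNNs: within a fixed activation pattern the map is affine, so candidate fixed points are obtained by solving a linear system and checking that they lie inside the assumed region. The sketch below illustrates this idea by brute-force enumeration of activation patterns for a tiny network; it is an illustrative reconstruction under stated assumptions, not the authors' heuristic algorithm, which scales to much larger systems.

```python
import numpy as np
from itertools import product

def relu_rnn_fixed_points(W, b, tol=1e-9):
    """Fixed points of h -> relu(W h + b) by enumerating ReLU activation patterns.

    Within a pattern d in {0,1}^N the map is affine, h = D (W h + b) with
    D = diag(d), so a candidate fixed point solves (I - D W) h = D b.
    """
    N = W.shape[0]
    fixed_points = []
    for d in product([0.0, 1.0], repeat=N):
        mask = np.array(d)
        D = np.diag(mask)
        A = np.eye(N) - D @ W
        if abs(np.linalg.det(A)) < tol:
            continue                          # degenerate region, skip
        h = np.linalg.solve(A, D @ b)
        pre = W @ h + b
        # consistency: active units need pre >= 0, inactive units need pre <= 0
        if np.all(pre[mask > 0] >= -tol) and np.all(pre[mask == 0] <= tol):
            # stability of the affine branch: spectral radius of D W below one
            stable = np.abs(np.linalg.eigvals(D @ W)).max() < 1
            fixed_points.append((h, stable))
    return fixed_points

rng = np.random.default_rng(2)
W = rng.normal(0, 0.9, (3, 3))
b = rng.normal(0, 0.5, 3)
for h, stable in relu_rnn_fixed_points(W, b):
    print(np.round(h, 3), "stable" if stable else "unstable")
```

Sweeping a model parameter and re-running such a search shows where fixed points appear, disappear, or lose stability, which is exactly the bifurcation information the abstract refers to.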
Interpreting recurrent neural networks behaviour via excitable network attractors
Introduction: Machine learning provides fundamental tools both for scientific
research and for the development of technologies with significant impact on
society. It provides methods that facilitate the discovery of regularities in
data and that give predictions without explicit knowledge of the rules
governing a system. However, a price is paid for exploiting such flexibility:
machine learning methods are typically black boxes where it is difficult to
fully understand what the machine is doing or how it is operating. This poses
constraints on the applicability and explainability of such methods. Methods:
Our research aims to open the black box of recurrent neural networks, an
important family of neural networks used for processing sequential data. We
propose a novel methodology that provides a mechanistic interpretation of
their behaviour when solving a computational task. Our methodology uses mathematical
constructs called excitable network attractors, which are invariant sets in
phase space composed of stable attractors and excitable connections between
them. Results and Discussion: As the behaviour of recurrent neural networks
depends both on training and on inputs to the system, we introduce an algorithm
to extract network attractors directly from the trajectory of a neural network
while solving tasks. Simulations conducted on a controlled benchmark task
confirm the relevance of these attractors for interpreting the behaviour of
recurrent neural networks, at least for tasks that involve learning a finite
number of stable states and transitions between them.
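A rough sketch of the kind of extraction step described above (an illustrative assumption about the procedure, not the authors' published algorithm): detect near-stationary segments of an RNN trajectory, cluster them into candidate attractor states, and record the observed transitions between clusters. Thresholds and helper names are hypothetical.

```python
import numpy as np

def extract_states_and_transitions(trajectory, speed_tol=1e-3, merge_tol=0.5):
    """Cluster slow (near-fixed-point) points of a trajectory and list transitions.

    trajectory: array of shape (T, N) of RNN hidden states over time.
    Returns (centroids, transitions) where transitions is a set of (i, j) pairs.
    """
    speed = np.linalg.norm(np.diff(trajectory, axis=0), axis=1)
    slow = trajectory[:-1][speed < speed_tol]          # points where the state barely moves

    centroids = []                                      # greedy distance-based clustering
    for x in slow:
        for c in centroids:
            if np.linalg.norm(x - c) < merge_tol:
                break
        else:
            centroids.append(x.copy())
    if not centroids:
        return np.empty((0, trajectory.shape[1])), set()

    # assign every time step to its nearest centroid and record switches
    labels = [int(np.argmin([np.linalg.norm(x - c) for c in centroids])) for x in trajectory]
    transitions = {(a, b) for a, b in zip(labels[:-1], labels[1:]) if a != b}
    return np.array(centroids), transitions
```

The centroids play the role of the stable states and the transition pairs the excitable connections between them in the extracted network attractor.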
Low Tensor Rank Learning of Neural Dynamics
Learning relies on coordinated synaptic changes in recurrently connected
populations of neurons. Therefore, understanding the collective evolution of
synaptic connectivity over learning is a key challenge in neuroscience and
machine learning. In particular, recent work has shown that the weight matrices
of task-trained RNNs are typically low rank, but how this low rank structure
unfolds over learning is unknown. To address this, we investigate the rank of
the 3-tensor formed by the weight matrices throughout learning. By fitting RNNs
of varying rank to large-scale neural recordings during a motor learning task,
we find that the inferred weights are low-tensor-rank and therefore evolve over
a fixed low-dimensional subspace throughout the entire course of learning. We
next validate the observation of low-tensor-rank learning on an RNN trained to
solve the same task by performing a low-tensor-rank decomposition directly on
the ground truth weights, and by showing that the method we applied to the data
faithfully recovers this low rank structure. Finally, we present a set of
mathematical results bounding the matrix and tensor ranks of gradient descent
learning dynamics which show that low-tensor-rank weights emerge naturally in
RNNs trained to solve low-dimensional tasks. Taken together, our findings
provide novel constraints on the evolution of population connectivity over
learning in both biological and artificial neural networks, and enable reverse
engineering of learning-induced changes in recurrent network dynamics from
large-scale neural recordings.
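As a rough illustration of the low-tensor-rank analysis described above (a sketch under assumptions, not the paper's pipeline), one can stack the recurrent weight matrices saved at each training epoch into a 3-tensor and inspect the singular values of its mode unfoldings; rapid decay along every mode is consistent with the weights evolving within a fixed low-dimensional subspace throughout learning.

```python
import numpy as np

def mode_unfolding_spectra(weight_tensor):
    """Singular value spectra of the three mode unfoldings of an (epochs, N, N) tensor."""
    E, N, _ = weight_tensor.shape
    unfoldings = [
        weight_tensor.reshape(E, N * N),                     # mode 0: training epochs
        np.moveaxis(weight_tensor, 1, 0).reshape(N, E * N),  # mode 1: output units
        np.moveaxis(weight_tensor, 2, 0).reshape(N, E * N),  # mode 2: input units
    ]
    return [np.linalg.svd(U, compute_uv=False) for U in unfoldings]

# toy example: weights constrained to a rank-2 subspace of matrices (illustrative)
rng = np.random.default_rng(3)
N, E = 30, 40
basis = rng.normal(size=(2, N, N))
coeffs = rng.normal(size=(E, 2))
W_over_training = np.einsum('ek,kij->eij', coeffs, basis)
for mode, s in enumerate(mode_unfolding_spectra(W_over_training)):
    print(f"mode {mode}: top singular values {np.round(s[:4], 2)}")
```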
NAIS-Net: Stable Deep Networks from Non-Autonomous Differential Equations
This paper introduces Non-Autonomous Input-Output Stable Network (NAIS-Net),
a very deep architecture where each stacked processing block is derived from a
time-invariant non-autonomous dynamical system. Non-autonomy is implemented by
skip connections from the block input to each of the unrolled processing stages
and allows stability to be enforced so that blocks can be unrolled adaptively
to a pattern-dependent processing depth. NAIS-Net induces non-trivial,
Lipschitz input-output maps, even for an infinite unroll length. We prove that
the network is globally asymptotically stable so that for every initial
condition there is exactly one input-dependent equilibrium assuming tanh units,
and multiple stable equilibria for ReLU units. An efficient implementation that
enforces the stability under derived conditions for both fully-connected and
convolutional layers is also presented. Experimental results show how NAIS-Net
exhibits stability in practice, yielding a significant reduction in
generalization gap compared to ResNets.
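A minimal sketch of the kind of non-autonomous block described above (an illustrative reading of the abstract, not the official NAIS-Net implementation): each unrolled stage reuses a skip connection from the block input u, and the recurrent matrix is rescaled so the unrolled iteration is a contraction and therefore converges to an input-dependent equilibrium. The dimensions and the rescaling rule are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(4)
d = 32
A = rng.normal(0, 1 / np.sqrt(d), (d, d))
A *= 0.9 / np.linalg.norm(A, 2)           # keep the state map contractive (spectral norm < 1)
B = rng.normal(0, 1 / np.sqrt(d), (d, d))
b = rng.normal(0, 0.1, d)

def nais_style_block(u, tol=1e-6, max_steps=500):
    """Unroll x_{k+1} = tanh(A x_k + B u + b) until it settles at an equilibrium."""
    x = np.zeros(d)
    for k in range(max_steps):
        x_next = np.tanh(A @ x + B @ u + b)   # skip connection from the block input u at every stage
        if np.linalg.norm(x_next - x) < tol:  # pattern-dependent processing depth
            return x_next, k + 1
        x = x_next
    return x, max_steps

u = rng.normal(size=d)
x_star, depth = nais_style_block(u)
print(f"converged to an input-dependent equilibrium after {depth} unrolled stages")
```

Because tanh is 1-Lipschitz and the spectral norm of A is below one, the stage map is a contraction, so every initial condition converges to the same input-dependent equilibrium, mirroring the stability statement in the abstract for tanh units.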