A State Space Approach for Piecewise-Linear Recurrent Neural Networks for Reconstructing Nonlinear Dynamics from Neural Measurements
The computational properties of neural systems are often thought to be
implemented in terms of their network dynamics. Hence, recovering the system
dynamics from experimentally observed neuronal time series, like multiple
single-unit (MSU) recordings or neuroimaging data, is an important step toward
understanding its computations. Ideally, one would not only seek a state space
representation of the dynamics, but would wish to have access to its governing
equations for in-depth analysis. Recurrent neural networks (RNNs) constitute a
computationally powerful and dynamically universal formal framework that has
been extensively studied from both the computational and the dynamical systems
perspective. Here we develop a semi-analytical maximum-likelihood estimation
scheme for piecewise-linear RNNs (PLRNNs) within the statistical framework of
state space models, which accounts for noise in both the underlying latent
dynamics and the observation process. The Expectation-Maximization algorithm is
used to iteratively infer the latent state distribution, via a global Laplace
approximation, and the PLRNN parameters. After validating the
procedure on toy examples, the approach is applied to MSU recordings from the
rodent anterior cingulate cortex obtained during performance of a classical
working memory task, delayed alternation. A model with 5 states turned out to
be sufficient to capture the essential computational dynamics underlying task
performance, including stimulus-selective delay activity. The estimated models
were rarely multi-stable, but rather were tuned to exhibit slow dynamics in the
vicinity of a bifurcation point. In summary, the present work advances a
semi-analytical (thus reasonably fast) maximum-likelihood estimation framework
for PLRNNs that may make it possible to recover the relevant dynamics underlying
observed neuronal time series and to link them directly to computational properties.
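For orientation, the latent dynamics at the heart of such a model can be written as z_t = A z_{t-1} + W relu(z_{t-1}) + h + eps_t, observed through x_t = B z_t + eta_t. The following is a minimal sketch of this generative side only, assuming a standard PLRNN parameterization with Gaussian state and observation noise; the EM estimation with its global Laplace approximation described above is not reproduced here, and all names and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_plrnn(A, W, h, B, T, state_noise=0.01, obs_noise=0.1):
    """Simulate a piecewise-linear RNN state space model:
        z_t = A z_{t-1} + W relu(z_{t-1}) + h + eps_t   (latent dynamics)
        x_t = B z_t + eta_t                             (observation model)
    A carries the diagonal (linear) part, W the piecewise-linear coupling."""
    M, N = A.shape[0], B.shape[0]
    z = np.zeros((T, M))
    z[0] = rng.normal(size=M)
    for t in range(1, T):
        z[t] = (A @ z[t - 1] + W @ np.maximum(z[t - 1], 0.0) + h
                + state_noise * rng.normal(size=M))
    x = z @ B.T + obs_noise * rng.normal(size=(T, N))
    return z, x

# Toy setting with a 5-dimensional latent state, echoing the model size used
# for the ACC analysis; all parameter values below are illustrative.
M, N, T = 5, 20, 500
A = np.diag(rng.uniform(0.5, 0.95, size=M))            # diagonal linear part
W = 0.1 * rng.normal(size=(M, M)); np.fill_diagonal(W, 0.0)
h = 0.1 * rng.normal(size=M)
B = rng.normal(size=(N, M))
z_true, x_obs = simulate_plrnn(A, W, h, B, T)
```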
Detecting Multiple Change Points Using Adaptive Regression Splines With Application to Neural Recordings
Time series, as is frequently the case in neuroscience, are rarely stationary, but often exhibit abrupt changes due to attractor transitions or bifurcations in the dynamical systems producing them. A plethora of methods for detecting such change points in time series statistics have been developed over the years, in addition to test criteria to evaluate their significance. Issues to consider when developing change point analysis methods include computational demands, difficulties arising from either a limited amount of data or a large number of covariates, and arriving at statistical tests with sufficient power to detect as many changes as contained in potentially high-dimensional time series. Here, a general method called Paired Adaptive Regressors for Cumulative Sum is developed for detecting multiple change points in the mean of multivariate time series. The method's advantages over alternative approaches are demonstrated through a series of simulation experiments. This is followed by a real-data application to neural recordings from rat medial prefrontal cortex during learning. Finally, the method's flexibility to incorporate useful features from state-of-the-art change point detection techniques is discussed, along with potential drawbacks and suggestions to remedy them.
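The core quantity behind such methods is the cumulative-sum (CUSUM) transformation of the series, whose peaks indicate candidate change points. Below is a minimal sketch of that transformation for a single mean shift in a one-dimensional series; the paper's method fits paired adaptive spline regressors to the CUSUM-transformed (possibly multivariate) series to locate multiple change points and assesses them with statistical tests, none of which is shown here. All names and values are illustrative.

```python
import numpy as np

def cusum_statistic(x):
    """Cumulative-sum (CUSUM) transform of a 1-D series for a single mean shift.
    The index where |CUSUM| peaks is the classical change point estimate."""
    x = np.asarray(x, dtype=float)
    cusum = np.cumsum(x - x.mean()) / np.sqrt(len(x))
    return cusum, int(np.argmax(np.abs(cusum)))

# Example: a step change in the mean at t = 120 (values are illustrative).
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, 120), rng.normal(1.5, 1.0, 80)])
cusum, k_hat = cusum_statistic(x)
print("estimated change point:", k_hat)
```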
On the difficulty of learning chaotic dynamics with RNNs
Recurrent neural networks (RNNs) are widespread machine learning tools for
modeling sequential and time series data. They are notoriously hard to train
because their loss gradients backpropagated in time tend to saturate or diverge
during training. This is known as the exploding and vanishing gradient problem.
Previous solutions to this issue either built on rather complicated,
purpose-engineered architectures with gated memory buffers, or - more recently
- imposed constraints that ensure convergence to a fixed point or restrict (the
eigenspectrum of) the recurrence matrix. Such constraints, however, convey
severe limitations on the expressivity of the RNN. Essential intrinsic dynamics
such as multistability or chaos are disabled. This is inherently at odds with
the chaotic nature of many, if not most, time series encountered in nature
and society. It is particularly problematic in scientific applications where
one aims to reconstruct the underlying dynamical system. Here we offer a
comprehensive theoretical treatment of this problem by relating the loss
gradients during RNN training to the Lyapunov spectrum of RNN-generated orbits.
We mathematically prove that RNNs producing stable equilibrium or cyclic
behavior have bounded gradients, whereas the gradients of RNNs with chaotic
dynamics always diverge. Based on these analyses and insights we suggest ways
to optimize the training process on chaotic data according to the
system's Lyapunov spectrum, regardless of the employed RNN architecture.
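To make the link between gradients and the Lyapunov spectrum concrete, the sketch below estimates the Lyapunov spectrum of a piecewise-linear RNN map z_{t+1} = A z_t + W relu(z_t) + h via QR-reorthonormalized products of the map's Jacobians along an orbit, the standard numerical procedure such an analysis rests on. The gradient bounds themselves are proved analytically in the paper and are not reproduced here; all parameter values are illustrative.

```python
import numpy as np

def lyapunov_spectrum(A, W, h, z0, T=2000, T_burn=200):
    """Estimate the Lyapunov spectrum of z_{t+1} = A z_t + W relu(z_t) + h
    from QR-reorthonormalized products of the map's Jacobians along an orbit.
    A positive largest exponent indicates chaos, i.e. exponentially diverging
    trajectories, which the paper links to exploding loss gradients."""
    M = len(z0)
    z = np.array(z0, dtype=float)
    Q = np.eye(M)
    log_r = np.zeros(M)
    for t in range(T + T_burn):
        D = np.diag((z > 0).astype(float))   # ReLU derivative at the current state
        J = A + W @ D                        # Jacobian of the map at z_t
        z = A @ z + W @ np.maximum(z, 0.0) + h
        Q, R = np.linalg.qr(J @ Q)
        if t >= T_burn:
            log_r += np.log(np.abs(np.diag(R)) + 1e-12)
    return np.sort(log_r / T)[::-1]

# Small random PLRNN, purely for illustration.
rng = np.random.default_rng(2)
M = 5
A = np.diag(rng.uniform(0.6, 0.95, size=M))
W = 0.4 * rng.normal(size=(M, M)); np.fill_diagonal(W, 0.0)
h = 0.1 * rng.normal(size=M)
print(lyapunov_spectrum(A, W, h, z0=rng.normal(size=M)))
```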
Generalized Teacher Forcing for Learning Chaotic Dynamics
Chaotic dynamical systems (DS) are ubiquitous in nature and society. Often we
are interested in reconstructing such systems from observed time series for
prediction or mechanistic insight, where by reconstruction we mean learning
geometrical and invariant temporal properties of the system in question (like
attractors). However, training reconstruction algorithms like recurrent neural
networks (RNNs) on such systems by gradient-descent based techniques faces
severe challenges. This is mainly due to exploding gradients caused by the
exponential divergence of trajectories in chaotic systems. Moreover, for
(scientific) interpretability we wish to have as low dimensional
reconstructions as possible, preferably in a model which is mathematically
tractable. Here we report that a surprisingly simple modification of teacher
forcing leads to provably strictly all-time bounded gradients in training on
chaotic systems, and, when paired with a simple architectural rearrangement of
a tractable RNN design, piecewise-linear RNNs (PLRNNs), allows for faithful
reconstruction in spaces of at most the dimensionality of the observed system.
We show on several DS that with these amendments we can reconstruct DS better
than current SOTA algorithms, in much lower dimensions. Performance differences
were particularly compelling on real world data with which most other methods
severely struggled. This work thus led to a simple yet powerful DS
reconstruction algorithm which is highly interpretable at the same time.
Comment: Published in the Proceedings of the 40th International Conference on Machine Learning (ICML 2023).
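The central idea, sketched below, is to pull the latent state at every time step toward a data-inferred target state by a convex combination with weight alpha, interpolating between free-running generation (alpha = 0) and classical teacher forcing (alpha = 1). The sketch shows only this forward interpolation for a PLRNN; in the paper alpha is chosen in a principled way and gradients are propagated through the forward pass, neither of which is reproduced here. The pseudo-inverse mapping from observations to latent targets is an illustrative assumption, not the paper's construction.

```python
import numpy as np

def gtf_forward(A, W, h, B, x_obs, alpha):
    """Forward pass of a PLRNN with generalized-teacher-forcing-style control.
    At every step the freely generated latent state is pulled toward a
    data-inferred target state by a convex combination with weight alpha:
        alpha = 0 -> free-running generation (no forcing)
        alpha = 1 -> classical teacher forcing (state reset to the target)
    Intermediate alpha keeps trajectories close to the data, which is what
    bounds the loss gradients when training on chaotic systems."""
    B_pinv = np.linalg.pinv(B)        # illustrative: map observations to latent targets
    T = x_obs.shape[0]
    z = B_pinv @ x_obs[0]
    x_pred = np.zeros_like(x_obs)
    x_pred[0] = x_obs[0]
    for t in range(1, T):
        z = A @ z + W @ np.maximum(z, 0.0) + h        # PLRNN step
        z_target = B_pinv @ x_obs[t]                   # data-inferred state
        z = alpha * z_target + (1.0 - alpha) * z       # GTF interpolation
        x_pred[t] = B @ z
    return x_pred
```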
Bifurcations and loss jumps in RNN training
Recurrent neural networks (RNNs) are popular machine learning tools for
modeling and forecasting sequential data and for inferring dynamical systems
(DS) from observed time series. Concepts from DS theory (DST) have variously
been used to further our understanding of both, how trained RNNs solve complex
tasks, and the training process itself. Bifurcations are particularly important
phenomena in DS, including RNNs, that refer to topological (qualitative)
changes in a system's dynamical behavior as one or more of its parameters are
varied. Knowing the bifurcation structure of an RNN thus allows one to deduce
many of its computational and dynamical properties, like its sensitivity to
parameter variations or its behavior during training. In particular,
bifurcations may account for sudden loss jumps observed in RNN training that
could severely impede the training process. Here we first mathematically prove
for a particular class of ReLU-based RNNs that certain bifurcations are indeed
associated with loss gradients tending toward infinity or zero. We then
introduce a novel heuristic algorithm for detecting all fixed points and
k-cycles in ReLU-based RNNs and their existence and stability regions, hence
bifurcation manifolds in parameter space. In contrast to previous numerical
algorithms for finding fixed points and common continuation methods, our
algorithm provides exact results and returns fixed points and cycles up to high
orders with surprisingly good scaling behavior. We exemplify the algorithm on
the analysis of the training process of RNNs, and find that the recently
introduced technique of generalized teacher forcing completely avoids certain
types of bifurcations in training. Thus, besides facilitating the DST analysis
of trained RNNs, our algorithm provides a powerful instrument for analyzing the
training process itself.
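The key observation exploited by such a fixed-point search is that a ReLU-based RNN is affine within each linear region, so fixed points can be obtained exactly by solving one linear system per region and keeping only region-consistent solutions. The brute-force sketch below illustrates this for a PLRNN-style map by enumerating all 2^M regions; the paper's heuristic algorithm searches regions far more efficiently and additionally returns k-cycles, neither of which is reproduced here.

```python
import numpy as np
from itertools import product

def find_fixed_points(A, W, h):
    """Enumerate fixed points of z_{t+1} = A z_t + W relu(z_t) + h exactly.
    Within the linear region given by a binary ReLU pattern d in {0,1}^M the map
    is affine, so a fixed point solves (I - A - W D) z* = h; a candidate is kept
    only if its sign pattern is consistent with the region it was solved in."""
    M = A.shape[0]
    I = np.eye(M)
    fixed_points = []
    for d in product([0.0, 1.0], repeat=M):
        D = np.diag(d)
        try:
            z_star = np.linalg.solve(I - A - W @ D, h)
        except np.linalg.LinAlgError:
            continue                                    # singular region map, skip
        if np.allclose((z_star > 0).astype(float), d):  # region consistency check
            J = A + W @ D                               # Jacobian within this region
            stable = np.max(np.abs(np.linalg.eigvals(J))) < 1.0
            fixed_points.append((z_star, stable))
    return fixed_points
```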