
    Short-term Memory of Deep RNN

    The extension of deep learning towards temporal data processing is gaining increasing research interest. In this paper we investigate the properties of the state dynamics developed at successive levels of deep recurrent neural networks (RNNs) in terms of short-term memory abilities. Our results reveal interesting insights that shed light on the role of layering as a factor of RNN design. Notably, higher layers in a hierarchically organized RNN architecture turn out to be inherently biased towards longer memory spans, even prior to training of the recurrent connections. Moreover, within the Reservoir Computing framework, our analysis also points out the benefit of a layered recurrent organization as an efficient approach to improving the memory skills of reservoir models.

    Comment: This is a pre-print (pre-review) version of the paper accepted for presentation at the 26th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), Bruges (Belgium), 25-27 April 2018.
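    For context, one standard way to quantify the kind of short-term memory discussed above is Jaeger's linear memory capacity: trained linear readouts try to reconstruct progressively delayed copies of the input from the state of an untrained random reservoir. The sketch below is not the authors' code; the reservoir sizes, scalings, and the two-layer wiring are illustrative assumptions only.

```python
# Minimal sketch (assumptions, not the paper's experiments): Jaeger-style linear
# memory capacity of an untrained random reservoir vs. a second reservoir layer
# driven by the first layer's states. Sizes and scalings are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)

def reservoir_states(u, n_units, w_in_scale=0.5, rho=0.9, prev_states=None):
    """Run a tanh reservoir driven by the input u (or by a lower layer's states)."""
    drive = u[:, None] if prev_states is None else prev_states
    W_in = rng.uniform(-w_in_scale, w_in_scale, (n_units, drive.shape[1]))
    W = rng.normal(size=(n_units, n_units))
    W *= rho / max(abs(np.linalg.eigvals(W)))        # rescale to spectral radius rho
    x = np.zeros(n_units)
    X = np.zeros((len(u), n_units))
    for t in range(len(u)):
        x = np.tanh(W_in @ drive[t] + W @ x)
        X[t] = x
    return X

def memory_capacity(X, u, max_delay=40, washout=100):
    """MC = sum over delays k of r^2 between a ridge readout of x(t) and u(t-k)."""
    mc = 0.0
    for k in range(1, max_delay + 1):
        Xk, yk = X[washout:], u[washout - k:len(u) - k]
        w = np.linalg.solve(Xk.T @ Xk + 1e-6 * np.eye(X.shape[1]), Xk.T @ yk)
        mc += np.corrcoef(Xk @ w, yk)[0, 1] ** 2
    return mc

u = rng.uniform(-0.8, 0.8, 4000)
X1 = reservoir_states(u, n_units=100)                      # first (lower) layer
X2 = reservoir_states(u, n_units=100, prev_states=X1)      # second (higher) layer
print("MC layer 1:", memory_capacity(X1, u))
print("MC layer 2:", memory_capacity(X2, u))
```

    This only probes the measure itself; it makes no claim about reproducing the paper's comparison between shallow and layered reservoirs.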

    On the difficulty of learning chaotic dynamics with RNNs

    Recurrent neural networks (RNNs) are widespread machine learning tools for modeling sequential and time series data. They are notoriously hard to train because their loss gradients, backpropagated through time, tend to saturate or diverge during training. This is known as the exploding and vanishing gradient problem. Previous solutions to this issue either built on rather complicated, purpose-engineered architectures with gated memory buffers, or, more recently, imposed constraints that ensure convergence to a fixed point or restrict (the eigenspectrum of) the recurrence matrix. Such constraints, however, impose severe limitations on the expressivity of the RNN: essential intrinsic dynamics such as multistability or chaos are disabled. This is inherently at odds with the chaotic nature of many, if not most, time series encountered in nature and society, and it is particularly problematic in scientific applications where one aims to reconstruct the underlying dynamical system. Here we offer a comprehensive theoretical treatment of this problem by relating the loss gradients during RNN training to the Lyapunov spectrum of RNN-generated orbits. We mathematically prove that RNNs producing stable equilibrium or cyclic behavior have bounded gradients, whereas the gradients of RNNs with chaotic dynamics always diverge. Based on these analyses and insights, we suggest ways to optimize the training process on chaotic data according to the system's Lyapunov spectrum, regardless of the employed RNN architecture.
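    The core relation described above can be stated schematically (with notation chosen here, not taken from the paper): the backpropagated gradient factor is a product of state-to-state Jacobians, and its norm grows or shrinks at a rate set by the largest Lyapunov exponent of the generated orbit.

```latex
% Schematic statement (notation chosen for illustration): for hidden states
% h_{t+1} = F_\theta(h_t, x_t), the factor backpropagated from step T to step t
% is a product of Jacobians J_s = \partial h_{s+1} / \partial h_s.
\[
  \frac{\partial h_T}{\partial h_t}
    = \prod_{s=t}^{T-1} J_s,
  \qquad
  \Bigl\lVert \prod_{s=t}^{T-1} J_s \Bigr\rVert
    \;\sim\; e^{\lambda_{\max}\,(T-t)},
\]
% so lambda_max < 0 (stable equilibria or cycles) gives bounded, typically
% vanishing, gradients, while lambda_max > 0 (chaotic orbits) gives
% exponentially diverging gradients.
```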

    Learning dynamical systems from data: a simple cross-validation perspective

    Regressing the vector field of a dynamical system from a finite number of observed states is a natural way to learn surrogate models for such systems. We present variants of cross-validation (Kernel Flows [Owhadi19] and its variants based on Maximum Mean Discrepancy and Lyapunov exponents) as simple approaches for learning the kernel used in these emulators.

    Comment: File uploaded to arXiv on Sunday, July 5th, 2020; it got delayed due to TeX problems on arXiv. Original version at https://www.researchgate.net/publication/342693818_Learning_dynamical_systems_from_data_a_simple_cross-validation_perspectiv
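    As a rough illustration of the setting (learning a surrogate map from observed states and tuning the kernel by cross-validation), the sketch below uses plain hold-out cross-validation over an RBF bandwidth on a Lorenz trajectory. It is a stand-in for, not an implementation of, the Kernel Flows / MMD / Lyapunov-exponent variants the paper develops; all function names and parameter values are assumptions.

```python
# Minimal sketch (assumptions, not the paper's Kernel Flows code): learn the
# one-step map of a dynamical system with RBF kernel ridge regression and pick
# the kernel bandwidth by plain hold-out cross-validation.
import numpy as np

rng = np.random.default_rng(1)

def lorenz_step(x, dt=0.01, s=10.0, r=28.0, b=8.0 / 3.0):
    dx = np.array([s * (x[1] - x[0]),
                   x[0] * (r - x[2]) - x[1],
                   x[0] * x[1] - b * x[2]])
    return x + dt * dx                              # forward-Euler observation model

# Observed trajectory, split into input/output pairs (x_t, x_{t+1})
traj = [np.array([1.0, 1.0, 1.0])]
for _ in range(2000):
    traj.append(lorenz_step(traj[-1]))
traj = np.array(traj)
X, Y = traj[:-1], traj[1:]

def rbf(A, B, sigma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def fit_predict(Xtr, Ytr, Xte, sigma, lam=1e-6):
    K = rbf(Xtr, Xtr, sigma)
    alpha = np.linalg.solve(K + lam * np.eye(len(Xtr)), Ytr)
    return rbf(Xte, Xtr, sigma) @ alpha

# Hold-out cross-validation over the bandwidth (a simple stand-in for Kernel Flows)
idx = rng.permutation(len(X))
tr, te = idx[:1500], idx[1500:]
best = min((np.mean((fit_predict(X[tr], Y[tr], X[te], s) - Y[te]) ** 2), s)
           for s in [0.5, 1.0, 2.0, 5.0, 10.0])
print("held-out MSE %.3e at sigma=%.1f" % best)
```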