Short-term Memory of Deep RNN
The extension of deep learning towards temporal data processing is gaining an
increasing research interest. In this paper we investigate the properties of
state dynamics developed in successive levels of deep recurrent neural networks
(RNNs) in terms of short-term memory abilities. Our results reveal interesting
insights that shed light on the nature of layering as a factor of RNN design.
Noticeably, higher layers in a hierarchically organized RNN architecture
prove to be inherently biased towards longer memory spans, even prior to
training of the recurrent connections. Moreover, in the context of the
Reservoir Computing framework, our analysis also points out the benefit of a layered
recurrent organization as an efficient approach to improve the memory skills of
reservoir models.
Comment: This is a pre-print (pre-review) version of the paper accepted for
presentation at the 26th European Symposium on Artificial Neural Networks,
Computational Intelligence and Machine Learning (ESANN), Bruges (Belgium),
25-27 April 201
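The layering effect described in the abstract can be probed empirically. Below is a minimal sketch, not the paper's actual experimental setup: an untrained two-layer reservoir is driven by i.i.d. input, and the classic short-term memory capacity (the sum over lags of squared correlations between the delayed input and a trained linear readout) is computed per layer. All sizes, scalings, and helper names (`run_layers`, `memory_capacity`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_reservoir(n, rho):
    """Untrained recurrent matrix rescaled to spectral radius rho."""
    W = rng.uniform(-1, 1, (n, n))
    return W * (rho / max(abs(np.linalg.eigvals(W))))

def run_layers(u, n=50, layers=2, rho=0.9, scale_in=0.5):
    """Drive a stack of untrained reservoirs: the external input feeds layer 0,
    each higher layer is driven by the states of the layer below."""
    drive, all_states = u.reshape(-1, 1), []
    for _ in range(layers):
        W = random_reservoir(n, rho)
        Win = rng.uniform(-scale_in, scale_in, (n, drive.shape[1]))
        X, x = np.zeros((len(u), n)), np.zeros(n)
        for t in range(len(u)):
            x = np.tanh(W @ x + Win @ drive[t])
            X[t] = x
        all_states.append(X)
        drive = X
    return all_states

def memory_capacity(X, u, max_lag=40, washout=100):
    """Sum over lags k of the squared correlation between u(t-k) and a
    ridge-regression readout trained to reconstruct it from the states."""
    mc = 0.0
    for k in range(1, max_lag + 1):
        Xk, yk = X[washout:], u[washout - k:len(u) - k]
        w = np.linalg.solve(Xk.T @ Xk + 1e-6 * np.eye(Xk.shape[1]), Xk.T @ yk)
        mc += np.corrcoef(Xk @ w, yk)[0, 1] ** 2
    return mc

u = rng.uniform(-0.8, 0.8, 2000)
S1, S2 = run_layers(u)
print(memory_capacity(S1, u), memory_capacity(S2, u))  # per-layer memory capacity
```

Comparing the two printed values for several random seeds gives a quick empirical check of the layering bias the paper analyzes.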
Sparsity in Reservoir Computing Neural Networks
Reservoir Computing (RC) is a well-known strategy for designing Recurrent
Neural Networks characterized by a strikingly efficient training procedure. The crucial aspect
of RC is to properly instantiate the hidden recurrent layer that serves as
dynamical memory to the system. In this respect, the common recipe is to create
a pool of randomly and sparsely connected recurrent neurons. While the aspect
of sparsity in the design of RC systems has been debated in the literature, it
is nowadays understood mainly as a way to enhance the efficiency of
computation, exploiting sparse matrix operations. In this paper, we empirically
investigate the role of sparsity in RC network design under the perspective of
the richness of the developed temporal representations. We analyze both
sparsity in the recurrent connections, and in the connections from the input to
the reservoir. Our results point out that sparsity, in particular in
input-reservoir connections, has a major role in developing internal temporal
representations that have a longer short-term memory of past inputs and a
higher dimension.
Comment: This paper is currently under review
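A toy version of this comparison can be set up in a few lines. The sketch below is an illustrative assumption, not the paper's protocol: it builds reservoirs with controllable sparsity in the input and recurrent connections, and uses the number of principal components explaining 90% of the state variance as a rough proxy for the dimensionality of the temporal representation.

```python
import numpy as np

rng = np.random.default_rng(1)

def sparse_matrix(rows, cols, density, scale=1.0):
    """Uniform random matrix with roughly `density` fraction of nonzero entries."""
    M = rng.uniform(-scale, scale, (rows, cols))
    M[rng.random((rows, cols)) > density] = 0.0
    return M

def run_esn(u, n=100, in_density=1.0, rec_density=1.0, rho=0.9, washout=100):
    """Untrained reservoir with controllable sparsity of the input-reservoir
    and recurrent connections."""
    W = sparse_matrix(n, n, rec_density)
    W *= rho / max(abs(np.linalg.eigvals(W)))
    Win = sparse_matrix(n, 1, in_density)
    X, x = np.zeros((len(u), n)), np.zeros(n)
    for t, ut in enumerate(u):
        x = np.tanh(W @ x + Win[:, 0] * ut)
        X[t] = x
    return X[washout:]

def effective_dimension(X, var=0.9):
    """Number of principal components explaining `var` of the state variance --
    one rough proxy for the richness of the temporal representation."""
    s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False) ** 2
    return int(np.searchsorted(np.cumsum(s) / s.sum(), var) + 1)

u = rng.uniform(-0.8, 0.8, 1000)
dense = effective_dimension(run_esn(u, in_density=1.0))
sparse = effective_dimension(run_esn(u, in_density=0.1))
print(dense, sparse)  # compare representation dimensionality
```

Varying `in_density` and `rec_density` separately reproduces, in miniature, the two axes of sparsity the abstract distinguishes.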
Tree Edit Distance Learning via Adaptive Symbol Embeddings
Metric learning aims to improve classification accuracy by learning a
distance measure which brings data points from the same class closer together
and pushes data points from different classes further apart. Recent research
has demonstrated that metric learning approaches can also be applied to trees,
such as molecular structures, abstract syntax trees of computer programs, or
syntax trees of natural language, by learning the cost function of an edit
distance, i.e. the costs of replacing, deleting, or inserting nodes in a tree.
However, learning such costs directly may yield an edit distance which violates
metric axioms, is challenging to interpret, and may not generalize well. In
this contribution, we propose a novel metric learning approach for trees which
we call embedding edit distance learning (BEDL) and which learns an edit
distance indirectly by embedding the tree nodes as vectors, such that the
Euclidean distance between those vectors supports class discrimination. We
learn such embeddings by reducing the distance to prototypical trees from the
same class and increasing the distance to prototypical trees from different
classes. In our experiments, we show that BEDL improves upon the
state-of-the-art in metric learning for trees on six benchmark data sets,
ranging from computer science over biomedical data to a natural-language
processing data set containing over 300,000 nodes.
Comment: Paper at the International Conference on Machine Learning (2018),
2018-07-10 to 2018-07-15 in Stockholm, Sweden
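The core idea of deriving edit costs from embeddings can be sketched compactly. BEDL learns the embeddings with a prototype-based objective; the code below only illustrates the forward computation and, for brevity, simplifies trees to label sequences so a plain Levenshtein-style dynamic program suffices. As in the abstract, replacing x with y costs the Euclidean distance between their embedding vectors, and deleting or inserting x costs the norm of its embedding. The example embeddings are hypothetical.

```python
import numpy as np

def edit_distance(a, b, emb):
    """Levenshtein-style edit distance whose costs come from label embeddings:
    replace(x, y) = ||emb[x] - emb[y]||, delete/insert(x) = ||emb[x]||."""
    def cost(x, y=None):
        if y is None:
            return np.linalg.norm(emb[x])
        return np.linalg.norm(emb[x] - emb[y])

    n, m = len(a), len(b)
    D = np.zeros((n + 1, m + 1))
    for i in range(1, n + 1):
        D[i, 0] = D[i - 1, 0] + cost(a[i - 1])
    for j in range(1, m + 1):
        D[0, j] = D[0, j - 1] + cost(b[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = min(D[i - 1, j - 1] + cost(a[i - 1], b[j - 1]),  # replace
                          D[i - 1, j] + cost(a[i - 1]),                # delete
                          D[i, j - 1] + cost(b[j - 1]))                # insert
    return D[n, m]

emb = {"a": np.array([1.0, 0.0]),
       "b": np.array([0.0, 1.0]),
       "c": np.array([1.0, 0.0])}   # 'a' and 'c' share an embedding

# Since 'a' and 'c' are embedded identically, swapping them is free:
print(edit_distance("abc", "cba", emb))  # 0.0
```

This shows why learning embeddings rather than raw cost tables is attractive: symbols that the classes treat as interchangeable can be mapped to nearby vectors, and the induced distance automatically respects the metric axioms.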
Tree Echo State Networks
In this paper we present the Tree Echo State Network (TreeESN) model, generalizing the paradigm of Reservoir Computing to tree structured data. TreeESNs exploit an untrained generalized recursive reservoir, exhibiting extreme efficiency for learning in structured domains. In addition, we highlight through the paper other characteristics of the approach. First, we discuss the Markovian characterization of reservoir dynamics, extended to the case of tree domains, that is implied by the contractive setting of the TreeESN state transition function. Second, we study two types of state mapping functions to map the tree structured state of a TreeESN into a fixed-size feature representation for classification or regression tasks. The critical role of the relation between the choice of the state mapping function and the Markovian characterization of the task is analyzed and experimentally investigated on both artificial and real-world tasks. Finally, experimental results on benchmark and real-world tasks show that the TreeESN approach, in spite of its efficiency, can achieve results comparable to those of state-of-the-art, though more complex, neural and kernel-based models for tree structured data.
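The recursive encoding and the two state mappings can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: contractivity is enforced here through one simple sufficient condition (scaling the recurrent weights so that K·||W||₂ < 1, with K the maximum branching factor), and all sizes are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

N, D, K = 30, 3, 2          # reservoir units, label size, max branching factor
Win = rng.uniform(-0.5, 0.5, (N, D))
W = rng.uniform(-1, 1, (N, N))
# Sufficient condition for contractive (hence Markovian) dynamics: K * ||W||_2 < 1.
W *= 0.4 / (K * np.linalg.norm(W, 2))

def encode(tree, states):
    """tree = (label_vector, [child subtrees]); computes reservoir states bottom-up,
    each node's state depending on its label and its children's states."""
    label, children = tree
    x = np.tanh(Win @ label + sum((W @ encode(c, states) for c in children),
                                  np.zeros(N)))
    states.append(x)
    return x

def tree_features(tree, mapping="root"):
    """Fixed-size feature vector via a state mapping: root state or mean state."""
    states = []
    root = encode(tree, states)
    return root if mapping == "root" else np.mean(states, axis=0)

leaf = lambda v: (np.array(v, float), [])
tree = (np.array([1.0, 0.0, 0.0]), [leaf([0.0, 1.0, 0.0]), leaf([0.0, 0.0, 1.0])])
phi = tree_features(tree, mapping="mean")   # feed this to a trained linear readout
print(phi.shape)  # (30,)
```

Only the readout applied to `phi` would be trained; the recursive reservoir itself stays fixed, which is where the model's efficiency comes from.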
Fast and Deep Graph Neural Networks
We address the efficiency issue for the construction of a deep graph neural
network (GNN). The approach exploits the idea of representing each input graph
as a fixed point of a dynamical system (implemented through a recurrent neural
network), and leverages a deep architectural organization of the recurrent
units. Efficiency is gained by many aspects, including the use of small and
very sparse networks, where the weights of the recurrent units are left
untrained under the stability condition introduced in this work. This can be
viewed as a way to study the intrinsic power of the architecture of a deep GNN,
and also to provide insights for the set-up of more complex fully-trained
models. Through experimental results, we show that even without training of the
recurrent connections, the architecture of a small deep GNN is surprisingly able
to achieve or improve the state-of-the-art performance on a significant set of
tasks in the field of graph classification.
Comment: Pre-print of 'Fast and Deep Graph Neural Networks', accepted for AAAI
2020. This document includes the Supplementary Material
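The fixed-point construction can be sketched for a single untrained layer. This is an illustrative simplification, not the stability condition introduced in the paper: here stability is guaranteed by one standard sufficient condition, ||W||₂·deg_max < 1, which makes the node-state update a contraction; sizes and scalings are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def graph_embedding(A, U, n=20, iters=200, tol=1e-6):
    """Node states as the fixed point of x_v = tanh(Win u_v + W * sum_{u~v} x_u),
    with untrained weights. Stability is enforced via the sufficient condition
    ||W||_2 * deg_max < 1, which makes the iterated map a contraction."""
    V, d = U.shape
    deg_max = max(A.sum(axis=1).max(), 1.0)
    Win = rng.uniform(-0.5, 0.5, (n, d))
    W = rng.uniform(-1, 1, (n, n))
    W *= 0.9 / (np.linalg.norm(W, 2) * deg_max)
    X = np.zeros((V, n))
    for _ in range(iters):
        X_new = np.tanh(U @ Win.T + A @ X @ W.T)   # one synchronous update of all nodes
        if np.abs(X_new - X).max() < tol:
            X = X_new
            break
        X = X_new
    return X.sum(axis=0)        # sum pooling: one vector per graph for the readout

# Toy 4-node cycle graph with 2-dimensional node features:
A = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], float)
U = rng.uniform(-1, 1, (4, 2))
g = graph_embedding(A, U)
print(g.shape)  # (20,)
```

A deep variant would stack such layers, feeding each layer's converged node states (possibly together with the input features) to the next; only the final readout over the pooled vectors is trained.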
Edge of stability echo state networks
Echo State Networks (ESNs) are time-series processing models working under
the Echo State Property (ESP) principle. The ESP is a notion of stability that
imposes an asymptotic fading of the memory of the input. On the other hand, the
resulting inherent architectural bias of ESNs may lead to an excessive loss of
information, which in turn harms the performance in certain tasks with long
short-term memory requirements. With the goal of bringing together the fading
memory property and the ability to retain as much memory as possible, in this
paper we introduce a new ESN architecture, called the Edge of Stability Echo
State Network (ES²N). The introduced ES²N model is based on defining the
reservoir layer as a convex combination of a nonlinear reservoir (as in the
standard ESN), and a linear reservoir that implements an orthogonal
transformation. We provide a thorough mathematical analysis of the introduced
model, proving that the whole eigenspectrum of the Jacobian of the ES²N map
can be contained in an annular neighbourhood of a complex circle of
controllable radius, and exploit this property to demonstrate that the
ES²N's forward dynamics evolve close to the edge-of-chaos regime by design.
Remarkably, our experimental analysis shows that the newly introduced reservoir
model is able to reach the theoretical maximum short-term memory capacity. At
the same time, in comparison to the standard ESN, ES²N is shown to offer an
excellent trade-off between memory and nonlinearity, as well as a significant
improvement of performance in autoregressive nonlinear modeling.
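The convex-combination update described in the abstract can be written down directly. The sketch below is a rough illustration under assumed hyperparameters (β = 0.1, spectral radius 0.9), not the paper's exact parameterization; it also checks numerically that the eigenvalues of the Jacobian at the origin cluster in an annulus around the circle of radius 1−β, consistent with the stated result.

```python
import numpy as np

rng = np.random.default_rng(4)

n, beta = 100, 0.1
O, _ = np.linalg.qr(rng.standard_normal((n, n)))     # random orthogonal matrix
W = rng.uniform(-1, 1, (n, n))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))            # nonlinear branch, spectral radius 0.9
Win = rng.uniform(-1, 1, n)

def es2n_run(u):
    """ES^2N state update: a convex combination (weight beta) of a standard tanh
    reservoir branch and a linear orthogonal branch that preserves the state norm."""
    x, X = np.zeros(n), np.zeros((len(u), n))
    for t, ut in enumerate(u):
        x = (1 - beta) * (O @ x) + beta * np.tanh(W @ x + Win * ut)
        X[t] = x
    return X

X = es2n_run(rng.uniform(-1, 1, 500))

# Jacobian at the origin (tanh'(0) = 1): its eigenvalues lie in an annulus
# around the circle of radius 1 - beta.
J0 = (1 - beta) * O + beta * W
mags = np.abs(np.linalg.eigvals(J0))
print(round(mags.min(), 3), round(mags.max(), 3))
```

Shrinking β moves the eigenvalue annulus closer to the unit circle, which is the "controllable radius" mechanism behind the edge-of-stability behaviour.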
Hierarchical Temporal Representation in Linear Reservoir Computing
Recently, studies on deep Reservoir Computing (RC) highlighted the role of
layering in deep recurrent neural networks (RNNs). In this paper, the use of
linear recurrent units allows us to provide further evidence for the intrinsic
hierarchical temporal representation in deep RNNs, through frequency analysis
applied to the state signals. The potential of our approach is assessed on
the class of Multiple Superimposed Oscillator tasks. Furthermore, our
investigation provides useful insights to open a discussion on the main aspects
that characterize the deep learning framework in the temporal domain.
Comment: This is a pre-print of the paper submitted to the 27th Italian
Workshop on Neural Networks, WIRN 201
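The frequency analysis lends itself to a compact sketch. Below, a stack of untrained *linear* reservoirs is driven by a Multiple-Superimposed-Oscillator-style input (a sum of incommensurate sines), and each layer's state spectrum is summarized by its spectral centroid; the specific sizes, frequencies, and the centroid summary are illustrative assumptions rather than the paper's procedure.

```python
import numpy as np

rng = np.random.default_rng(5)

def linear_deep_states(u, layers=3, n=50, rho=0.9):
    """Stack of untrained *linear* reservoirs: layer 0 is driven by the input
    signal, each higher layer by the states of the layer below."""
    drive, out = u.reshape(-1, 1), []
    for _ in range(layers):
        W = rng.uniform(-1, 1, (n, n))
        W *= rho / max(abs(np.linalg.eigvals(W)))
        Win = rng.uniform(-1, 1, (n, drive.shape[1]))
        X, x = np.zeros((len(u), n)), np.zeros(n)
        for t in range(len(u)):
            x = W @ x + Win @ drive[t]          # no nonlinearity
            X[t] = x
        out.append(X)
        drive = X
    return out

def spectral_centroid(X):
    """Mean normalized frequency of the states' power spectra (range 0 to 0.5)."""
    P = np.abs(np.fft.rfft(X - X.mean(axis=0), axis=0)) ** 2
    f = np.fft.rfftfreq(X.shape[0])
    return float((f[:, None] * P).sum() / P.sum())

# A Multiple Superimposed Oscillator-style input: a sum of incommensurate sines.
steps = np.arange(2000)
u = np.sin(0.2 * steps) + np.sin(0.311 * steps) + np.sin(0.42 * steps)
states = linear_deep_states(u)
for layer, X in enumerate(states):
    print(layer, spectral_centroid(X[200:]))    # per-layer spectral summary
```

Because each stable linear layer acts as a low-pass filter on what it receives, comparing the per-layer centroids makes the hierarchical slow-down of the temporal representation directly visible.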
Deep Echo State Networks for Diagnosis of Parkinson's Disease
In this paper, we introduce a novel approach for diagnosis of Parkinson's
Disease (PD) based on deep Echo State Networks (ESNs). The identification of PD
is performed by analyzing the whole time-series collected from a tablet device
during the sketching of spiral tests, without the need for feature extraction
and data preprocessing. We evaluated the proposed approach on a public dataset
of spiral tests. The results of the experimental analysis show that DeepESNs
perform significantly better than a shallow ESN model. Overall, the proposed
approach obtains state-of-the-art results in the identification of PD on this
kind of temporal data.
Comment: This is a pre-print of the paper submitted to the European Symposium
on Artificial Neural Networks, Computational Intelligence and Machine
Learning, ESANN 201
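The end-to-end pipeline (raw time-series in, class label out, no feature engineering) can be sketched on synthetic data. Everything below is a toy stand-in for the spiral recordings, not the paper's setup: two classes of noisy oscillations are encoded by a fixed two-layer reservoir, each sequence is summarized by its concatenated per-layer mean states, and a ridge readout classifies.

```python
import numpy as np

rng = np.random.default_rng(6)

# Fixed, untrained deep reservoir weights (2 layers), shared across sequences:
n_units, n_layers = 40, 2
WEIGHTS, dim = [], 1
for _ in range(n_layers):
    W = rng.uniform(-1, 1, (n_units, n_units))
    W *= 0.9 / max(abs(np.linalg.eigvals(W)))          # echo state rescaling
    WEIGHTS.append((W, rng.uniform(-1, 1, (n_units, dim))))
    dim = n_units

def deep_esn_features(u):
    """Run a whole time-series through the layered reservoir and return the
    per-layer mean states, concatenated into one fixed-size feature vector."""
    drive, feats = u.reshape(-1, 1), []
    for W, Win in WEIGHTS:
        X, x = np.zeros((len(u), n_units)), np.zeros(n_units)
        for t in range(len(u)):
            x = np.tanh(W @ x + Win @ drive[t])
            X[t] = x
        feats.append(X.mean(axis=0))
        drive = X
    return np.concatenate(feats)

# Toy stand-in for the tablet recordings: two classes with different dynamics.
def make_seq(cls, T=200):
    f = 0.05 if cls == 0 else 0.25
    return np.sin(2 * np.pi * f * np.arange(T)) + 0.1 * rng.standard_normal(T)

labels = [0, 1] * 20
X = np.array([deep_esn_features(make_seq(c)) for c in labels])
y = np.array([1.0 if c == 0 else -1.0 for c in labels])
w = np.linalg.solve(X.T @ X + 1e-4 * np.eye(X.shape[1]), X.T @ y)  # ridge readout
acc = float(np.mean(np.sign(X @ w) == y))
print("training accuracy:", acc)
```

Only the readout vector `w` is trained; the reservoir processes each raw sequence directly, mirroring the "no feature extraction, no preprocessing" claim of the abstract.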