Incremental construction of LSTM recurrent neural network
Long Short-Term Memory (LSTM) is a recurrent neural network that
uses structures called memory blocks to allow the net to remember
significant events distant in the past input sequence in order to
solve long time lag tasks, where other RNN approaches fail.
Throughout this work we have performed experiments using LSTM
networks extended with growing abilities, which we call GLSTM.
Four methods of training growing LSTM have been compared. These
methods include cascade and fully connected hidden layers as well
as two different levels of freezing previous weights in the
cascade case. GLSTM has been applied to a forecasting problem in a
biomedical domain, where the input/output behavior of five
controllers of the Central Nervous System control has to be
modelled. We have compared growing LSTM results against other
neural network approaches, and against our work applying
conventional LSTM to the task at hand.
Postprint (published version)
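The abstract above describes memory blocks only at a high level. As a rough illustration of how such a block can hold on to a distant event, here is a minimal sketch of one scalar LSTM cell in plain Python. It assumes the commonly used gate formulation with a forget gate, which may differ from the variant used in the paper, and the weights in `DEMO_WEIGHTS` are hand-picked for the demonstration, not trained and not taken from the paper. The names `lstm_step`, `run_sequence`, and `DEMO_WEIGHTS` are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function used for the three gates."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, weights):
    """One time step of a scalar LSTM cell.

    weights maps a gate name to (w_x, w_h, bias). This is the common
    forget-gate formulation, assumed here for illustration.
    """
    def pre(name):
        w_x, w_h, bias = weights[name]
        return w_x * x + w_h * h_prev + bias

    i = sigmoid(pre("input"))       # how much new content to write
    f = sigmoid(pre("forget"))      # how much old content to keep
    o = sigmoid(pre("output"))      # how much of the cell to expose
    g = math.tanh(pre("candidate")) # candidate content
    c = f * c_prev + i * g          # cell state update
    h = o * math.tanh(c)            # block output
    return h, c


# Hand-picked (not trained) weights: the input gate opens only when x is
# large, and the forget gate stays close to 1 so the cell keeps its value.
DEMO_WEIGHTS = {
    "input": (20.0, 0.0, -10.0),
    "forget": (0.0, 0.0, 10.0),
    "output": (0.0, 0.0, 10.0),
    "candidate": (5.0, 0.0, 0.0),
}


def run_sequence(xs, weights=DEMO_WEIGHTS):
    """Feed a sequence through the cell, starting from a zero state."""
    h, c = 0.0, 0.0
    for x in xs:
        h, c = lstm_step(x, h, c, weights)
    return h, c


if __name__ == "__main__":
    # A single event at t = 0 followed by 50 steps of silence.
    print(run_sequence([1.0] + [0.0] * 50))
```

With these weights the cell state written at the first step decays only slightly over the 50 silent steps, which is the behaviour that lets LSTM bridge long time lags.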
A geometrical analysis of global stability in trained feedback networks
Recurrent neural networks have been extensively studied in the context of
neuroscience and machine learning due to their ability to implement complex
computations. While substantial progress in designing effective learning
algorithms has been achieved in the last years, a full understanding of trained
recurrent networks is still lacking. Specifically, the mechanisms that allow
computations to emerge from the underlying recurrent dynamics are largely
unknown. Here we focus on a simple, yet underexplored computational setup: a
feedback architecture trained to associate a stationary output to a stationary
input. As a starting point, we derive an approximate analytical description of
global dynamics in trained networks which assumes uncorrelated connectivity
weights in the feedback and in the random bulk. The resulting mean-field theory
suggests that the task admits several classes of solutions, which imply
different stability properties. Different classes are characterized in terms of
the geometrical arrangement of the readout with respect to the input vectors,
defined in the high-dimensional space spanned by the network population. We
find that such an approximate theoretical approach can be used to understand how
standard training techniques implement the input-output task in finite-size
feedback networks. In particular, our simplified description captures the local
and the global stability properties of the target solution, and thus predicts
training performance.
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization
A central challenge to many fields of science and engineering involves
minimizing non-convex error functions over continuous, high dimensional spaces.
Gradient descent or quasi-Newton methods are almost ubiquitously used to
perform such minimizations, and it is often thought that a main source of
difficulty for these local methods to find the global minimum is the
proliferation of local minima with much higher error than the global minimum.
Here we argue, based on results from statistical physics, random matrix theory,
neural network theory, and empirical evidence, that a deeper and more profound
difficulty originates from the proliferation of saddle points, not local
minima, especially in high dimensional problems of practical interest. Such
saddle points are surrounded by high error plateaus that can dramatically slow
down learning, and give the illusory impression of the existence of a local
minimum. Motivated by these arguments, we propose a new approach to
second-order optimization, the saddle-free Newton method, that can rapidly
escape high dimensional saddle points, unlike gradient descent and quasi-Newton
methods. We apply this algorithm to deep or recurrent neural network training,
and provide numerical evidence for its superior optimization performance.
Comment: The theoretical review and analysis in this article draw heavily from
arXiv:1405.4604 [cs.LG]
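To make the contrast between a Newton step and a saddle-free Newton step concrete, the sketch below applies both to the toy function f(x, y) = x^2 - y^2, which has a saddle at the origin. It assumes the usual formulation in which the gradient is rescaled by the inverse of |H|, the Hessian with its eigenvalues replaced by their absolute values. The toy function, the `damping` constant, and all function names are illustrative choices, and the closed-form eigendecomposition only handles 2x2 symmetric matrices.

```python
import math


def f(x, y):
    """Toy objective with a saddle point at the origin."""
    return x * x - y * y


def grad(x, y):
    return (2.0 * x, -2.0 * y)


def hess(x, y):
    """Symmetric Hessian [[a, b], [b, c]] returned as (a, b, c)."""
    return (2.0, 0.0, -2.0)


def eig_sym_2x2(a, b, c):
    """Eigenpairs of the symmetric matrix [[a, b], [b, c]]."""
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    cs, sn = math.cos(theta), math.sin(theta)
    lam1 = a * cs * cs + 2.0 * b * sn * cs + c * sn * sn
    lam2 = a * sn * sn - 2.0 * b * sn * cs + c * cs * cs
    return [(lam1, (cs, sn)), (lam2, (-sn, cs))]


def take_step(point, use_abs, damping=1e-8):
    """Return point - M^{-1} g, with M = |H| if use_abs else M = H.

    The plain Newton branch assumes a non-singular Hessian.
    """
    g = grad(*point)
    step = [0.0, 0.0]
    for lam, v in eig_sym_2x2(*hess(*point)):
        denom = abs(lam) + damping if use_abs else lam
        coef = (v[0] * g[0] + v[1] * g[1]) / denom
        step[0] += coef * v[0]
        step[1] += coef * v[1]
    return (point[0] - step[0], point[1] - step[1])


if __name__ == "__main__":
    start = (1.0, 0.5)
    print("Newton:      ", take_step(start, use_abs=False))
    print("saddle-free: ", take_step(start, use_abs=True))
```

On this example the plain Newton step jumps straight to the saddle, because it treats the negative-curvature direction as something to be solved for. The rescaled step instead moves downhill along that direction, away from the saddle.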