Response Characterization for Auditing Cell Dynamics in Long Short-term Memory Networks
In this paper, we introduce a novel method to interpret recurrent neural
networks (RNNs), particularly long short-term memory networks (LSTMs), at the
cellular level. We propose a systematic pipeline for interpreting individual
hidden state dynamics within the network using response characterization
methods. The ranked contribution of individual cells to the network's output is
computed by analyzing a set of interpretable metrics of their decoupled step
and sinusoidal responses. As a result, our method is able to uniquely identify
neurons with insightful dynamics, quantify relationships between dynamical
properties and test accuracy through ablation analysis, and interpret the
impact of network capacity on a network's dynamical distribution. Finally, we
demonstrate the generalizability and scalability of our method by evaluating it
on a series of different benchmark sequential datasets.
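As a rough illustration of the response-characterization idea, here is a minimal sketch that probes each hidden unit of a toy single-input LSTM with decoupled step and sinusoidal inputs. The random weights stand in for a trained network, and the metrics (steady-state step value, rise time, sinusoidal gain) and the combined ranking score are illustrative assumptions, not the paper's exact pipeline.

```python
# Minimal sketch (not the authors' pipeline): probe LSTM hidden units with
# decoupled step and sinusoidal inputs and rank them by simple response metrics.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, T = 1, 8, 200

# Random LSTM weights stand in for a trained network (assumption).
W = rng.normal(0, 0.3, (4 * n_hid, n_in + n_hid))
b = np.zeros(4 * n_hid)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_run(u):
    """Run the LSTM over input sequence u; return hidden-state trace (T, n_hid)."""
    h, c, trace = np.zeros(n_hid), np.zeros(n_hid), []
    for t in range(len(u)):
        z = W @ np.concatenate([u[t], h]) + b
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        trace.append(h.copy())
    return np.array(trace)

# Decoupled probes: a unit step and a low-frequency sinusoid on the input.
step = np.ones((T, n_in))
sine = np.sin(2 * np.pi * 0.05 * np.arange(T))[:, None]
step_trace, sine_trace = lstm_run(step), lstm_run(sine)

# Illustrative per-unit response metrics (assumed, not the paper's exact set):
settle = step_trace[-1]  # steady-state step response per unit
# first step reaching 90% of steady-state magnitude (0 if never reached)
rise = np.argmax(np.abs(step_trace) >= 0.9 * np.abs(settle) + 1e-12, axis=0)
# sinusoidal gain, measured after the transient has died out
amp = (sine_trace[T // 2:].max(0) - sine_trace[T // 2:].min(0)) / 2

# Rank units by a combined score; ablation would remove top units and re-test.
rank = np.argsort(-np.abs(settle) * amp)
print("unit ranking:", rank)
```

An ablation analysis along these lines would knock out the top-ranked units one at a time and re-measure test accuracy to quantify each unit's contribution.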
Learning Long-Term Dependencies is Not as Difficult With NARX Recurrent Neural Networks
It has recently been shown that gradient descent learning algorithms for
recurrent neural networks can perform poorly on tasks that involve long-term
dependencies, i.e., those problems for which the desired output depends on
inputs presented at times far in the past.
In this paper we explore the long-term dependencies problem for a class of
architectures called NARX recurrent neural networks, which have powerful
representational capabilities. We have previously reported that gradient
descent learning is more effective in NARX networks than in recurrent neural
network architectures that have "hidden states" on problems including
grammatical inference and nonlinear system identification. Typically, the
network converges much faster and generalizes better than other networks.
The results in this paper are an attempt to explain this phenomenon.
We present some experimental results which show that NARX networks
can often retain information for two to three times as long as conventional
recurrent neural networks. We show that although NARX networks do not
circumvent the problem of long-term dependencies, they can greatly
improve performance on long-term dependency problems.
We also describe in detail some of the assumptions regarding what it means
to latch information robustly and suggest possible ways to loosen these
assumptions.
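For context, the defining property of a NARX network is that its output is fed back through explicit delay taps, y(t) = f(u(t), ..., u(t-n_u+1), y(t-1), ..., y(t-n_y)), so a gradient can reach n_y steps into the past through a single connection. The sketch below, with assumed delay orders and a single tanh layer, illustrates that recurrence in its simplest form; it is not the networks trained in the report.

```python
# Minimal sketch of the NARX recurrence (an assumed general form, not the
# report's trained models): y(t) depends on the last n_u inputs and, crucially,
# the last n_y *outputs*, giving shortcut paths for gradient flow.
import numpy as np

rng = np.random.default_rng(1)
n_u, n_y, T = 3, 3, 50          # input/output delay orders (illustrative)

w_u = rng.normal(0, 0.5, n_u)   # weights on u(t), ..., u(t-n_u+1)
w_y = rng.normal(0, 0.5, n_y)   # weights on y(t-1), ..., y(t-n_y)

def narx_step(u_hist, y_hist):
    """One NARX step: y(t) = tanh(w_u . u_hist + w_y . y_hist)."""
    return np.tanh(w_u @ u_hist + w_y @ y_hist)

u = rng.normal(size=T)
y = np.zeros(T)
for t in range(T):
    # Gather delayed inputs and outputs, padding with zeros before t = 0.
    u_hist = np.array([u[t - k] if t - k >= 0 else 0.0 for k in range(n_u)])
    y_hist = np.array([y[t - k] if t - k >= 0 else 0.0
                       for k in range(1, n_y + 1)])
    y[t] = narx_step(u_hist, y_hist)

print(y[:5])
```

Because y(t - n_y) feeds y(t) directly, error signals skip over the intervening steps, which is consistent with the paper's finding that NARX networks retain information longer without eliminating the long-term dependencies problem.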
(Also cross-referenced as UMIACS-TR-95-78.)