Understanding Hidden Memories of Recurrent Neural Networks
Recurrent neural networks (RNNs) have been successfully applied to various
natural language processing (NLP) tasks and achieved better results than
conventional methods. However, the lack of understanding of the mechanisms
behind their effectiveness limits further improvements on their architectures.
In this paper, we present a visual analytics method for understanding and
comparing RNN models for NLP tasks. We propose a technique to explain the
function of individual hidden state units based on their expected response to
input texts. We then co-cluster hidden state units and words based on the
expected response and visualize co-clustering results as memory chips and word
clouds to provide more structured knowledge on RNNs' hidden states. We also
propose a glyph-based sequence visualization based on aggregate information to
analyze the behavior of an RNN's hidden state at the sentence-level. The
usability and effectiveness of our method are demonstrated through case studies
and reviews from domain experts.

Comment: Published at IEEE Conference on Visual Analytics Science and
Technology (IEEE VAST 2017)
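The core quantities in this abstract, the per-unit "expected response" to words and the subsequent co-clustering of units and words, can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the recorded hidden-state updates, the `kmeans` helper, and all array shapes are hypothetical stand-ins, and plain k-means on rows and columns stands in for whatever co-clustering method the paper actually uses.

```python
import numpy as np

# Hypothetical data: for each word w, we record the hidden-state update
# produced each time the RNN reads w. The "expected response" of unit i
# to word w is the mean update of unit i over all occurrences of w.
rng = np.random.default_rng(0)
n_units, vocab_size, n_occurrences = 4, 3, 10
updates = {w: rng.normal(size=(n_occurrences, n_units)) for w in range(vocab_size)}

# Expected-response matrix R: rows = hidden units, columns = words.
R = np.stack([updates[w].mean(axis=0) for w in range(vocab_size)], axis=1)

def kmeans(X, k, iters=20, seed=0):
    # Tiny k-means used here as a stand-in for the paper's co-clustering step.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Cluster hidden units by their response profiles over words, and words by
# their response profiles over units; the pairing of the two clusterings is
# what the memory-chip and word-cloud views would visualize.
unit_labels = kmeans(R, 2)
word_labels = kmeans(R.T, 2)
```

Each unit cluster then corresponds to a block of the "memory chip" view, and each word cluster to one word cloud.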
Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding
Learning long-term dependencies in extended temporal sequences requires
credit assignment to events far back in the past. The most common method for
training recurrent neural networks, back-propagation through time (BPTT),
requires credit information to be propagated backwards through every single
step of the forward computation, potentially over thousands or millions of time
steps. This becomes computationally expensive or even infeasible when used with
long sequences. Importantly, biological brains are unlikely to perform such
detailed reverse replay over very long sequences of internal states (consider
days, months, or years). However, humans are often reminded of past memories or
mental states that are associated with the current mental state. We consider
the hypothesis that such memory associations between past and present could be
used for credit assignment through arbitrarily long sequences, propagating the
credit assigned to the current state to the associated past state. Based on
this principle, we study a novel algorithm which only back-propagates through a
few of these temporal skip connections, realized by a learned attention
mechanism that associates current states with relevant past states. We
demonstrate in experiments that our method matches or outperforms regular BPTT
and truncated BPTT in tasks involving particularly long-term dependencies, but
without requiring the biologically implausible backward replay through the
whole history of states. Additionally, we demonstrate that the proposed method
transfers to longer sequences significantly better than LSTMs trained with BPTT
and LSTMs trained with full self-attention.

Comment: To appear as a Spotlight presentation at NIPS 201
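The mechanism described above, a learned attention that links the current state to a sparse set of past states so that credit flows only along those skip connections, can be sketched with a forward pass. This is a rough illustration under assumed names and shapes, not the authors' algorithm: the `step` transition, the dot-product attention scores, and the top-k selection are all simplified stand-ins, and only the resulting sparse edge set is shown (in training, backpropagation would traverse these edges instead of every timestep).

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, k = 8, 30, 3  # state size, sequence length, skip connections per step
Wh = rng.normal(scale=0.1, size=(d, d))
Wx = rng.normal(scale=0.1, size=(d, d))

def step(h, x):
    # Hypothetical vanilla RNN transition standing in for the actual cell.
    return np.tanh(Wh @ h + Wx @ x)

h = np.zeros(d)
memory = []   # stored past hidden states
edges = []    # (current_step, past_step) skip connections actually selected
for t in range(T):
    h = step(h, rng.normal(size=d))
    if memory:
        past = np.stack(memory)              # (t, d) past states
        scores = past @ h                    # dot-product attention logits
        topk = np.argsort(scores)[-k:]       # sparse: keep only top-k states
        w = np.exp(scores[topk] - scores[topk].max())
        w /= w.sum()                         # softmax over the selected few
        h = h + w @ past[topk]               # skip connection into current h
        edges += [(t, int(i)) for i in topk]
    memory.append(h)

# Gradients would flow only along `edges`, at most k per step, rather than
# through all T timesteps as in full BPTT.
print(len(edges))
```

The point of the sketch is the sparsity: the number of edges grows as roughly k per step, independent of how far back the selected states lie, which is what makes credit assignment over very long gaps affordable.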