Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding
Learning long-term dependencies in extended temporal sequences requires
credit assignment to events far back in the past. The most common method for
training recurrent neural networks, back-propagation through time (BPTT),
requires credit information to be propagated backwards through every single
step of the forward computation, potentially over thousands or millions of time
steps. This becomes computationally expensive or even infeasible when used with
long sequences. Importantly, biological brains are unlikely to perform such
detailed reverse replay over very long sequences of internal states (consider
days, months, or years). However, humans are often reminded of past memories or
mental states which are associated with the current mental state. We consider
the hypothesis that such memory associations between past and present could be
used for credit assignment through arbitrarily long sequences, propagating the
credit assigned to the current state to the associated past state. Based on
this principle, we study a novel algorithm which only back-propagates through a
few of these temporal skip connections, realized by a learned attention
mechanism that associates current states with relevant past states. We
demonstrate in experiments that our method matches or outperforms regular BPTT
and truncated BPTT in tasks involving particularly long-term dependencies, but
without requiring the biologically implausible backward replay through the
whole history of states. Additionally, we demonstrate that the proposed method
transfers to longer sequences significantly better than LSTMs trained with BPTT
and LSTMs trained with full self-attention.
Comment: To appear as a Spotlight presentation at NIPS 2018
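The abstract describes back-propagating only through a few attention-selected temporal skip connections rather than through every intermediate step. The sketch below is a minimal illustration of that idea, not the authors' implementation: the class name SparseBacktrackCell, the GRU cell, and the top-k hyperparameter are assumptions made for the example, and unlike the paper this simplified version still lets gradients flow through the ordinary one-step recurrence in addition to the sparse skip connections.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseBacktrackCell(nn.Module):
    """Illustrative RNN cell: attention over past hidden states creates a few
    skip connections per step through which credit can flow directly."""
    def __init__(self, input_size, hidden_size, k=3):
        super().__init__()
        self.rnn = nn.GRUCell(input_size, hidden_size)
        self.query = nn.Linear(hidden_size, hidden_size)
        self.k = k

    def forward(self, x, h, memory):
        h = self.rnn(x, h)                                    # ordinary one-step recurrence
        if memory:
            mem = torch.stack(memory, dim=1)                  # (batch, T, hidden), graph attached
            # Score past states against the current state; the scores need no
            # gradient path into the memory, so use a detached copy here.
            scores = torch.einsum('bh,bth->bt', self.query(h), mem.detach())
            k = min(self.k, mem.size(1))
            top_scores, idx = scores.topk(k, dim=1)           # the k most relevant past steps
            idx = idx.unsqueeze(-1).expand(-1, -1, mem.size(-1))
            selected = mem.gather(1, idx)                     # (batch, k, hidden), still in the graph
            weights = F.softmax(top_scores, dim=1).unsqueeze(-1)
            # Summing the attended states into h adds k skip connections, so
            # backprop sends credit to those past states without unrolling
            # every intermediate step between them and the present.
            h = h + (weights * selected).sum(dim=1)
        memory.append(h)
        return h, memory

# Tiny usage example with made-up sizes.
cell = SparseBacktrackCell(input_size=8, hidden_size=16, k=3)
xs = torch.randn(50, 4, 8)            # (time, batch, features)
h, memory = torch.zeros(4, 16), []
for x_t in xs:
    h, memory = cell(x_t, h, memory)
h.pow(2).mean().backward()            # credit flows through the sparse skip connections
```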
Neural Distributed Autoassociative Memories: A Survey
Introduction. Neural network models of autoassociative, distributed memory
allow storage and retrieval of many items (vectors) where the number of stored
items can exceed the vector dimension (the number of neurons in the network).
This opens the possibility of a sublinear time search (in the number of stored
items) for approximate nearest neighbors among vectors of high dimension. The
purpose of this paper is to review models of autoassociative, distributed
memory that can be naturally implemented by neural networks (mainly with local
learning rules and iterative dynamics based on information locally available to
neurons). Scope. The survey is focused mainly on the networks of Hopfield,
Willshaw and Potts, that have connections between pairs of neurons and operate
on sparse binary vectors. We discuss not only autoassociative memory, but also
the generalization properties of these networks. We also consider neural
networks with higher-order connections and networks with a bipartite graph
structure for non-binary data with linear constraints. Conclusions. We discuss
the relations to similarity search, the advantages and drawbacks of these
techniques, and topics for further research. An interesting
and still not completely resolved question is whether neural autoassociative
memories can search for approximate nearest neighbors faster than other index
structures for similarity search, in particular for the case of very high
dimensional vectors.
Comment: 31 pages
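The scope paragraph refers to networks with pairwise connections, local learning rules, and iterative dynamics. As a rough illustration of that family (not code from the survey), the sketch below implements a classical dense Hopfield memory with a Hebbian outer-product rule and sign-threshold updates on +/-1 vectors; the class name, sizes, and noise level are illustrative assumptions.

```python
import numpy as np

class HopfieldMemory:
    """Illustrative Hopfield-style autoassociative memory on +/-1 vectors."""
    def __init__(self, dim):
        self.dim = dim
        self.W = np.zeros((dim, dim))

    def store(self, patterns):
        # Hebbian learning: each weight depends only on the activities of the
        # two neurons it connects (a local rule), summed over stored patterns.
        for p in patterns:                         # each p is a +/-1 vector of length dim
            self.W += np.outer(p, p)
        np.fill_diagonal(self.W, 0.0)

    def retrieve(self, cue, steps=20):
        # Iterative dynamics: update neurons by the sign of their local field
        # until the state stops changing (a fixed point / attractor).
        s = np.sign(cue).astype(float)
        for _ in range(steps):
            new_s = np.sign(self.W @ s)
            new_s[new_s == 0] = 1.0
            if np.array_equal(new_s, s):
                break
            s = new_s
        return s

# Store a few random patterns and retrieve one from a corrupted cue.
rng = np.random.default_rng(0)
patterns = rng.choice([-1.0, 1.0], size=(5, 100))   # 5 patterns, 100 neurons
mem = HopfieldMemory(dim=100)
mem.store(patterns)

noisy = patterns[0].copy()
flip = rng.choice(100, size=15, replace=False)       # corrupt 15 of 100 bits
noisy[flip] *= -1
recovered = mem.retrieve(noisy)
print(np.mean(recovered == patterns[0]))              # typically close to 1.0
```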