Finding online neural update rules by learning to remember
We investigate learning of the online local update rules for neural
activations (bodies) and weights (synapses) from scratch. We represent the
states of each weight and activation by small vectors, and parameterize their
updates using (meta-) neural networks. Different neuron types are represented
by different embedding vectors which allows the same two functions to be used
for all neurons. Instead of training directly for the objective using evolution
or long term back-propagation, as is commonly done in similar systems, we
motivate and study a different objective: That of remembering past snippets of
experience. We explain how this objective relates to standard back-propagation
training and other forms of learning. We train for this objective using short
term back-propagation and analyze the performance as a function of both the
different network types and the difficulty of the problem. We find that this
analysis gives interesting insights into what constitutes a learning rule. We
also discuss how such a system could form a natural substrate for addressing
topics such as episodic memories, meta-learning and auxiliary objectives. Comment: 11 Pages, 1 figure
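As a rough illustration of the setup described above, the sketch below keeps a small state vector per neuron and per synapse and updates each with a single shared meta-network; the state sizes, the tiny two-layer meta-network, and the scalar input drive are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch (not the authors' code): per-neuron and per-synapse state
# vectors updated by one shared meta-network each. All sizes are assumptions.
import numpy as np

STATE, EMBED = 4, 3          # assumed sizes of state and neuron-type embedding
rng = np.random.default_rng(0)

def make_params(d_in, hidden=8):
    return (rng.normal(0, 0.1, (d_in, hidden)), np.zeros(hidden),
            rng.normal(0, 0.1, (hidden, STATE)), np.zeros(STATE))

def meta_update(params, x):
    """Shared two-layer meta-network; the state being updated comes first in x."""
    W1, b1, W2, b2 = params
    h = np.tanh(x @ W1 + b1)
    return x[:STATE] + h @ W2 + b2          # residual update of that state

act_params = make_params(STATE + EMBED + 1)   # activation state, type embedding, drive
syn_params = make_params(3 * STATE)           # weight state, pre state, post state

# One activation update: the neuron-type embedding lets one function serve all neurons.
neuron_state, type_embed = rng.normal(size=STATE), rng.normal(size=EMBED)
drive = np.array([0.7])                       # stand-in for the incoming signal
neuron_state = meta_update(act_params, np.concatenate([neuron_state, type_embed, drive]))

# One synapse update, driven by the states of its pre- and post-synaptic neurons.
weight_state, pre_state, post_state = (rng.normal(size=STATE) for _ in range(3))
weight_state = meta_update(syn_params, np.concatenate([weight_state, pre_state, post_state]))
```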
A Hebbian/Anti-Hebbian Neural Network for Linear Subspace Learning: A Derivation from Multidimensional Scaling of Streaming Data
Neural network models of early sensory processing typically reduce the
dimensionality of streaming input data. Such networks learn the principal
subspace, in the sense of principal component analysis (PCA), by adjusting
synaptic weights according to activity-dependent learning rules. When derived
from a principled cost function these rules are nonlocal and hence biologically
implausible. At the same time, biologically plausible local rules have been
postulated rather than derived from a principled cost function. Here, to bridge
this gap, we derive a biologically plausible network for subspace learning on
streaming data by minimizing a principled cost function. In a departure from
previous work, where cost was quantified by the representation, or
reconstruction, error, we adopt a multidimensional scaling (MDS) cost function
for streaming data. The resulting algorithm relies only on biologically
plausible Hebbian and anti-Hebbian local learning rules. In a stochastic
setting, synaptic weights converge to a stationary state which projects the
input data onto the principal subspace. If the data are generated by a
nonstationary distribution, the network can track the principal subspace. Thus,
our result represents a step towards an algorithmic theory of neural computation. Comment: Accepted for publication in Neural Computation
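The sketch below shows the general shape of such a Hebbian/anti-Hebbian subspace network on streaming data; the learning-rate placement, normalisation, and data model are illustrative choices rather than the paper's exact derivation.

```python
# Sketch of a Hebbian/anti-Hebbian subspace learner on streaming data; the
# constants and data model are illustrative, not the paper's exact derivation.
import numpy as np

rng = np.random.default_rng(1)
n_in, n_out, eta = 10, 3, 0.01
A = rng.normal(size=(n_in, 3))              # inputs concentrated near a 3-D subspace

W = rng.normal(0, 0.1, (n_out, n_in))       # feedforward weights, Hebbian rule
M = np.zeros((n_out, n_out))                # lateral weights, anti-Hebbian rule

for _ in range(5000):
    x = A @ rng.normal(size=3) + 0.05 * rng.normal(size=n_in)   # one streaming sample
    y = np.linalg.solve(np.eye(n_out) + M, W @ x)   # output once lateral dynamics settle
    W += eta * (np.outer(y, x) - W)                 # Hebbian: output-input correlation
    M += eta * (np.outer(y, y) - M)                 # anti-Hebbian: decorrelate outputs
    np.fill_diagonal(M, 0.0)                        # no self-connections
```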
Integrating Transformer and Paraphrase Rules for Sentence Simplification
Sentence simplification aims to reduce the complexity of a sentence while
retaining its original meaning. Current models for sentence simplification
have adopted ideas from machine translation studies and implicitly learned
simplification mapping rules from normal-simple sentence pairs. In this paper,
we explore a novel model based on a multi-layer and multi-head attention
architecture, and we propose two innovative approaches to integrate the Simple
PPDB (A Paraphrase Database for Simplification), an external paraphrase
knowledge base for simplification that covers a wide range of real-world
simplification rules. The experiments show that the integration provides two
major benefits: (1) the integrated model outperforms multiple state-of-the-art
baseline models for sentence simplification in the literature, and (2) through
analysis of the rule utilization, the model seeks to select more accurate
simplification rules. The code and models used in the paper are available at
https://github.com/Sanqiang/text_simplification
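One generic way to picture rule integration (not necessarily either of the paper's two proposed mechanisms) is to add a bonus to the decoder scores of target words licensed by Simple PPDB rules that fire on the source; the tiny rule table and weights below are made up for illustration.

```python
# Illustration only: bias one decoding step toward words licensed by paraphrase
# rules that fire on the source. The rule table and weights here are made up.
import numpy as np

simple_ppdb = {"purchase": ("buy", 0.9), "assist": ("help", 0.8)}   # source -> (simple, score)

def rule_bonus(source_tokens, vocab, weight=2.0):
    """Additive score bonus for target words proposed by matching rules."""
    bonus = np.zeros(len(vocab))
    for tok in source_tokens:
        if tok in simple_ppdb:
            simple, score = simple_ppdb[tok]
            if simple in vocab:
                bonus[vocab[simple]] += weight * score
    return bonus

vocab = {"buy": 0, "help": 1, "the": 2, "book": 3}
decoder_scores = np.zeros(len(vocab))                         # scores for one decoding step
scores = decoder_scores + rule_bonus(["purchase", "the", "book"], vocab)
probs = np.exp(scores) / np.exp(scores).sum()                 # "buy" is now favoured
```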
Ten simple rules for the computational modeling of behavioral data.
Computational modeling of behavior has revolutionized psychology and neuroscience. By fitting models to experimental data we can probe the algorithms underlying behavior, find neural correlates of computational variables and better understand the effects of drugs, illness and interventions. But with great power comes great responsibility. Here, we offer ten simple rules to ensure that computational modeling is used with care and yields meaningful insights. In particular, we present a beginner-friendly, pragmatic and details-oriented introduction on how to relate models to data. What, exactly, can a model tell us about the mind? To answer this, we apply our rules to the simplest modeling techniques most accessible to beginning modelers and illustrate them with examples and code available online. However, most rules apply to more advanced techniques. Our hope is that by following our guidelines, researchers will avoid many pitfalls and unleash the power of computational modeling on their own data.
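In the spirit of the simplest techniques such a tutorial covers, the sketch below simulates a two-armed bandit, fits a two-parameter learning model (learning rate and softmax temperature) by maximum likelihood, and recovers the parameters; the task and parameter values are placeholders, not the paper's examples.

```python
# Sketch: simulate a two-armed bandit, then recover the learning rate and
# softmax temperature by maximum likelihood. Task and values are placeholders.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def softmax_choice_probs(Q, beta):
    e = np.exp(beta * (Q - Q.max()))
    return e / e.sum()

def simulate(alpha, beta, n_trials=200, p_reward=(0.8, 0.2)):
    Q, choices, rewards = np.zeros(2), [], []
    for _ in range(n_trials):
        p = softmax_choice_probs(Q, beta)
        c = rng.choice(2, p=p)
        r = float(rng.random() < p_reward[c])
        Q[c] += alpha * (r - Q[c])                 # Rescorla-Wagner / delta-rule update
        choices.append(c); rewards.append(r)
    return np.array(choices), np.array(rewards)

def neg_log_lik(params, choices, rewards):
    alpha, beta = params
    Q, nll = np.zeros(2), 0.0
    for c, r in zip(choices, rewards):
        nll -= np.log(softmax_choice_probs(Q, beta)[c] + 1e-12)
        Q[c] += alpha * (r - Q[c])
    return nll

choices, rewards = simulate(alpha=0.3, beta=5.0)
fit = minimize(neg_log_lik, x0=[0.5, 1.0], args=(choices, rewards),
               bounds=[(0.01, 1.0), (0.1, 20.0)])
print("recovered alpha, beta:", fit.x)
```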
Evolving Indoor Navigational Strategies Using Gated Recurrent Units In NEAT
Simultaneous Localisation and Mapping (SLAM) algorithms are expensive to run
on smaller robotic platforms such as Micro-Aerial Vehicles. Bug algorithms are
an alternative that use relatively little processing power, and avoid high
memory consumption by not building an explicit map of the environment. Bug
Algorithms achieve relatively good performance in simulated and robotic maze
solving domains. However, because they are hand-designed, a natural question is
whether they are globally optimal control policies. In this work we explore the
performance of Neuroevolution - specifically NEAT - at evolving control
policies for simulated differential drive robots carrying out generalised maze
navigation. We extend NEAT to include Gated Recurrent Units (GRUs) to help deal
with long term dependencies. We show that both NEAT and our NEAT-GRU can
repeatably generate controllers that outperform I-Bug (an algorithm
particularly well-suited for use in real robots) on a test set of 209 indoor
maze-like environments. We show that NEAT-GRU is superior to NEAT in this task,
but also that, of the two systems, only NEAT-GRU can continuously evolve
successful controllers for a much harder task in which no bearing information
about the target is provided to the agent.
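For reference, the gating a GRU contributes is the standard update below; the sizes are arbitrary, biases are omitted for brevity, and this is not the authors' NEAT-GRU implementation.

```python
# The standard GRU update, for reference; sizes are arbitrary and biases are
# omitted. This is not the authors' NEAT-GRU implementation.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    z = sigmoid(Wz @ x + Uz @ h)                 # update gate
    r = sigmoid(Wr @ x + Ur @ h)                 # reset gate
    h_cand = np.tanh(Wh @ x + Uh @ (r * h))      # candidate state
    return (1.0 - z) * h + z * h_cand            # blend of old and candidate state

rng = np.random.default_rng(0)
n_in, n_hid = 3, 5                               # e.g. a few range sensors, small memory
Wz, Wr, Wh = (rng.normal(0, 0.5, (n_hid, n_in)) for _ in range(3))
Uz, Ur, Uh = (rng.normal(0, 0.5, (n_hid, n_hid)) for _ in range(3))

h = np.zeros(n_hid)
for _ in range(10):                              # roll the unit over a sensor stream
    h = gru_step(rng.normal(size=n_in), h, Wz, Uz, Wr, Ur, Wh, Uh)
```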
Contextual Memory Trees
We design and study a Contextual Memory Tree (CMT), a learning memory
controller that inserts new memories into an experience store of unbounded
size. It is designed to efficiently query for memories from that store,
supporting logarithmic time insertion and retrieval operations. Hence CMT can
be integrated into existing statistical learning algorithms as an augmented
memory unit without substantially increasing training and inference
computation. Furthermore, CMT operates as a reduction to classification,
allowing it to benefit from advances in representation or architecture. We
demonstrate the efficacy of CMT by augmenting existing multi-class and
multi-label classification algorithms with CMT and observe statistical
improvement. We also test CMT learning on several image-captioning tasks to
demonstrate that it performs computationally better than a simple nearest
neighbors memory system while benefitting from reward learning. Comment: ICML 2019
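A toy sketch of the logarithmic-time access pattern such a memory tree relies on is given below: each internal node holds a router, and insertion and retrieval both walk a single root-to-leaf path. The fixed linear routers here stand in for CMT's learned routers; the actual learning rule and reduction to classification are not reproduced.

```python
# Toy sketch of logarithmic routing in a memory tree: insert and query both
# walk one root-to-leaf path. Routers are fixed here; CMT learns them and
# frames that learning as classification, which this sketch does not reproduce.
import numpy as np

class Node:
    def __init__(self, dim, depth, max_depth, rng):
        self.leaf = depth == max_depth
        self.memories = []                        # (key, value) pairs stored at leaves
        if not self.leaf:
            self.router = rng.normal(size=dim)    # stand-in for a learned router
            self.left = Node(dim, depth + 1, max_depth, rng)
            self.right = Node(dim, depth + 1, max_depth, rng)

    def descend(self, key):
        node = self
        while not node.leaf:
            node = node.left if node.router @ key < 0 else node.right
        return node

def insert(root, key, value):
    root.descend(key).memories.append((key, value))        # O(depth) insertion

def query(root, key):
    leaf = root.descend(key).memories                       # O(depth) retrieval
    return max(leaf, key=lambda kv: kv[0] @ key, default=None)

rng = np.random.default_rng(0)
root = Node(dim=16, depth=0, max_depth=4, rng=rng)
for i in range(100):
    insert(root, rng.normal(size=16), f"memory-{i}")
print(query(root, rng.normal(size=16)))
```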
Unsupervised Predictive Memory in a Goal-Directed Agent
Animals execute goal-directed behaviours despite the limited range and scope
of their sensors. To cope, they explore environments and store memories
maintaining estimates of important information that is not presently available.
Recently, progress has been made with artificial intelligence (AI) agents that
learn to perform tasks from sensory input, even at a human level, by merging
reinforcement learning (RL) algorithms with deep neural networks, and the
excitement surrounding these results has led to the pursuit of related ideas as
explanations of non-human animal learning. However, we demonstrate that
contemporary RL algorithms struggle to solve simple tasks when enough
information is concealed from the sensors of the agent, a property called
"partial observability". An obvious requirement for handling partially observed
tasks is access to extensive memory, but we show memory is not enough; it is
critical that the right information be stored in the right format. We develop a
model, the Memory, RL, and Inference Network (MERLIN), in which memory
formation is guided by a process of predictive modeling. MERLIN facilitates the
solution of tasks in 3D virtual reality environments for which partial
observability is severe and memories must be maintained over long durations.
Our model demonstrates a single learning agent architecture that can solve
canonical behavioural tasks in psychology and neurobiology without strong
simplifying assumptions about the dimensionality of sensory input or the
duration of experiences.
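At a very coarse level, the read/write pattern such an agent relies on can be pictured as below: latent states are written into an external memory and later retrieved by content-based attention. Nothing in this sketch is MERLIN's actual architecture; the sizes, write rule, and read rule are placeholders.

```python
# Coarse sketch of the write/read pattern only: latent states are appended to
# an external memory and read back by content-based attention. None of this is
# MERLIN's actual architecture; sizes and rules are placeholders.
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

rng = np.random.default_rng(0)
T, d = 50, 8
memory = np.zeros((T, d))                        # one row written per time step

for t in range(T):
    z_t = rng.normal(size=d)                     # stand-in for a predictive-model latent
    memory[t] = z_t                              # write: store the latent as-is

query = rng.normal(size=d)                       # stand-in for a read key from the policy
weights = softmax(memory @ query / np.sqrt(d))   # content-based addressing
read_vector = weights @ memory                   # what the agent would condition on
```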
Balancing New Against Old Information: The Role of Surprise in Learning
Surprise describes a range of phenomena from unexpected events to behavioral
responses. We propose a measure of surprise and use it for surprise-driven
learning. Our surprise measure takes into account data likelihood as well as
the degree of commitment to a belief via the entropy of the belief
distribution. We find that surprise-minimizing learning dynamically adjusts the
balance between new and old information without the need for knowledge of the
temporal statistics of the environment. We apply our framework to a dynamic
decision-making task and a maze exploration task. Our surprise-minimizing
framework is suitable for learning in complex environments, even if the
environment undergoes gradual or sudden changes, and could eventually provide a
framework to study the behavior of humans and animals encountering surprising
events.
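To make the two ingredients concrete, the sketch below computes the likelihood of a datum under a belief distribution and the entropy (commitment) of that belief, and combines them in one plausible but purely illustrative way; it is not the paper's definition of surprise.

```python
# Illustration of the two ingredients only: how unlikely the datum is under the
# belief, and how committed (low-entropy) the belief is. The combination below
# is a made-up example, not the paper's definition of surprise.
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

belief = np.array([0.7, 0.2, 0.1])        # current belief over three hypotheses
likelihood = np.array([0.05, 0.6, 0.35])  # P(observation | hypothesis)

nll = -np.log(belief @ likelihood)                    # unexpectedness of the datum
commitment = np.log(belief.size) - entropy(belief)    # 0 = agnostic, max = fully committed
surprise = nll * (1.0 + commitment)                   # committed beliefs amplify surprise
print(surprise)
```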
A Graph-to-Sequence Model for AMR-to-Text Generation
The problem of AMR-to-text generation is to recover a text representing the
same meaning as an input AMR graph. The current state-of-the-art method uses a
sequence-to-sequence model, leveraging LSTM for encoding a linearized AMR
structure. Although able to model non-local semantic information, a
sequence LSTM can lose information from the AMR graph structure, and thus faces
challenges with large graphs, which result in long sequences. We introduce a
neural graph-to-sequence model, using a novel LSTM structure for directly
encoding graph-level semantics. On a standard benchmark, our model shows
superior results to existing methods in the literature. Comment: ACL 2018 camera-ready, Proceedings of ACL 2018 with updated performance
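The general idea of encoding the graph directly can be pictured as node states that are repeatedly updated from their neighbours, as in the plain message-passing sketch below; this is an illustration only, not the paper's specific graph-state LSTM cell, and the toy AMR fragment is made up.

```python
# Plain message passing over a toy AMR fragment, as an illustration of encoding
# the graph directly; this is not the paper's graph-state LSTM cell.
import numpy as np

rng = np.random.default_rng(0)
nodes = ["want-01", "boy", "go-02"]              # toy AMR concepts (made up)
edges = [(0, 1), (0, 2), (2, 1)]                 # toy ARG-style relations (made up)

d = 8
h = [rng.normal(0, 0.1, d) for _ in nodes]       # one hidden state per graph node
W_self = rng.normal(0, 0.1, (d, d))
W_msg = rng.normal(0, 0.1, (d, d))

for _ in range(3):                               # a few rounds of neighbourhood updates
    new_h = []
    for i in range(len(nodes)):
        neighbours = [h[j] for (a, j) in edges if a == i] + \
                     [h[a] for (a, j) in edges if j == i]
        msg = np.sum(neighbours, axis=0) if neighbours else np.zeros(d)
        new_h.append(np.tanh(W_self @ h[i] + W_msg @ msg))
    h = new_h
# Each h[i] now summarises node i in its graph context; a decoder would attend over these.
```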
Neural Machine Translation and Sequence-to-sequence Models: A Tutorial
This tutorial introduces a new and powerful set of techniques variously
called "neural machine translation" or "neural sequence-to-sequence models".
These techniques have been used in a number of tasks regarding the handling of
human language, and can be a powerful tool in the toolbox of anyone who wants
to model sequential data of some sort. The tutorial assumes that the reader
knows the basics of math and programming, but does not assume any particular
experience with neural networks or natural language processing. It attempts to
explain the intuition behind the various methods covered, then delves into them
with enough mathematical detail to understand them concretely, and culminates
with a suggestion for an implementation exercise, where readers can test that
they understood the content in practice. Comment: 65 Pages
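As a pointer to what such an implementation exercise might look like, the sketch below wires up a bare encoder-decoder with PyTorch; vocabulary sizes and the toy batch are placeholders, and a real system would add attention, minibatching of real data, and beam search.

```python
# Bare encoder-decoder sketch in PyTorch; vocabulary sizes and the toy batch
# are placeholders, and a real system adds attention, minibatching of real
# data, and beam search.
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab=100, tgt_vocab=100, d=32):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d)
        self.tgt_emb = nn.Embedding(tgt_vocab, d)
        self.encoder = nn.LSTM(d, d, batch_first=True)
        self.decoder = nn.LSTM(d, d, batch_first=True)
        self.out = nn.Linear(d, tgt_vocab)

    def forward(self, src, tgt_in):
        _, state = self.encoder(self.src_emb(src))            # summarise the source
        dec, _ = self.decoder(self.tgt_emb(tgt_in), state)    # condition decoder on it
        return self.out(dec)                                  # per-step vocabulary logits

model = Seq2Seq()
src = torch.randint(0, 100, (2, 7))        # two toy source sentences
tgt_in = torch.randint(0, 100, (2, 5))     # shifted targets (teacher forcing)
tgt_out = torch.randint(0, 100, (2, 5))    # gold next tokens
logits = model(src, tgt_in)
loss = nn.functional.cross_entropy(logits.reshape(-1, 100), tgt_out.reshape(-1))
loss.backward()                            # one training step would follow
```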