
    Finding online neural update rules by learning to remember

    We investigate learning, from scratch, the online local update rules for neural activations (bodies) and weights (synapses). We represent the state of each weight and activation by a small vector and parameterize their updates using (meta-) neural networks. Different neuron types are represented by different embedding vectors, which allows the same two functions to be used for all neurons. Instead of training directly for the objective using evolution or long-term back-propagation, as is commonly done in similar systems, we motivate and study a different objective: that of remembering past snippets of experience. We explain how this objective relates to standard back-propagation training and other forms of learning. We train for this objective using short-term back-propagation and analyze the performance as a function of both the different network types and the difficulty of the problem. We find that this analysis gives interesting insights into what constitutes a learning rule. We also discuss how such a system could form a natural substrate for addressing topics such as episodic memories, meta-learning and auxiliary objectives. Comment: 11 pages, 1 figure
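
    A toy illustration of the mechanism this abstract sketches: per-weight and per-activation state vectors updated by two shared meta-networks, with neuron identity supplied only through a type embedding. Everything below (state sizes, the tiny MLPs, the embedding table, the residual update) is an assumption for illustration, not the paper's implementation.

```python
import numpy as np

STATE = 4        # assumed size of each per-weight / per-activation state vector
EMBED = 3        # assumed size of the neuron-type embedding
rng = np.random.default_rng(0)

def init_mlp(in_dim, out_dim, hidden=8):
    """Parameters of a tiny two-layer MLP (the meta-network)."""
    return (rng.normal(0, 0.1, (in_dim, hidden)), np.zeros(hidden),
            rng.normal(0, 0.1, (hidden, out_dim)), np.zeros(out_dim))

def mlp(params, x):
    W1, b1, W2, b2 = params
    return np.tanh(x @ W1 + b1) @ W2 + b2

# One shared function updates every activation state and another updates every
# weight state; per-neuron identity enters only through a learned type embedding.
act_update = init_mlp(STATE + EMBED + 1, STATE)   # inputs: activation state, type embedding, incoming signal
wgt_update = init_mlp(3 * STATE, STATE)           # inputs: weight state, activation state before and after the update
type_embed = {"input": rng.normal(size=EMBED),
              "hidden": rng.normal(size=EMBED),
              "output": rng.normal(size=EMBED)}

def step(act_state, wgt_state, signal, neuron_type):
    """One online local update: both states change using only locally available signals."""
    new_act = act_state + mlp(act_update,
                              np.concatenate([act_state, type_embed[neuron_type], [signal]]))
    new_wgt = wgt_state + mlp(wgt_update,
                              np.concatenate([wgt_state, act_state, new_act]))
    return new_act, new_wgt

a, w = rng.normal(size=STATE), rng.normal(size=STATE)
a, w = step(a, w, signal=0.5, neuron_type="hidden")
```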

    A Hebbian/Anti-Hebbian Neural Network for Linear Subspace Learning: A Derivation from Multidimensional Scaling of Streaming Data

    Neural network models of early sensory processing typically reduce the dimensionality of streaming input data. Such networks learn the principal subspace, in the sense of principal component analysis (PCA), by adjusting synaptic weights according to activity-dependent learning rules. When derived from a principled cost function, these rules are nonlocal and hence biologically implausible. At the same time, biologically plausible local rules have been postulated rather than derived from a principled cost function. Here, to bridge this gap, we derive a biologically plausible network for subspace learning on streaming data by minimizing a principled cost function. In a departure from previous work, where cost was quantified by the representation, or reconstruction, error, we adopt a multidimensional scaling (MDS) cost function for streaming data. The resulting algorithm relies only on biologically plausible Hebbian and anti-Hebbian local learning rules. In a stochastic setting, synaptic weights converge to a stationary state which projects the input data onto the principal subspace. If the data are generated by a nonstationary distribution, the network can track the principal subspace. Thus, our result takes a step towards an algorithmic theory of neural computation. Comment: Accepted for publication in Neural Computation
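
    A minimal numerical sketch of the kind of network this abstract describes: a feedforward matrix updated by a Hebbian rule and a lateral matrix updated by an anti-Hebbian rule, with the output taken as the fixed point of the lateral dynamics. The particular update forms, learning rate, and toy data here are illustrative assumptions, not the paper's exact derivation from the MDS cost.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_out, eta = 10, 3, 0.01

W = rng.normal(0, 0.1, (n_out, n_in))   # feedforward weights (Hebbian)
M = np.eye(n_out)                        # lateral weights (anti-Hebbian), initialised to identity

def output(x):
    """Fixed point of the lateral dynamics, y = M^{-1} W x. In the network this point
    is reached by recurrent inhibition; here we solve for it directly."""
    return np.linalg.solve(M, W @ x)

def learn(x):
    """Local updates: each synapse uses only the activities of the neurons it connects."""
    global W, M
    y = output(x)
    W += eta * (np.outer(y, x) - W)      # Hebbian: strengthen with input-output correlation
    M += eta * (np.outer(y, y) - M)      # anti-Hebbian: decorrelate the outputs
    return y

# Streaming inputs drawn from a 3-dimensional subspace plus noise (toy data).
basis = rng.normal(size=(n_in, n_out))
for _ in range(2000):
    x = basis @ rng.normal(size=n_out) + 0.05 * rng.normal(size=n_in)
    learn(x)
# After training, the row space of M^{-1} W approximates the principal subspace of the inputs.
```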

    Integrating Transformer and Paraphrase Rules for Sentence Simplification

    Sentence simplification aims to reduce the complexity of a sentence while retaining its original meaning. Current models for sentence simplification adopted ideas from machine translation studies and implicitly learned simplification mapping rules from normal-simple sentence pairs. In this paper, we explore a novel model based on a multi-layer and multi-head attention architecture, and we propose two innovative approaches to integrate the Simple PPDB (A Paraphrase Database for Simplification), an external paraphrase knowledge base for simplification that covers a wide range of real-world simplification rules. The experiments show that the integration provides two major benefits: (1) the integrated model outperforms multiple state-of-the-art baseline models for sentence simplification in the literature, and (2) through analysis of rule utilization, the model seeks to select more accurate simplification rules. The code and models used in the paper are available at https://github.com/Sanqiang/text_simplification

    Evolving Indoor Navigational Strategies Using Gated Recurrent Units In NEAT

    Simultaneous Localisation and Mapping (SLAM) algorithms are expensive to run on smaller robotic platforms such as Micro-Aerial Vehicles. Bug algorithms are an alternative that use relatively little processing power and avoid high memory consumption by not building an explicit map of the environment. Bug algorithms achieve relatively good performance in simulated and robotic maze-solving domains. However, because they are hand-designed, a natural question is whether they are globally optimal control policies. In this work, we explore the performance of neuroevolution, specifically NEAT, at evolving control policies for simulated differential drive robots carrying out generalised maze navigation. We extend NEAT to include Gated Recurrent Units (GRUs) to help deal with long-term dependencies. We show that both NEAT and our NEAT-GRU can repeatably generate controllers that outperform I-Bug (an algorithm particularly well-suited for use in real robots) on a test set of 209 indoor maze-like environments. We show that NEAT-GRU is superior to NEAT in this task, and that, of the two systems, only NEAT-GRU can continuously evolve successful controllers for a much harder task in which no bearing information about the target is provided to the agent.
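
    For reference, the gated recurrent unit added to NEAT here follows the standard GRU equations. The sketch below shows the per-step update with illustrative sizes and random parameters; in NEAT-GRU both the parameters and the network topology would be evolved rather than set by hand, so treat the setup as an assumption.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Standard GRU update; the gates let a controller keep information over long
    horizons, which plain recurrent nodes struggle with."""
    def __init__(self, n_in, n_hidden, rng):
        s = lambda *shape: rng.normal(0, 0.1, shape)
        self.Wz, self.Uz, self.bz = s(n_hidden, n_in), s(n_hidden, n_hidden), np.zeros(n_hidden)
        self.Wr, self.Ur, self.br = s(n_hidden, n_in), s(n_hidden, n_hidden), np.zeros(n_hidden)
        self.Wh, self.Uh, self.bh = s(n_hidden, n_in), s(n_hidden, n_hidden), np.zeros(n_hidden)

    def step(self, x, h):
        z = sigmoid(self.Wz @ x + self.Uz @ h + self.bz)        # update gate
        r = sigmoid(self.Wr @ x + self.Ur @ h + self.br)        # reset gate
        h_tilde = np.tanh(self.Wh @ x + self.Uh @ (r * h) + self.bh)
        return (1 - z) * h + z * h_tilde                         # new hidden state

rng = np.random.default_rng(2)
cell, h = GRUCell(n_in=4, n_hidden=8, rng=rng), np.zeros(8)
for sensor_reading in rng.normal(size=(10, 4)):                  # toy sensor stream
    h = cell.step(sensor_reading, h)
```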

    Contextual Memory Trees

    We design and study a Contextual Memory Tree (CMT), a learning memory controller that inserts new memories into an experience store of unbounded size. It is designed to efficiently query for memories from that store, supporting logarithmic-time insertion and retrieval operations. Hence, CMT can be integrated into existing statistical learning algorithms as an augmented memory unit without substantially increasing training and inference computation. Furthermore, CMT operates as a reduction to classification, allowing it to benefit from advances in representation or architecture. We demonstrate the efficacy of CMT by augmenting existing multi-class and multi-label classification algorithms with CMT and observe statistical improvement. We also test CMT learning on several image-captioning tasks to demonstrate that it performs computationally better than a simple nearest-neighbors memory system while benefitting from reward learning. Comment: ICM 201
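
    The logarithmic-time behaviour comes from organising memories in a tree whose internal routers decide, per query, which child to descend into. The toy below shows only that shape of the idea: the routers here are fixed linear separators chosen at split time, whereas CMT learns its routers via a reduction to classification, so every detail is an assumption for illustration.

```python
import numpy as np

class Router:
    """Internal node: a linear rule sends a vector left or right. A toy stand-in for
    CMT's learned routers (in the paper, routing is learned as a classification problem)."""
    def __init__(self, xa, xb):
        self.w = xa - xb                        # separate two existing memories
        self.b = -self.w @ (xa + xb) / 2.0
        self.left, self.right = [], []          # children: Router or a leaf list of memories

    def child(self, x):
        return "left" if self.w @ x + self.b <= 0 else "right"

class ToyMemoryTree:
    """Unbounded store with O(depth) insert and query; depth stays logarithmic as long
    as the routers split leaves reasonably evenly (an assumption in this sketch)."""
    def __init__(self, leaf_capacity=4):
        self.cap = leaf_capacity
        self.root = []                          # start as a single leaf

    def insert(self, x, value):
        parent, side, node = None, None, self.root
        while isinstance(node, Router):
            parent, side = node, node.child(x)
            node = getattr(node, side)
        node.append((x, value))
        if len(node) > self.cap:                # split the over-full leaf
            router = Router(node[0][0], node[-1][0])
            for item in node:
                getattr(router, router.child(item[0])).append(item)
            if parent is None:
                self.root = router
            else:
                setattr(parent, side, router)

    def query(self, x):
        node = self.root
        while isinstance(node, Router):
            node = getattr(node, node.child(x))
        # return the closest stored memory in the reached leaf
        return min(node, key=lambda m: np.linalg.norm(m[0] - x), default=None)

rng = np.random.default_rng(3)
tree = ToyMemoryTree()
for i in range(100):
    tree.insert(rng.normal(size=8), value=i)
print(tree.query(rng.normal(size=8)))
```

    With reasonably balanced splits, both insertion and retrieval touch only O(log n) nodes, which is what lets the store grow without substantially slowing training or inference.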

    Unsupervised Predictive Memory in a Goal-Directed Agent

    Animals execute goal-directed behaviours despite the limited range and scope of their sensors. To cope, they explore environments and store memories, maintaining estimates of important information that is not presently available. Recently, progress has been made with artificial intelligence (AI) agents that learn to perform tasks from sensory input, even at a human level, by merging reinforcement learning (RL) algorithms with deep neural networks, and the excitement surrounding these results has led to the pursuit of related ideas as explanations of non-human animal learning. However, we demonstrate that contemporary RL algorithms struggle to solve simple tasks when enough information is concealed from the sensors of the agent, a property called "partial observability". An obvious requirement for handling partially observed tasks is access to extensive memory, but we show memory is not enough; it is critical that the right information be stored in the right format. We develop a model, the Memory, RL, and Inference Network (MERLIN), in which memory formation is guided by a process of predictive modeling. MERLIN facilitates the solution of tasks in 3D virtual reality environments for which partial observability is severe and memories must be maintained over long durations. Our model demonstrates a single learning agent architecture that can solve canonical behavioural tasks in psychology and neurobiology without strong simplifying assumptions about the dimensionality of sensory input or the duration of experiences.

    Balancing New Against Old Information: The Role of Surprise in Learning

    Surprise describes a range of phenomena from unexpected events to behavioral responses. We propose a measure of surprise and use it for surprise-driven learning. Our surprise measure takes into account data likelihood as well as the degree of commitment to a belief via the entropy of the belief distribution. We find that surprise-minimizing learning dynamically adjusts the balance between new and old information without the need for knowledge about the temporal statistics of the environment. We apply our framework to a dynamic decision-making task and a maze exploration task. Our surprise-minimizing framework is suitable for learning in complex environments, even if the environment undergoes gradual or sudden changes, and could eventually provide a framework to study the behavior of humans and animals encountering surprising events.
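
    One way to make the two ingredients named here concrete (data likelihood and commitment via belief entropy) is sketched below for a toy coin-bias estimation problem. The functional forms of the surprise measure and of the surprise-modulated update are stand-ins chosen for illustration, not the paper's definitions.

```python
import numpy as np

# Toy setting: a discrete belief over K candidate values of a coin's bias.
K = 51
thetas = np.linspace(0.01, 0.99, K)
belief = np.full(K, 1.0 / K)                 # start with a flat (uncommitted) belief

def surprise(obs, belief):
    """Illustrative surprise: how unlikely the data are under the current belief,
    weighted by how committed the belief is (low entropy = high commitment)."""
    lik = thetas if obs == 1 else 1.0 - thetas
    nll = -np.log(np.dot(belief, lik))                    # data (un)likelihood term
    entropy = -np.sum(belief * np.log(belief + 1e-12))
    commitment = 1.0 - entropy / np.log(K)                # 0 = flat belief, 1 = certain
    return nll * (0.5 + commitment)                       # committed beliefs are more surprisable

def update(obs, belief, s):
    """Surprise-modulated update: larger surprise weights new evidence more strongly
    relative to the old belief (again a sketch, not the paper's rule)."""
    lik = thetas if obs == 1 else 1.0 - thetas
    posterior = belief * lik
    posterior /= posterior.sum()
    gamma = s / (1.0 + s)                                 # mixing weight in (0, 1)
    new_belief = (1 - gamma) * belief + gamma * posterior
    return new_belief / new_belief.sum()

rng = np.random.default_rng(4)
true_bias = 0.8
for t in range(200):
    if t == 100:
        true_bias = 0.2                                   # sudden change in the environment
    obs = int(rng.random() < true_bias)
    belief = update(obs, belief, surprise(obs, belief))
```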

    A Graph-to-Sequence Model for AMR-to-Text Generation

    The problem of AMR-to-text generation is to recover a text representing the same meaning as an input AMR graph. The current state-of-the-art method uses a sequence-to-sequence model, leveraging an LSTM for encoding a linearized AMR structure. Although able to model non-local semantic information, a sequence LSTM can lose information from the AMR graph structure, and thus faces challenges with large graphs, which result in long sequences. We introduce a neural graph-to-sequence model, using a novel LSTM structure for directly encoding graph-level semantics. On a standard benchmark, our model shows superior results to existing methods in the literature. Comment: ACL 2018 camera-ready, Proceedings of ACL 2018 with updated performance
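
    The contrast drawn here, encoding the graph directly rather than a linearised sequence, can be illustrated with a bare-bones message-passing encoder in which each AMR node state is repeatedly updated from its neighbours. The gating and edge-label handling of the actual graph-state LSTM are omitted, and all names and sizes below are assumptions.

```python
import numpy as np

def graph_encode(node_feats, edges, steps=3, rng=None):
    """Toy message-passing encoder: information flows along graph edges rather than
    along a linearised token sequence (illustration only)."""
    rng = rng or np.random.default_rng(5)
    n, d = node_feats.shape
    W_self = rng.normal(0, 0.1, (d, d))
    W_nbr = rng.normal(0, 0.1, (d, d))
    h = node_feats.copy()
    for _ in range(steps):
        msg = np.zeros_like(h)
        for u, v in edges:                     # sum messages along both edge directions
            msg[v] += h[u]
            msg[u] += h[v]
        h = np.tanh(h @ W_self + msg @ W_nbr)
    return h                                   # per-node states; pool or attend over them for decoding

# Example: a tiny 4-node AMR-like graph
feats = np.random.default_rng(6).normal(size=(4, 8))
states = graph_encode(feats, edges=[(0, 1), (0, 2), (2, 3)])
```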

    Neural Machine Translation and Sequence-to-sequence Models: A Tutorial

    This tutorial introduces a new and powerful set of techniques variously called "neural machine translation" or "neural sequence-to-sequence models". These techniques have been used in a number of tasks regarding the handling of human language, and can be a powerful tool in the toolbox of anyone who wants to model sequential data of some sort. The tutorial assumes that the reader knows the basics of math and programming, but does not assume any particular experience with neural networks or natural language processing. It attempts to explain the intuition behind the various methods covered, then delves into them with enough mathematical detail to understand them concretely, and culminates with a suggestion for an implementation exercise, where readers can test that they understood the content in practice. Comment: 65 Pages
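
    As a companion to the tutorial's scope, here is a bare-bones encoder-decoder in the spirit of the models it covers: a recurrent encoder compresses the source sentence into a vector, and a recurrent decoder emits target tokens greedily. Vocabulary sizes, dimensions, the greedy decoding, and the absence of attention or any training loop are simplifications assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
V_src, V_tgt, D = 100, 90, 16                 # toy vocabulary and hidden sizes

E_src = rng.normal(0, 0.1, (V_src, D))        # source embeddings
E_tgt = rng.normal(0, 0.1, (V_tgt, D))        # target embeddings
W_enc = rng.normal(0, 0.1, (2 * D, D))        # simple (non-gated) recurrent cells
W_dec = rng.normal(0, 0.1, (2 * D, D))
W_out = rng.normal(0, 0.1, (D, V_tgt))        # projection to target vocabulary

def rnn_step(W, x, h):
    return np.tanh(np.concatenate([x, h]) @ W)

def translate(src_ids, bos=0, eos=1, max_len=20):
    """Encoder-decoder without attention: the source is compressed into a single
    vector, then target tokens are produced one at a time."""
    h = np.zeros(D)
    for tok in src_ids:                        # encode
        h = rnn_step(W_enc, E_src[tok], h)
    out, tok = [], bos
    for _ in range(max_len):                   # decode greedily
        h = rnn_step(W_dec, E_tgt[tok], h)
        tok = int(np.argmax(h @ W_out))        # pick the highest-scoring next token
        if tok == eos:
            break
        out.append(tok)
    return out

print(translate([5, 17, 42]))
```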