On the Memory Properties of Recurrent Neural Models
In this paper, we investigate the memory properties of two popular gated units: long short-term memory (LSTM) and gated recurrent units (GRU), which have been used in recurrent neural networks (RNNs) to achieve state-of-the-art performance on several machine learning tasks. We propose five basic tasks for isolating and examining specific capabilities relating to the implementation of memory. Results show that (i) both types of gated unit perform less reliably than standard RNN units on tasks testing fixed-delay recall, (ii) the reliability of stochastic gradient descent decreases as network complexity increases, and (iii) gated units perform better than standard RNNs on tasks that require values to be stored in memory and updated conditionally upon input to the network. Task performance is found to be surprisingly independent of network depth (number of layers) and connection architecture. Finally, visualisations of the solutions found by these networks are presented and explored, exposing for the first time how logic operations are implemented by individual gated cells and small groups of these cells.
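The abstract does not spell out the five tasks, but the fixed-delay recall setting mentioned in (i) can be illustrated with a small synthetic data generator: at every step beyond the delay, the target is the input symbol seen exactly `delay` steps earlier. The function name, parameter choices, and one-hot encoding below are illustrative assumptions, not the authors' task specification.

```python
import numpy as np

def fixed_delay_recall_batch(batch_size=32, seq_len=20, delay=5, n_symbols=8, seed=None):
    """Generate a batch for a fixed-delay recall task: for steps t >= delay,
    the target is the symbol presented at step t - delay.
    Inputs are one-hot encoded; targets are class indices (-1 marks steps to ignore)."""
    rng = np.random.default_rng(seed)
    symbols = rng.integers(0, n_symbols, size=(batch_size, seq_len))
    inputs = np.eye(n_symbols, dtype=np.float32)[symbols]          # shape (B, T, n_symbols)
    targets = np.full((batch_size, seq_len), -1, dtype=np.int64)   # -1 = no target yet
    targets[:, delay:] = symbols[:, :-delay]
    return inputs, targets

x, y = fixed_delay_recall_batch(batch_size=4, seq_len=10, delay=3, seed=0)
print(x.shape, y.shape)  # (4, 10, 8) (4, 10)
```

Any recurrent model (plain RNN, LSTM, or GRU) can then be trained to predict the target class at each step, which isolates pure storage-and-recall behaviour from other aspects of the task.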
Scaling Recurrent Neural Network Language Models
This paper investigates the scaling properties of Recurrent Neural Network Language Models (RNNLMs). We discuss how to train very large RNNs on GPUs and address the questions of how RNNLMs scale with respect to model size, training-set size, computational cost and memory. Our analysis shows that despite being more costly to train, RNNLMs obtain much lower perplexities on standard benchmarks than n-gram models. We train the largest known RNNs and report relative word error rate gains of 18% on an ASR task. We also report the lowest perplexities to date on the recently released billion-word language modelling benchmark, a 1 BLEU point gain on machine translation, and a 17% relative hit rate gain in word prediction.
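For readers comparing the perplexity figures quoted above across models, it may help to recall that perplexity is simply the exponential of the average per-token negative log-likelihood on a held-out set. The snippet below is a generic illustration of that relationship, not the paper's evaluation code.

```python
import math

def perplexity(neg_log_likelihoods):
    # Perplexity = exp(mean per-token negative log-likelihood, in nats); lower is better.
    # RNNLM vs. n-gram comparisons are made on the same held-out test set.
    return math.exp(sum(neg_log_likelihoods) / len(neg_log_likelihoods))

# Hypothetical per-token cross-entropy losses (in nats) on a held-out set.
print(perplexity([4.1, 3.8, 4.5, 4.0]))  # ~60.3
```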
Deep Complex Networks
At present, the vast majority of building blocks, techniques, and architectures for deep learning are based on real-valued operations and representations. However, recent work on recurrent neural networks and older fundamental theoretical analysis suggest that complex numbers could have a richer representational capacity and could also facilitate noise-robust memory retrieval mechanisms. Despite their attractive properties and potential for opening up entirely new neural architectures, complex-valued deep neural networks have been marginalized due to the absence of the building blocks required to design such models. In this work, we provide the key atomic components for complex-valued deep neural networks and apply them to convolutional feed-forward networks and convolutional LSTMs. More precisely, we rely on complex convolutions and present algorithms for complex batch normalization and complex weight initialization strategies for complex-valued neural nets, and we use them in experiments with end-to-end training schemes. We demonstrate that such complex-valued models are competitive with their real-valued counterparts. We test deep complex models on several computer vision tasks, on music transcription using the MusicNet dataset, and on speech spectrum prediction using the TIMIT dataset, achieving state-of-the-art performance on these audio-related tasks.
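A common way to realize the complex convolutions mentioned above is to carry the real and imaginary parts as two real-valued tensors and expand the complex product into four real convolutions. The PyTorch sketch below illustrates that idea under my own naming and the simplifying choice of omitting biases; it is not the paper's released implementation.

```python
import torch
import torch.nn as nn

class ComplexConv2d(nn.Module):
    """Complex convolution on (real, imag) tensor pairs:
    (W_r + i W_i) * (x_r + i x_i) = (W_r*x_r - W_i*x_i) + i (W_r*x_i + W_i*x_r).
    Biases are omitted so the algebra above holds exactly; a complex bias could be added separately."""
    def __init__(self, in_channels, out_channels, kernel_size, **kwargs):
        super().__init__()
        kwargs["bias"] = False
        self.conv_r = nn.Conv2d(in_channels, out_channels, kernel_size, **kwargs)  # real part of W
        self.conv_i = nn.Conv2d(in_channels, out_channels, kernel_size, **kwargs)  # imaginary part of W

    def forward(self, x_r, x_i):
        real = self.conv_r(x_r) - self.conv_i(x_i)
        imag = self.conv_r(x_i) + self.conv_i(x_r)
        return real, imag

conv = ComplexConv2d(3, 8, kernel_size=3, padding=1)
x_r, x_i = torch.randn(1, 3, 32, 32), torch.randn(1, 3, 32, 32)
y_r, y_i = conv(x_r, x_i)
print(y_r.shape, y_i.shape)  # torch.Size([1, 8, 32, 32]) for both parts
```

Representing the complex weight as two real convolution layers keeps the operation compatible with standard autograd and GPU kernels, which is why this formulation is a natural building block for complex-valued feed-forward and convolutional LSTM models.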