Search CORE

74 research outputs found

Learning to soar: exploration strategies in reinforcement learning for resource-constrained missions

Author: Chung Jen Jen
Publication venue: Faculty of Engineering and Information Technologies, School of Aerospace, Mechanical and Mechatronic Engineering
Publication date: 01/01/2014
Field of study

An unpowered aerial glider learning to soar in a wind field presents a new manifestation of the exploration-exploitation trade-off. This thesis proposes a directed, adaptive and nonmyopic exploration strategy in a temporal difference reinforcement learning framework for tackling the resource-constrained exploration-exploitation task of this autonomous soaring problem. The complete learning algorithm is developed in a SARSA() framework, which uses a Gaussian process with a squared exponential covariance function to approximate the value function. The three key contributions of this thesis form the proposed exploration-exploitation strategy. Firstly, a new information measure is derived from the change in the variance volume surrounding the Gaussian process estimate. This measure of information gain is used to define the exploration reward of an observation. Secondly, a nonmyopic information value is presented that captures both the immediate exploration reward due to taking an action as well as future exploration opportunities that result. Finally, this information value is combined with the state-action value of SARSA() through a dynamic weighting factor to produce an exploration-exploitation management scheme for resource-constrained learning systems. The proposed learning strategy encourages either exploratory or exploitative behaviour depending on the requirements of the learning task and the available resources. The performance of the learning algorithms presented in this thesis is compared against other SARSA() methods. Results show that actively directing exploration to regions of the state-action space with high uncertainty improves the rate of learning, while dynamic management of the exploration-exploitation behaviour according to the available resources produces prudent learning behaviour in resource-constrained systems

Sydney eScholarship

Learning to soar: exploration strategies in reinforcement learning for resource-constrained missions

Author: Chung Jen Jen
Publication venue: Faculty of Engineering and Information Technologies, School of Aerospace, Mechanical and Mechatronic Engineering
Publication date: 01/01/2014
Field of study

Sydney eScholarship

Radboud Repository

Continuous-time spike-based reinforcement learning for working memory tasks

Author: A Gilra
D Silver
JO Rombouts
KN Gurney
PR Roelfsema
RS Sutton
S Hochreiter
TP Lillicrap
V Mnih
W Gerstner
Y Niv
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 04/10/2018
Field of study

As the brain purportedly employs on-policy reinforcement learning compatible with SARSA learning, and most interesting cognitive tasks require some form of memory while taking place in continuous-time, recent work has developed plausible reinforcement learning schemes that are compatible with these requirements. Lacking is a formulation of both computation and learning in terms of spiking neurons. Such a formulation creates both a closer mapping to biology, and also expresses such learning in terms of asynchronous and sparse neural computation. We present a spiking neural network with memory that learns cognitive tasks in continuous time. Learning is biologically plausibly implemented using the AuGMeNT framework, and we show how separate spiking forward and feedback networks suffice for learning the tasks just as fast the analog CT-AuGMeNT counterpart, while computing efficiently using very few spikes: 1–20 Hz on average

Crossref

CWI's Institutional Repository

Workshop on Rich Representations for Reinforcement Learning:Held in conjunction with the 22nd International Conference on Machine Learning, August 7, 2005, Bonn, Germany

Author: Driessens Kurt
Fern Alan
van Otterlo Martijn
Publication venue: University of Bonn
Publication date: 01/01/2005
Field of study

University of Twente Research Information

Representation discovery using a fixed basis in reinforcement learning

Author: Wookey Dean Stephen
Publication venue
Publication date: 01/01/2016
Field of study

A thesis presented for the degree of Doctor of Philosophy, School of Computer Science and Applied Mathematics. University of the Witwatersrand, South Africa. 26 August 2016.In the reinforcement learning paradigm, an agent learns by interacting with its environment. At each state, the agent receives a numerical reward. Its goal is to maximise the discounted sum of future rewards. One way it can do this is through learning a value function; a function which maps states to the discounted sum of future rewards. With an accurate value function and a model of the environment, the agent can take the optimal action in each state. In practice, however, the value function is approximated, and performance depends on the quality of the approximation. Linear function approximation is a commonly used approximation scheme, where the value function is represented as a weighted sum of basis functions or features. In continuous state environments, there are infinitely many such features to choose from, introducing the new problem of feature selection. Existing algorithms such as OMP-TD are slow to converge, scale poorly to high dimensional spaces, and have not been generalised to the online learning case. We introduce heuristic methods for reducing the search space in high dimensions that significantly reduce computational costs and also act as regularisers. We extend these methods and introduce feature regularisation for incremental feature selection in the batch learning case, and show that introducing a smoothness prior is effective with our SSOMP-TD and STOMP-TD algorithms. Finally we generalise OMP-TD and our algorithms to the online case and evaluate them empirically.LG201

Wits Institutional Repository on DSPACE