
Learning to soar: exploration strategies in reinforcement learning for resource-constrained missions

Abstract

An unpowered aerial glider learning to soar in a wind field presents a new manifestation of the exploration-exploitation trade-off. This thesis proposes a directed, adaptive and nonmyopic exploration strategy within a temporal difference reinforcement learning framework for tackling the resource-constrained exploration-exploitation task of this autonomous soaring problem. The complete learning algorithm is developed in a SARSA(λ) framework, which uses a Gaussian process with a squared exponential covariance function to approximate the value function. The three key contributions of this thesis form the proposed exploration-exploitation strategy. Firstly, a new information measure is derived from the change in the variance volume surrounding the Gaussian process estimate; this measure of information gain defines the exploration reward of an observation. Secondly, a nonmyopic information value is presented that captures both the immediate exploration reward of taking an action and the future exploration opportunities that result. Finally, this information value is combined with the state-action value of SARSA(λ) through a dynamic weighting factor to produce an exploration-exploitation management scheme for resource-constrained learning systems. The proposed strategy encourages either exploratory or exploitative behaviour depending on the requirements of the learning task and the available resources. The performance of the learning algorithms presented in this thesis is compared against other SARSA(λ) methods. Results show that actively directing exploration to regions of the state-action space with high uncertainty improves the rate of learning, while dynamic management of the exploration-exploitation behaviour according to the available resources produces prudent learning behaviour in resource-constrained systems.
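The abstract names three concrete ingredients: a squared exponential Gaussian process over the state-action space, an exploration reward defined as the drop in the "variance volume" of the GP posterior after an observation, and a weighted combination of that information value with the SARSA(λ) state-action value. The thesis's exact formulation is not reproduced here, but the following minimal Python sketch illustrates how those pieces could fit together. All names are hypothetical, and the grid-based variance volume, the kernel hyperparameters, and the linear weighting are assumptions for illustration; the sketch also shows only the immediate (myopic) exploration reward, not the nonmyopic extension.

```python
import numpy as np

# Hypothetical sketch of the abstract's ingredients; not the thesis's code.

def sq_exp_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared exponential covariance k(a, b) = s^2 exp(-||a - b||^2 / (2 l^2))."""
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return signal_var * np.exp(-0.5 * d2 / length_scale**2)

def posterior_variance(X_train, X_query, noise_var=1e-2):
    """Diagonal of the GP posterior covariance at the query points."""
    K = sq_exp_kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
    K_s = sq_exp_kernel(X_query, X_train)
    prior = np.diag(sq_exp_kernel(X_query, X_query))
    return prior - np.sum((K_s @ np.linalg.inv(K)) * K_s, axis=1)

def variance_volume(X_train, X_grid):
    """Crude 'variance volume': posterior variance summed over a reference grid."""
    return posterior_variance(X_train, X_grid).sum()

def exploration_reward(X_train, x_new, X_grid):
    """Information gain of one observation: the drop in variance volume it causes."""
    before = variance_volume(X_train, X_grid)
    after = variance_volume(np.vstack([X_train, x_new]), X_grid)
    return before - after

def combined_value(q_value, info_value, w):
    """Dynamically weighted exploration-exploitation score, w in [0, 1]."""
    return (1.0 - w) * q_value + w * info_value

# Usage: score a candidate state-action point against previously visited ones.
rng = np.random.default_rng(0)
X_seen = rng.uniform(0.0, 1.0, size=(5, 2))    # visited state-action points
X_grid = rng.uniform(0.0, 1.0, size=(200, 2))  # grid approximating the space
x_cand = np.array([[0.9, 0.9]])                # candidate observation
score = combined_value(q_value=0.3,
                       info_value=exploration_reward(X_seen, x_cand, X_grid),
                       w=0.5)
```

In this reading, raising `w` when resources are plentiful favours uncertainty-reducing actions, while lowering it as resources dwindle shifts the agent toward exploiting the learned value function, which is the management behaviour the abstract describes.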
