
12,967 research outputs found

Fingerprint Policy Optimisation for Robust Reinforcement Learning

Authors: Osborne Michael A.
Paul Supratik
Whiteson Shimon
Publication date: 27/05/2019

Policy gradient methods ignore the potential value of adjusting environment variables: unobservable state features that are randomly determined by the environment in a physical setting but are controllable in a simulator. If the environment variable has a large impact on the transition dynamics, this can lead to slow learning or convergence to suboptimal policies. In this paper, we present fingerprint policy optimisation (FPO), which finds a policy that is optimal in expectation across the distribution of environment variables. The central idea is to use Bayesian optimisation (BO) to actively select the distribution of the environment variable that maximises the improvement generated by each iteration of the policy gradient method. To make this BO practical, we contribute two easy-to-compute, low-dimensional fingerprints of the current policy. Our experiments show that FPO can efficiently learn policies that are robust to significant rare events, which are unlikely to be observed under random sampling but are key to learning good policies.
Comment: ICML 201

arXiv.org e-Print Archive

Oxford University Research Archive
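The objective in the FPO abstract, a policy that is optimal in expectation across the distribution of environment variables, stays well defined even when rare events are deliberately oversampled, provided each sample is reweighted. The Python sketch below illustrates that reweighting (importance sampling) for a hypothetical binary environment variable. `P_RARE`, `episode_return` and the other names are illustrative assumptions, and the code does not implement FPO's Bayesian optimisation or its policy fingerprints; it only shows why sampling from a different distribution need not change the quantity being estimated.

```python
import random

# Hypothetical setting (not from the paper): the environment variable theta
# is 1 (a rare event) with probability P_RARE under the true distribution p.
P_RARE = 0.01


def episode_return(theta: int) -> float:
    """Hypothetical return of a fixed policy given the environment variable."""
    return -50.0 if theta == 1 else 1.0


def exact_expected_return() -> float:
    """J = E_{theta ~ p}[R(theta)], computed directly for two outcomes."""
    return (1 - P_RARE) * episode_return(0) + P_RARE * episode_return(1)


def importance_sampled_return(q_rare: float, n: int, rng: random.Random) -> float:
    """Estimate J by drawing theta from a proposal q that sets the rare-event
    probability to q_rare, weighting each sample by p(theta) / q(theta) so the
    estimate targets the expectation under p, not under q."""
    total = 0.0
    for _ in range(n):
        theta = 1 if rng.random() < q_rare else 0
        p = P_RARE if theta == 1 else 1 - P_RARE
        q = q_rare if theta == 1 else 1 - q_rare
        total += (p / q) * episode_return(theta)
    return total / n


rng = random.Random(0)
print(exact_expected_return())
print(importance_sampled_return(0.5, 10_000, rng))
```

Calling `importance_sampled_return(P_RARE, ...)` reduces to plain Monte Carlo with unit weights, which sees the rare event in only about 1% of samples; raising `q_rare` exposes it far more often while the weights keep the estimate centred on the same expectation (about 0.49 here).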

An Entropy Search Portfolio for Bayesian Optimization

Authors: Bouchard-Côté Alexandre
de Freitas Nando
Hoffman Matthew W.
Shahriari Bobak
Wang Ziyu
Publication date: 01/01/2014

Bayesian optimization is a sample-efficient method for black-box global optimization. However, the performance of a Bayesian optimization method depends heavily on its exploration strategy, i.e. the choice of acquisition function, and it is not clear a priori which choice will result in superior performance. While portfolio methods provide an effective, principled way of combining a collection of acquisition functions, they are often based on measures of past performance, which can be misleading. To address this issue, we introduce the Entropy Search Portfolio (ESP): a novel approach to portfolio construction motivated by information-theoretic considerations. We show that ESP outperforms existing portfolio methods on several real and synthetic problems, including geostatistical datasets and simulated control tasks. We show not only that ESP performs as well as the best, but unknown, acquisition function, but also that, surprisingly, it often performs better. Finally, over a wide range of conditions, we find that ESP is robust to the inclusion of poor acquisition functions.
Comment: 10 pages, 5 figures

arXiv.org e-Print Archive

CiteSeerX

Oxford University Research Archive
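The exploration strategy the ESP abstract refers to is the acquisition function, and different acquisition functions can favour different points. The hedged Python sketch below shows the situation a portfolio of this kind starts from: three common acquisition functions (expected improvement, probability of improvement and an upper confidence bound, chosen here for illustration and written for maximisation) each nominate a candidate from the same posterior summaries. The posterior means and standard deviations are made-up numbers rather than a fitted Gaussian process, the function names are hypothetical, and ESP's entropy-based choice among the nominees is not implemented.

```python
import math


def norm_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Acquisition functions for maximisation, in terms of the posterior mean mu
# and standard deviation sigma at a candidate, and the best value seen so far.
def expected_improvement(mu: float, sigma: float, f_best: float) -> float:
    z = (mu - f_best) / sigma
    return (mu - f_best) * norm_cdf(z) + sigma * norm_pdf(z)


def probability_of_improvement(mu: float, sigma: float, f_best: float) -> float:
    return norm_cdf((mu - f_best) / sigma)


def upper_confidence_bound(mu: float, sigma: float, f_best: float, kappa: float = 2.0) -> float:
    return mu + kappa * sigma  # f_best unused; kept for a uniform signature


PORTFOLIO = {
    "EI": expected_improvement,
    "PI": probability_of_improvement,
    "UCB": upper_confidence_bound,
}


def nominate(xs, mus, sigmas, f_best):
    """Each acquisition function proposes the candidate that maximises it.
    A meta-criterion (entropy-based in ESP, not implemented here) would then
    select one of these proposals for the next evaluation."""
    proposals = {}
    for name, acq in PORTFOLIO.items():
        scores = [acq(m, s, f_best) for m, s in zip(mus, sigmas)]
        best = max(range(len(xs)), key=lambda i: scores[i])
        proposals[name] = xs[best]
    return proposals


# Hypothetical posterior summaries at five candidate points.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
mus = [0.2, 0.5, 0.55, 0.3, 0.1]
sigmas = [0.3, 0.05, 0.2, 0.4, 0.6]
print(nominate(xs, mus, sigmas, f_best=0.5))  # {'EI': 0.5, 'PI': 0.5, 'UCB': 1.0}
```

Here the upper confidence bound is drawn to the most uncertain point while the two improvement-based criteria prefer the point with the highest mean, which is the kind of disagreement a portfolio has to resolve.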

The approximate coordinate exchange algorithm for Bayesian optimal design of experiments

Authors: Overstall Antony
Woods David
Publication venue: 'University of Glasgow'
Publication date: 01/01/2015

Optimal Bayesian experimental design typically involves maximising the expectation, with respect to the joint distribution of parameters and responses, of an appropriately chosen utility function. This objective function is usually not available in closed form, and the design space can be high-dimensional. The approximate coordinate exchange algorithm is proposed for this maximisation problem; it uses a Gaussian process emulator to approximate the objective function. The algorithm can be used with arbitrary utility functions, meaning that fully Bayesian optimal design can be considered. It can also be used with utility functions that result in pseudo-Bayesian designs, such as the popular Bayesian D-optimality. The algorithm is demonstrated on a range of examples.

Enlighten
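The coordinate-exchange idea with an emulator can be sketched in Python under simplifying assumptions: the utility is the classical (non-Bayesian) D-optimality criterion det(X^T X) for a straight-line model on [-1, 1], which is deterministic rather than a Monte Carlo estimate of an expected utility; the emulator for each coordinate is a Gaussian process with a fixed RBF length scale fitted to a handful of evaluations; and the accept/reject step is a plain improvement check. All names and settings (`d_criterion`, `ace_sketch`, length scale 0.5, 7 emulator points) are hypothetical and are not taken from the paper.

```python
import math


def d_criterion(design):
    """det(X^T X) for y = b0 + b1 * x at the points in `design`.
    A deterministic stand-in for the expected utility used in the paper."""
    n = len(design)
    s1 = sum(design)
    s2 = sum(x * x for x in design)
    return n * s2 - s1 * s1


def rbf(a, b, length=0.5):
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def gp_mean_fn(xs, ys, jitter=1e-8):
    """Fit a GP with an RBF kernel to centred ys; return its posterior mean."""
    n = len(xs)
    offset = sum(ys) / n
    K = [[rbf(xs[i], xs[j]) + (jitter if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Cholesky factorisation K = L L^T.
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    # Solve L z = (y - offset), then L^T alpha = z.
    r = [y - offset for y in ys]
    z = [0.0] * n
    for i in range(n):
        z[i] = (r[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    alpha = [0.0] * n
    for i in reversed(range(n)):
        alpha[i] = (z[i] - sum(L[k][i] * alpha[k] for k in range(i + 1, n))) / L[i][i]
    return lambda x: offset + sum(a * rbf(x, xi) for a, xi in zip(alpha, xs))


def ace_sketch(design, n_sweeps=3, n_points=7, n_grid=201):
    """Cycle through the coordinates; for each, emulate the utility along that
    coordinate, maximise the emulator, and keep the proposal if it improves."""
    design = list(design)
    pts = [-1.0 + 2.0 * k / (n_points - 1) for k in range(n_points)]
    grid = [-1.0 + 2.0 * k / (n_grid - 1) for k in range(n_grid)]
    for _ in range(n_sweeps):
        for i in range(len(design)):
            def utility_at(x):
                return d_criterion(design[:i] + [x] + design[i + 1:])

            emulator = gp_mean_fn(pts, [utility_at(x) for x in pts])
            proposal = max(grid, key=emulator)
            # Simplified accept step (ACE itself tests Monte Carlo estimates).
            if utility_at(proposal) > utility_at(design[i]):
                design[i] = proposal
    return design


final = ace_sketch([0.1, 0.2, -0.3, 0.4])
print(final, d_criterion(final))
```

For this model the sketch should move the four points to (or very near) alternating endpoints of [-1, 1]; placing half the points at each endpoint is the classical D-optimal design for straight-line regression on that interval, with det(X^T X) = 16 for four points.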