Fingerprint Policy Optimisation for Robust Reinforcement Learning

Authors: Michael A. Osborne, Supratik Paul, Shimon Whiteson
Publication date: 27/05/2019

Policy gradient methods ignore the potential value of adjusting environment variables: unobservable state features that are randomly determined by the environment in a physical setting, but are controllable in a simulator. This can lead to slow learning, or convergence to suboptimal policies, if the environment variable has a large impact on the transition dynamics. In this paper, we present fingerprint policy optimisation (FPO), which finds a policy that is optimal in expectation across the distribution of environment variables. The central idea is to use Bayesian optimisation (BO) to actively select the distribution of the environment variable that maximises the improvement generated by each iteration of the policy gradient method. To make this BO practical, we contribute two easy-to-compute low-dimensional fingerprints of the current policy. Our experiments show that FPO can efficiently learn policies that are robust to significant rare events, which are unlikely to be observable under random sampling, but are key to learning good policies. Comment: ICML 201
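To see why rare events are hard to learn from under random sampling, the following minimal sketch computes the chance that an entire batch of rollouts never samples the rare event. It is not part of FPO itself: the function name, the event probability, and the batch size are hypothetical values chosen for illustration, and the rollouts are assumed to be independent.

```python
def prob_batch_misses_event(p_event: float, batch_size: int) -> float:
    """Probability that none of `batch_size` independent rollouts
    samples an environment setting that occurs with probability `p_event`.

    Assumption: each rollout draws its environment variable independently.
    """
    if not 0.0 <= p_event <= 1.0:
        raise ValueError("p_event must be a probability in [0, 1]")
    if batch_size < 0:
        raise ValueError("batch_size must be non-negative")
    return (1.0 - p_event) ** batch_size


if __name__ == "__main__":
    # Hypothetical numbers: a 1-in-1000 event and 100 rollouts per iteration.
    print(prob_batch_misses_event(0.001, 100))
```

With these illustrative numbers, roughly 90% of batches contain no rare event at all, so most gradient updates carry no information about it.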

arXiv.org e-Print Archive

Oxford University Research Archive

Packing a Knapsack of Unknown Capacity

Authors: Yann Disser, Max Klimm, Nicole Megow, Sebastian Stiller
Publication date: 10/07/2013

We study the problem of packing a knapsack without knowing its capacity. Whenever we attempt to pack an item that does not fit, the item is discarded; if the item fits, we have to include it in the packing. We show that there is always a policy that packs a value within a factor of 2 of the optimum packing, irrespective of the actual capacity. If all items have unit density, we achieve a factor equal to the golden ratio. Both factors are shown to be best possible. In fact, we obtain the above factors using packing policies that are universal in the sense that they fix a particular order of the items and try to pack the items in this order, independent of the observations made while packing. We give efficient algorithms computing these policies. On the other hand, we show that, for any alpha > 1, the problem of deciding whether a given universal policy achieves a factor of alpha is coNP-complete. If alpha is part of the input, the same problem is shown to be coNP-complete for items with unit densities. Finally, we show that it is coNP-hard to decide, for given alpha, whether a set of items admits a universal policy with factor alpha, even if all items have unit densities.
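The packing rule described in the abstract can be sketched as a small simulation: a universal policy is a fixed item order, each item is packed if it fits and discarded otherwise, and the result is compared with the best packing for that capacity. This is an illustrative sketch, not the authors' algorithm for computing good orders; the function names are hypothetical, items are assumed to be (size, value) pairs, and the optimum is found by brute force, which is only practical for a handful of items.

```python
from itertools import combinations


def pack_in_order(order, capacity):
    """Simulate a universal policy: try items in the fixed `order`.

    An item that fits must be packed; an item that does not fit is discarded.
    Returns the total value packed.
    """
    remaining = capacity
    total = 0
    for size, value in order:
        if size <= remaining:
            remaining -= size
            total += value
    return total


def optimum_value(items, capacity):
    """Best achievable value when the capacity is known (brute force)."""
    best = 0
    for r in range(len(items) + 1):
        for subset in combinations(items, r):
            if sum(size for size, _ in subset) <= capacity:
                best = max(best, sum(value for _, value in subset))
    return best


def worst_case_factor(order, capacities):
    """Largest ratio optimum / packed value over the given capacities."""
    worst = 1.0
    for capacity in capacities:
        opt = optimum_value(order, capacity)
        if opt == 0:
            continue  # nothing can be packed, so nothing to compare
        got = pack_in_order(order, capacity)
        worst = max(worst, float("inf") if got == 0 else opt / got)
    return worst


if __name__ == "__main__":
    # Unit-density items (size == value), hypothetical example data.
    items = [(1, 1), (2, 2), (3, 3)]
    print(worst_case_factor(items, range(1, 7)))        # small items first
    print(worst_case_factor(items[::-1], range(1, 7)))  # large items first
```

On this toy instance, trying small items first gives a worst-case factor of 2.0 (at capacity 2, item 1 blocks item 2), while trying large items first gives 1.0. Checking integer capacities is enough here because all sizes are integers.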

arXiv.org e-Print Archive

TUbiblio

Dagstuhl Research Online Publication Server

MPG.PuRe

Evaluating the Impact of SDC on the GMRES Iterative Solver

Authors: James Elliott, Mark Hoemmen, Frank Mueller
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 25/11/2013

Increasing parallelism and transistor density, along with increasingly tight energy and peak power constraints, may force exposure of occasionally incorrect computation or storage to application codes. Silent data corruption (SDC) will likely be infrequent, yet one SDC suffices to make numerical algorithms like iterative linear solvers cease progress towards the correct answer. Thus, we focus on resilience of the iterative linear solver GMRES to a single transient SDC. We derive inexpensive checks to detect the effects of an SDC in GMRES that work for a more general SDC model than presuming a bit flip. Our experiments show that when GMRES is used as the inner solver of an inner-outer iteration, it can "run through" SDC of almost any magnitude in the computationally intensive orthogonalization phase. That is, it gets the right answer using faulty data without any required rollback. Those SDCs that it cannot run through are caught by our detection scheme.
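The abstract does not spell out the checks, so the sketch below assumes one inexpensive candidate for illustration. In the Arnoldi process used by GMRES, each Hessenberg entry is h_ij = q_i^T A q_j with unit-norm q vectors, so |h_ij| <= ||A||_2 <= ||A||_F in exact arithmetic. An entry exceeding the Frobenius norm therefore signals a fault. The function names and the `corrupt` fault-injection parameter are hypothetical, and the code uses plain Python lists so that it runs without third-party libraries.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def matvec(A, x):
    return [dot(row, x) for row in A]


def frobenius_norm(A):
    return math.sqrt(sum(a * a for row in A for a in row))


def arnoldi_with_check(A, b, steps, corrupt=None):
    """Run Arnoldi (modified Gram-Schmidt) and flag suspicious entries.

    `corrupt` is a hypothetical fault injector: (i, j, factor) scales the
    computed h_ij by `factor` to mimic a single transient SDC.
    Returns a list of (i, j, value) entries that violate |h_ij| <= ||A||_F.
    """
    bound = frobenius_norm(A)
    beta = math.sqrt(dot(b, b))
    Q = [[x / beta for x in b]]
    flagged = []
    for j in range(steps):
        w = matvec(A, Q[j])
        for i in range(j + 1):
            h = dot(Q[i], w)
            if corrupt is not None and corrupt[:2] == (i, j):
                h *= corrupt[2]  # injected fault
            if abs(h) > bound:
                flagged.append((i, j, h))
            w = [wk - h * qk for wk, qk in zip(w, Q[i])]
        h_next = math.sqrt(dot(w, w))
        if h_next > bound:
            flagged.append((j + 1, j, h_next))
        if h_next == 0.0:
            break  # Krylov subspace is exhausted
        Q.append([x / h_next for x in w])
    return flagged


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
    b = [1.0, 0.0, 0.0]
    print(arnoldi_with_check(A, b, 3))                     # fault-free run
    print(arnoldi_with_check(A, b, 3, (0, 0, 1e6))[:1])    # corrupted h_00
```

A bound of this kind only catches faults large enough to push an entry past ||A||_F; smaller perturbations pass unnoticed, which fits the abstract's observation that GMRES can often run through them anyway.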

arXiv.org e-Print Archive

CiteSeerX

Crossref