56,924 research outputs found
Fingerprint Policy Optimisation for Robust Reinforcement Learning
Policy gradient methods ignore the potential value of adjusting environment
variables: unobservable state features that are randomly determined by the
environment in a physical setting, but are controllable in a simulator. This
can lead to slow learning, or convergence to suboptimal policies, if the
environment variable has a large impact on the transition dynamics. In this
paper, we present fingerprint policy optimisation (FPO), which finds a policy
that is optimal in expectation across the distribution of environment
variables. The central idea is to use Bayesian optimisation (BO) to actively
select the distribution of the environment variable that maximises the
improvement generated by each iteration of the policy gradient method. To make
this BO practical, we contribute two easy-to-compute low-dimensional
fingerprints of the current policy. Our experiments show that FPO can
efficiently learn policies that are robust to significant rare events, which
are unlikely to be observable under random sampling, but are key to learning
good policies.Comment: ICML 201
Packing a Knapsack of Unknown Capacity
We study the problem of packing a knapsack without knowing its capacity.
Whenever we attempt to pack an item that does not fit, the item is discarded;
if the item fits, we have to include it in the packing. We show that there is
always a policy that packs a value within factor 2 of the optimum packing,
irrespective of the actual capacity. If all items have unit density, we achieve
a factor equal to the golden ratio. Both factors are shown to be best possible.
In fact, we obtain the above factors using packing policies that are universal
in the sense that they fix a particular order of the items and try to pack the
items in this order, independent of the observations made while packing. We
give efficient algorithms computing these policies. On the other hand, we show
that, for any alpha>1, the problem of deciding whether a given universal policy
achieves a factor of alpha is coNP-complete. If alpha is part of the input, the
same problem is shown to be coNP-complete for items with unit densities.
Finally, we show that it is coNP-hard to decide, for given alpha, whether a set
of items admits a universal policy with factor alpha, even if all items have
unit densities
Evaluating the Impact of SDC on the GMRES Iterative Solver
Increasing parallelism and transistor density, along with increasingly
tighter energy and peak power constraints, may force exposure of occasionally
incorrect computation or storage to application codes. Silent data corruption
(SDC) will likely be infrequent, yet one SDC suffices to make numerical
algorithms like iterative linear solvers cease progress towards the correct
answer. Thus, we focus on resilience of the iterative linear solver GMRES to a
single transient SDC. We derive inexpensive checks to detect the effects of an
SDC in GMRES that work for a more general SDC model than presuming a bit flip.
Our experiments show that when GMRES is used as the inner solver of an
inner-outer iteration, it can "run through" SDC of almost any magnitude in the
computationally intensive orthogonalization phase. That is, it gets the right
answer using faulty data without any required roll back. Those SDCs which it
cannot run through, get caught by our detection scheme
- …