Defense Against Reward Poisoning Attacks in Reinforcement Learning
We study defense strategies against reward poisoning attacks in reinforcement learning. As a threat model, we consider attacks that minimally alter rewards to make the attacker's target policy uniquely optimal under the poisoned rewards, with the optimality gap specified by an attack parameter. Our goal is to design agents that are robust against such attacks in terms of the worst-case utility w.r.t. the true, unpoisoned, rewards while computing their policies under the poisoned rewards. We propose an optimization framework for deriving optimal defense policies, both when the attack parameter is known and when it is unknown. Moreover, we show that defense policies that are solutions to the proposed optimization problems have provable performance guarantees. In particular, we provide the following bounds with respect to the true, unpoisoned, rewards: a) lower bounds on the expected return of the defense policies, and b) upper bounds on how suboptimal these defense policies are compared to the attacker's target policy. We conclude the paper by illustrating the intuitions behind our formal results, and showing that the derived bounds are non-trivial.
Provably Safe Robot Navigation with Obstacle Uncertainty
As drones and autonomous cars become more widespread, it is becoming
increasingly important that robots can operate safely under realistic
conditions. The noisy information fed into real systems means that robots must
use estimates of the environment to plan navigation. Efficiently guaranteeing
that the resulting motion plans are safe under these circumstances has proved
difficult. We examine how to guarantee that a trajectory or policy is safe with
only imperfect observations of the environment. We examine the implications of
various mathematical formalisms of safety and arrive at a mathematical notion
of safety of a long-term execution, even when conditioned on observational
information. We present efficient algorithms that can prove that trajectories
or policies are safe with much tighter bounds than in previous work. Notably,
the complexity of the environment does not affect our methods' ability to
evaluate if a trajectory or policy is safe. We then use these safety checking
methods to design a safe variant of the RRT planning algorithm.
Comment: RSS 201
Min Max Generalization for Two-stage Deterministic Batch Mode Reinforcement Learning: Relaxation Schemes
We study the minmax optimization problem introduced in [22] for computing
policies for batch mode reinforcement learning in a deterministic setting.
First, we show that this problem is NP-hard. In the two-stage case, we provide
two relaxation schemes. The first relaxation scheme works by dropping some
constraints in order to obtain a problem that is solvable in polynomial time.
The second relaxation scheme, based on a Lagrangian relaxation where all
constraints are dualized, leads to a conic quadratic programming problem. We
also theoretically prove and empirically illustrate that both relaxation
schemes provide better results than those given in [22].