48,382 research outputs found

    Defense Against Reward Poisoning Attacks in Reinforcement Learning

    Get PDF
    We study defense strategies against reward poisoning attacks in reinforcement learning. As a threat model, we consider attacks that minimally alter rewards to make the attacker's target policy uniquely optimal under the poisoned rewards, with the optimality gap specified by an attack parameter. Our goal is to design agents that are robust against such attacks in terms of the worst-case utility w.r.t. the true, unpoisoned, rewards while computing their policies under the poisoned rewards. We propose an optimization framework for deriving optimal defense policies, both when the attack parameter is known and unknown. Moreover, we show that defense policies that are solutions to the proposed optimization problems have provable performance guarantees. In particular, we provide the following bounds with respect to the true, unpoisoned, rewards: a) lower bounds on the expected return of the defense policies, and b) upper bounds on how suboptimal these defense policies are compared to the attacker's target policy. We conclude the paper by illustrating the intuitions behind our formal results, and showing that the derived bounds are non-trivial

    Provably Safe Robot Navigation with Obstacle Uncertainty

    Full text link
    As drones and autonomous cars become more widespread it is becoming increasingly important that robots can operate safely under realistic conditions. The noisy information fed into real systems means that robots must use estimates of the environment to plan navigation. Efficiently guaranteeing that the resulting motion plans are safe under these circumstances has proved difficult. We examine how to guarantee that a trajectory or policy is safe with only imperfect observations of the environment. We examine the implications of various mathematical formalisms of safety and arrive at a mathematical notion of safety of a long-term execution, even when conditioned on observational information. We present efficient algorithms that can prove that trajectories or policies are safe with much tighter bounds than in previous work. Notably, the complexity of the environment does not affect our methods ability to evaluate if a trajectory or policy is safe. We then use these safety checking methods to design a safe variant of the RRT planning algorithm.Comment: RSS 201

    Min Max Generalization for Two-stage Deterministic Batch Mode Reinforcement Learning: Relaxation Schemes

    Full text link
    We study the minmax optimization problem introduced in [22] for computing policies for batch mode reinforcement learning in a deterministic setting. First, we show that this problem is NP-hard. In the two-stage case, we provide two relaxation schemes. The first relaxation scheme works by dropping some constraints in order to obtain a problem that is solvable in polynomial time. The second relaxation scheme, based on a Lagrangian relaxation where all constraints are dualized, leads to a conic quadratic programming problem. We also theoretically prove and empirically illustrate that both relaxation schemes provide better results than those given in [22]
    • …
    corecore