134 research outputs found
TempLe: Learning Template of Transitions for Sample Efficient Multi-task RL
Transferring knowledge among various environments is important to efficiently
learn multiple tasks online. Most existing methods directly use the previously
learned models or previously learned optimal policies to learn new tasks.
However, these methods may be inefficient when the underlying models or optimal
policies are substantially different across tasks. In this paper, we propose
Template Learning (TempLe), the first PAC-MDP method for multi-task
reinforcement learning that can be applied to tasks with varying state/action
spaces. TempLe generates transition dynamics templates, abstractions of the
transition dynamics across tasks, to gain sample efficiency by extracting
similarities between tasks even when their underlying models or optimal
policies have limited commonalities. We present two algorithms for an "online"
and a "finite-model" setting respectively. We prove that our proposed TempLe
algorithms achieve much lower sample complexity than single-task learners or
state-of-the-art multi-task methods. We show via systematically designed
experiments that our TempLe method universally outperforms the state-of-the-art
multi-task methods (PAC-MDP or not) in various settings and regimes.
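The sample-efficiency gain comes from pooling transition samples across state-action pairs that share a template. A minimal sketch of that pooling idea, assuming templates are already assigned (names such as `pool_by_template` and `template_of` are hypothetical, and this is an illustrative simplification, not the paper's algorithm):

```python
import numpy as np

def pool_by_template(counts, template_of):
    """Pool per-(task, state, action) transition counts into shared templates.

    counts[(task, s, a)] -> array of next-state counts in a canonical local
    ordering; template_of[(task, s, a)] -> template id. Every pair mapped to
    the same template contributes its samples to one shared estimate.
    """
    pooled = {}
    for key, c in counts.items():
        t = template_of[key]
        pooled[t] = pooled.get(t, np.zeros_like(c, dtype=float)) + c
    # Estimated transition distribution per template (normalize the counts).
    return {t: c / c.sum() for t, c in pooled.items() if c.sum() > 0}

counts = {
    ("task0", 0, 0): np.array([8, 2]),  # observed next-state outcome counts
    ("task1", 3, 1): np.array([7, 3]),  # different task, same local dynamics
}
template_of = {("task0", 0, 0): "T0", ("task1", 3, 1): "T0"}
est = pool_by_template(counts, template_of)
# Both pairs feed template "T0": counts (8+7, 2+3) give estimate [0.75, 0.25]
```

Because both pairs map to the same template, each needs fewer of its own samples to reach an accurate model, which is the intuition behind the improved sample complexity.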
Safe and Robust Multi-Agent Reinforcement Learning for Connected Autonomous Vehicles under State Perturbations
Sensing and communication technologies have enhanced learning-based decision
making methodologies for multi-agent systems such as connected autonomous
vehicles (CAV). However, most existing safe reinforcement learning based
methods assume accurate state information. It remains challenging to satisfy
safety requirements under state uncertainties for CAVs, considering the noisy
sensor measurements and the vulnerability of communication channels. In this
work, we propose a Robust Multi-Agent Proximal Policy Optimization with robust
Safety Shield (SR-MAPPO) for CAVs in various driving scenarios. Both robust
MARL algorithm and control barrier function (CBF)-based safety shield are used
in our approach to cope with the perturbed or uncertain state inputs. The
robust policy is trained with a worst-case Q-function regularization module
that pursues a higher lower-bounded reward, while the robust CBF-based safety
shield enforces CAVs' collision-free constraints in complicated driving
scenarios even under perturbed vehicle state
information. We validate the advantages of SR-MAPPO in robustness and safety
and compare it with baselines under different driving and state perturbation
scenarios in CARLA simulator. The SR-MAPPO policy is verified to maintain
higher safety rates and efficiency (reward) when threatened by both state
perturbations and unconnected vehicles' dangerous behaviors.
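As a rough illustration of how a CBF-based safety shield filters a learned policy's actions, here is a minimal one-dimensional sketch. The paper's shield handles multi-agent collision constraints under state perturbations; this toy assumes a single distance-to-obstacle state and a hypothetical `cbf_safety_shield` helper:

```python
def cbf_safety_shield(x, u_rl, dt=0.1, gamma=0.5):
    """Minimal 1-D control barrier function shield (illustrative only).

    State x is the distance to an obstacle; h(x) = x >= 0 defines the safe
    set. The discrete-time CBF condition h(x + u*dt) >= (1 - gamma) * h(x)
    reduces to u >= -gamma * x / dt. The shield returns the action closest
    to the RL policy's proposal u_rl that satisfies this constraint.
    """
    u_min = -gamma * x / dt
    return max(u_rl, u_min)

# An aggressive RL action toward the obstacle gets clipped to the boundary...
safe_u = cbf_safety_shield(x=1.0, u_rl=-20.0)  # -> -5.0
# ...while an already-safe action passes through unchanged.
same_u = cbf_safety_shield(x=1.0, u_rl=-1.0)   # -> -1.0
```

The key design point is that the shield only intervenes when the proposed action would violate the barrier condition, leaving the learned policy untouched otherwise.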
Rethinking Adversarial Policies: A Generalized Attack Formulation and Provable Defense in Multi-Agent RL
Most existing works consider direct perturbations of the victim's state/action
or the underlying transition dynamics to show the vulnerability of reinforcement
learning agents to adversarial attacks. However, such direct manipulation
may not always be feasible in practice. In this paper, we consider another
common and realistic attack setup: in a multi-agent RL setting with
well-trained agents, during deployment time, the victim agent is
exploited by an attacker who controls another agent to act
adversarially against the victim using an \textit{adversarial policy}. Prior
attack models under such a setup do not consider that the attacker may face
resistance and thus can take only partial control of the agent; they also
introduce perceivable ``abnormal'' behaviors that are easily
detectable. A provable defense against these adversarial policies is also
lacking. To resolve these issues, we introduce a more general attack
formulation that models to what extent the adversary is able to control the
agent to produce the adversarial policy. Based on such a generalized attack
framework, the attacker can also regulate the state distribution shift caused
by the attack through an attack budget, and thus produce stealthy adversarial
policies that can exploit the victim agent. Furthermore, we provide the first
provably robust defenses with a convergence guarantee to the most robust victim
policy via adversarial training with timescale separation, in sharp contrast to
adversarial training in supervised learning which may only provide {\it
empirical} defenses.
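One simple way to picture an attacker with only partial control is a mixing rule between the agent's normal policy and an adversarial one. This is a hedged sketch, not the paper's formulation (which is more general); the name `partially_controlled_action` and the parameter `alpha` are illustrative assumptions:

```python
import random

def partially_controlled_action(pi_normal, pi_adv, obs, alpha, rng=random):
    """Illustrative partial-control attack model.

    With probability alpha (the attacker's control strength) the controlled
    agent follows the adversarial policy pi_adv; otherwise it follows its
    normal policy pi_normal. alpha = 0 recovers the benign agent, alpha = 1
    recovers the classic fully-controlled adversarial policy, and values in
    between model an attacker facing resistance.
    """
    if rng.random() < alpha:
        return pi_adv(obs)
    return pi_normal(obs)

pi_normal = lambda obs: "cooperate"
pi_adv = lambda obs: "attack"
full = partially_controlled_action(pi_normal, pi_adv, None, alpha=1.0)
none = partially_controlled_action(pi_normal, pi_adv, None, alpha=0.0)
```

A smaller `alpha` also perturbs the induced state distribution less, which connects naturally to the attack-budget view of stealthiness described above.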
Robustness to Multi-Modal Environment Uncertainty in MARL using Curriculum Learning
Multi-agent reinforcement learning (MARL) plays a pivotal role in tackling
real-world challenges. However, seamlessly transferring trained policies
from simulation to the real world requires them to be robust to various
environmental uncertainties. Existing works focus on finding a Nash equilibrium
or the optimal policy under uncertainty in a single environment variable (i.e.,
action, state, or reward), because a multi-agent system is itself highly
complex and non-stationary. However, in real-world situations, uncertainty can
occur in multiple environment variables simultaneously. This work is the first
to formulate the generalised problem of robustness to multi-modal environment
uncertainty in MARL. To this end, we propose a general robust training approach
for multi-modal uncertainty based on curriculum learning techniques. We handle
two distinct environmental uncertainties simultaneously and present extensive
results across both cooperative and competitive MARL environments,
demonstrating that our approach achieves state-of-the-art levels of robustness.
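A minimal sketch of a curriculum over uncertainty magnitudes, assuming a linear ramp across two modalities (state noise and reward noise); the paper's actual schedule and modalities may differ, and `curriculum_noise` is a hypothetical helper:

```python
def curriculum_noise(step, total_steps, max_state_std, max_reward_std):
    """Toy linear curriculum over two uncertainty modalities.

    Noise magnitudes ramp from 0 to their maxima over training, so agents
    face mild perturbations first and the hardest multi-modal uncertainty
    only late in training (illustrative only).
    """
    frac = min(step / total_steps, 1.0)
    return max_state_std * frac, max_reward_std * frac

early = curriculum_noise(0, 100, 0.2, 0.1)    # no noise at the start
midway = curriculum_noise(50, 100, 0.2, 0.1)  # half of each maximum
late = curriculum_noise(200, 100, 0.2, 0.1)   # capped at the maxima
```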
Game-Theoretic Robust Reinforcement Learning Handles Temporally-Coupled Perturbations
Robust reinforcement learning (RL) seeks to train policies that can perform
well under environment perturbations or adversarial attacks. Existing
approaches typically assume that the space of possible perturbations remains
the same across timesteps. However, in many settings, the space of possible
perturbations at a given timestep depends on past perturbations. We formally
introduce temporally-coupled perturbations, presenting a novel challenge for
existing robust RL methods. To tackle this challenge, we propose GRAD, a novel
game-theoretic approach that treats the temporally-coupled robust RL problem as
a partially-observable two-player zero-sum game. By finding an approximate
equilibrium in this game, GRAD ensures the agent's robustness against
temporally-coupled perturbations. Empirical experiments on a variety of
continuous control tasks demonstrate that our proposed approach exhibits
significant robustness advantages compared to baselines against both standard
and temporally-coupled attacks, in both state and action spaces.
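A temporally-coupled perturbation set can be illustrated with a simple projection, assuming box constraints on both the perturbation itself and its change between consecutive timesteps; the paper's exact constraint set may differ, and the symbols `eps`/`eps_bar` are illustrative:

```python
import numpy as np

def project_temporally_coupled(delta, prev_delta, eps, eps_bar):
    """Project a candidate perturbation into a temporally-coupled set:
    ||delta||_inf <= eps and ||delta - prev_delta||_inf <= eps_bar.

    The second constraint is what makes the feasible set at each timestep
    depend on the previous perturbation (illustrative constraint form).
    """
    # First enforce the coupling with the previous perturbation...
    delta = np.clip(delta, prev_delta - eps_bar, prev_delta + eps_bar)
    # ...then the overall perturbation budget.
    return np.clip(delta, -eps, eps)

out = project_temporally_coupled(
    np.array([1.0]), np.array([0.0]), eps=0.5, eps_bar=0.2
)
# The jump from 0.0 to 1.0 is limited by the coupling budget eps_bar = 0.2
```

With `eps_bar` large, the coupled set collapses to the standard per-step perturbation set, so the standard attack model is recovered as a special case.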
Adversarial Auto-Augment with Label Preservation: A Representation Learning Principle Guided Approach
Data augmentation is a critical contributing factor to the success of deep
learning but heavily relies on prior domain knowledge which is not always
available. Recent works on automatic data augmentation learn a policy to form a
sequence of augmentation operations, which are still pre-defined and restricted
to limited options. In this paper, we show that the objective of prior-free
autonomous data augmentation can be derived from a representation-learning
principle that aims to preserve the minimum sufficient information of the
labels. Given an example, the objective aims at creating a distant "hard
positive example" as the augmentation, while still preserving the original
label. We then propose a practical surrogate to the objective that can be
optimized efficiently and integrated seamlessly into existing methods for a
broad class of machine learning tasks, e.g., supervised, semi-supervised, and
noisy-label learning. Unlike previous works, our method does not require
training an extra generative model but instead leverages the intermediate layer
representations of the end-task model for generating data augmentations. In
experiments, we show that our method consistently brings non-trivial
improvements to the three aforementioned learning tasks in both efficiency
and final performance, whether or not combined with strong pre-defined
augmentations, e.g., on medical images where domain knowledge is unavailable and
the existing augmentation techniques perform poorly. Code is available at:
https://github.com/kai-wen-yang/LPA3. Accepted at the 36th Conference on Neural
Information Processing Systems (NeurIPS 2022).
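The "hard positive" objective can be sketched as perturbing an intermediate representation away from the original while checking that the end-task model's predicted label is unchanged. This toy uses a random perturbation direction where the actual method uses gradients of the end-task model, and the names (`hard_positive`, `predict`) are illustrative:

```python
import numpy as np

def hard_positive(z, predict, step=0.5, rng=None):
    """Label-preserving hard-positive sketch in representation space.

    Move the intermediate representation z a fixed step in a perturbation
    direction, and keep the result only if predict() still assigns the
    original label, i.e. the minimum sufficient label information survives.
    """
    rng = rng or np.random.default_rng(0)
    y = predict(z)
    direction = rng.standard_normal(z.shape)
    z_aug = z + step * direction / np.linalg.norm(direction)
    # Reject augmentations that flip the predicted label.
    return z_aug if predict(z_aug) == y else z

# Toy end-task "model": label is the sign of the first coordinate.
predict = lambda z: int(z[0] > 0)
z = np.array([2.0, 0.0])
z_aug = hard_positive(z, predict)
```

Reusing the end-task model's own representations for the label check is what lets the method avoid training a separate generative model.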