Multi-Agent Adversarial Inverse Reinforcement Learning
Reinforcement learning agents are prone to undesired behaviors due to reward
mis-specification. Finding a set of reward functions to properly guide agent
behaviors is particularly challenging in multi-agent scenarios. Inverse
reinforcement learning provides a framework to automatically acquire suitable
reward functions from expert demonstrations. Its extension to multi-agent
settings, however, is difficult due to the more complex notions of rational
behaviors. In this paper, we propose MA-AIRL, a new framework for multi-agent
inverse reinforcement learning, which is effective and scalable for Markov
games with high-dimensional state-action spaces and unknown dynamics. We derive
our algorithm based on a new solution concept and maximum pseudolikelihood
estimation within an adversarial reward learning framework. In the experiments,
we demonstrate that MA-AIRL can recover reward functions that are highly
correlated with ground truth ones, and significantly outperforms prior methods
in terms of policy imitation.
Comment: ICML 2019
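As a rough illustration of the adversarial reward-learning idea (not MA-AIRL's
exact multi-agent formulation, which couples agents through its solution
concept), a single agent's AIRL-style discriminator has the form
D(s,a) = exp(f(s,a)) / (exp(f(s,a)) + pi(a|s)), and the learned reward
surrogate log D - log(1 - D) reduces to f(s,a) - log pi(a|s). A minimal sketch
in Python, with toy stand-in values:

    # A minimal sketch of an AIRL-style discriminator for one agent; the reward
    # model f and the policy likelihood pi(a|s) are toy stand-ins, and the
    # coupling between agents in MA-AIRL is omitted.
    import numpy as np

    def airl_discriminator(f_logit, log_pi):
        # D(s,a) = exp(f) / (exp(f) + pi(a|s)) = sigmoid(f - log pi),
        # computed stably in log space.
        return 1.0 / (1.0 + np.exp(-(f_logit - log_pi)))

    def recovered_reward(f_logit, log_pi):
        # log D - log(1 - D) simplifies to f(s,a) - log pi(a|s).
        return f_logit - log_pi

    f_sa, log_pi_sa = 0.7, np.log(0.2)     # toy values for one (s, a) pair
    print(airl_discriminator(f_sa, log_pi_sa), recovered_reward(f_sa, log_pi_sa))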
Scalable Multi-Agent Inverse Reinforcement Learning via Actor-Attention-Critic
Multi-agent adversarial inverse reinforcement learning (MA-AIRL) is a recent
approach that applies single-agent AIRL to multi-agent problems where we seek
to recover both policies for our agents and reward functions that promote
expert-like behavior. While MA-AIRL has promising results on cooperative and
competitive tasks, it is sample-inefficient and has only been validated
empirically for small numbers of agents -- its ability to scale to many agents
remains an open question. We propose a multi-agent inverse RL algorithm that is
more sample-efficient and scalable than previous works. Specifically, we employ
multi-agent actor-attention-critic (MAAC) -- an off-policy multi-agent RL
(MARL) method -- for the RL inner loop of the inverse RL procedure. In doing
so, we are able to increase sample efficiency compared to state-of-the-art
baselines, across both small- and large-scale tasks. Moreover, the RL agents
trained on the rewards recovered by our method better match the experts than
those trained on the rewards derived from the baselines. Finally, our method
requires far fewer agent-environment interactions, particularly as the number
of agents increases.
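For intuition, the actor-attention-critic idea is that each agent's critic
forms a query from its own observation-action encoding and attends over the
other agents' encodings, so the critic scales gracefully as agents are added.
A minimal single-head sketch (the shapes and the single attention head are
illustrative assumptions, not the exact MAAC architecture):

    # A minimal sketch of the attention step inside an actor-attention-critic
    # style value function: agent i's critic attends over encodings of the
    # other agents before estimating its Q-value.
    import numpy as np

    def softmax(x):
        x = x - x.max()
        e = np.exp(x)
        return e / e.sum()

    def attend(query, keys, values):
        # Scaled dot-product attention over the other agents' encodings.
        scores = keys @ query / np.sqrt(query.shape[0])   # (n_others,)
        weights = softmax(scores)
        return weights @ values                           # weighted value sum

    rng = np.random.default_rng(0)
    d = 8                                  # encoding size (assumed)
    query = rng.normal(size=d)             # agent i's own obs-action encoding
    keys = rng.normal(size=(3, d))         # encodings of 3 other agents
    values = rng.normal(size=(3, d))
    context = attend(query, keys, values)  # fed into agent i's Q-network head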
Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning
We identify two issues with the family of algorithms based on the Adversarial
Imitation Learning framework. The first is the implicit bias present in the
reward functions used in these algorithms. While these biases might work well
for some environments, they can also lead to sub-optimal behavior in others.
Second, even though these algorithms can learn from a few expert
demonstrations, they require a prohibitively large number of interactions with
the environment to imitate the expert, which is impractical for many real-world
applications. To address these issues, we propose a new algorithm called
Discriminator-Actor-Critic that uses off-policy reinforcement learning to
reduce policy-environment interaction sample complexity by an average factor of
10. Furthermore, since our reward function is designed to be unbiased, we can
apply our algorithm to many problems without making any task-specific
adjustments.
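For intuition, the off-policy ingredient amounts to storing transitions in a
replay buffer and re-labeling sampled batches with rewards from the current
discriminator, so old samples stay useful as the discriminator improves. A
minimal sketch with a hypothetical discriminator stand-in; the reward form
log D - log(1 - D) is one common unbiased choice:

    # A minimal sketch of off-policy reward relabeling for adversarial
    # imitation: rewards are recomputed from the current discriminator each
    # time a batch is sampled from the replay buffer.
    import collections, random
    import numpy as np

    Transition = collections.namedtuple("Transition", "state action next_state")
    buffer = collections.deque(maxlen=100_000)

    def relabel_batch(batch, discriminator):
        # log D - log(1 - D) is symmetric around zero, unlike log D or
        # -log(1 - D), which implicitly encourage early termination or
        # survival respectively.
        rewards = []
        for t in batch:
            d = discriminator(t.state, t.action)
            rewards.append(np.log(d) - np.log1p(-d))
        return rewards

    fake_disc = lambda s, a: 0.6           # placeholder D(s, a) in (0, 1)
    buffer.append(Transition(0, 1, 0))
    batch = random.sample(list(buffer), k=1)
    print(relabel_batch(batch, fake_disc))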
Competitive Multi-agent Inverse Reinforcement Learning with Sub-optimal Demonstrations
This paper considers the problem of inverse reinforcement learning in
zero-sum stochastic games when expert demonstrations are known to be not
optimal. Compared to previous works that decouple agents in the game by
assuming optimality in expert strategies, we introduce a new objective function
that directly pits experts against Nash Equilibrium strategies, and we design
an algorithm to solve for the reward function in the context of inverse
reinforcement learning with deep neural networks as model approximations. In
our setting the model and algorithm do not decouple by agent. In order to find
Nash Equilibrium in large-scale games, we also propose an adversarial training
algorithm for zero-sum stochastic games, and show the theoretical appeal of
non-existence of local optima in its objective function. In our numerical
experiments, we demonstrate that our Nash Equilibrium and inverse reinforcement
learning algorithms address games that are not amenable to previous approaches
that rely on tabular representations. Moreover, with sub-optimal expert
demonstrations, our algorithms recover reward functions and strategies of good
quality.
Comment: 31 pages, to be presented at ICML 2018
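As background for the Nash Equilibrium machinery, recall that in the
single-state (matrix-game) special case of a zero-sum game, an equilibrium
mixed strategy for the row player can be computed exactly by linear
programming. A minimal sketch using scipy (this is standard game theory used
for illustration, not the paper's adversarial training algorithm):

    # Solve a two-player zero-sum matrix game (row player maximizes) by LP:
    # maximize v subject to (A^T x)_j >= v for all columns j, sum(x) = 1, x >= 0.
    import numpy as np
    from scipy.optimize import linprog

    A = np.array([[1.0, -1.0],
                  [-1.0, 1.0]])             # matching-pennies payoffs

    m, n = A.shape
    c = np.zeros(m + 1); c[-1] = -1.0       # minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])   # v - (A^T x)_j <= 0
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    x, v = res.x[:m], res.x[-1]
    print(x, v)    # -> [0.5, 0.5] with game value 0 for matching pennies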
Active Perception in Adversarial Scenarios using Maximum Entropy Deep Reinforcement Learning
We pose an active perception problem where an autonomous agent actively
interacts with a second agent with potentially adversarial behaviors. Given the
uncertainty in the intent of the other agent, the objective is to collect
further evidence to help discriminate potential threats. The main technical
challenges are the partial observability of the agent intent, the adversary
modeling, and the corresponding uncertainty modeling. Note that an adversary
agent may act to mislead the autonomous agent by using a deceptive strategy
that is learned from past experiences. We propose an approach that combines
belief space planning, generative adversary modeling, and maximum entropy
reinforcement learning to obtain a stochastic belief space policy. By
accounting for various adversarial behaviors in the simulation framework and
minimizing the predictability of the autonomous agent's action, the resulting
policy is more robust to unmodeled adversarial strategies. This improved
robustness is shown empirically against an adversary that adapts to and
exploits the autonomous agent's policy, compared with a standard robust
approach based on a chance-constrained partially observable Markov decision
process (POMDP).
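For intuition on the maximum-entropy ingredient: instead of acting greedily on
a value estimate, the agent samples from a Boltzmann policy with
pi(a|s) proportional to exp(Q(s,a)/alpha), trading some reward for entropy and
thereby reducing the predictability an adversary can exploit. A minimal sketch
with illustrative Q-values and temperature alpha:

    # A minimal sketch of a maximum-entropy (Boltzmann) policy: higher alpha
    # yields a more uniform, less predictable action distribution.
    import numpy as np

    def maxent_policy(q_values, alpha=1.0):
        logits = np.asarray(q_values) / alpha
        logits -= logits.max()               # numerical stability
        p = np.exp(logits)
        return p / p.sum()

    q = [1.0, 0.9, 0.2]
    for alpha in (0.1, 1.0, 10.0):
        pi = maxent_policy(q, alpha)
        entropy = -(pi * np.log(pi)).sum()
        print(alpha, pi.round(3), round(entropy, 3))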
Randomized Adversarial Imitation Learning for Autonomous Driving
With the evolution of various advanced driver assistance system (ADAS)
platforms, the design of autonomous driving systems is becoming more complex
and safety-critical. An autonomous driving system activates multiple ADAS
functions simultaneously, so it is essential to coordinate the various ADAS
functions properly. This paper proposes a randomized adversarial imitation
learning (RAIL) method that imitates the ADAS coordination of an autonomous
vehicle equipped with advanced sensors. The RAIL policies are trained through
derivative-free optimization of the decision maker that coordinates the proper
ADAS functions, e.g., smart cruise control and the lane keeping system. In
particular, the proposed method can also handle LIDAR data and make decisions
in complex multi-lane highways and multi-agent environments.
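For intuition on the derivative-free training this relies on, a basic
random-search update perturbs the policy parameters in random directions,
scores each perturbation, and steps along the resulting finite-difference
estimate. A minimal sketch with a stand-in objective (the real objective would
be the adversarial imitation score):

    # A minimal random-search policy update: no gradients of the objective
    # are required, only evaluations at perturbed parameter vectors.
    import numpy as np

    rng = np.random.default_rng(0)

    def objective(theta):                  # stand-in for the imitation score
        return -np.sum((theta - 1.0) ** 2)

    theta = np.zeros(4)
    step, noise, n_dirs = 0.1, 0.05, 8
    for _ in range(200):
        deltas = rng.normal(size=(n_dirs, theta.size))
        grad_est = np.zeros_like(theta)
        for d in deltas:
            r_plus = objective(theta + noise * d)
            r_minus = objective(theta - noise * d)
            grad_est += (r_plus - r_minus) * d
        theta += step * grad_est / (2 * noise * n_dirs)
    print(theta.round(2))                  # converges toward the optimum at 1.0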
Synthesizing Programs for Images using Reinforced Adversarial Learning
Advances in deep generative networks have led to impressive results in recent
years. Nevertheless, such models can often waste their capacity on the minutiae
of datasets, presumably due to weak inductive biases in their decoders. This is
where graphics engines may come in handy since they abstract away low-level
details and represent images as high-level programs. Current methods that
combine deep learning and renderers are limited by hand-crafted likelihood or
distance functions, a need for large amounts of supervision, or difficulties in
scaling their inference algorithms to richer datasets. To mitigate these
issues, we present SPIRAL, an adversarially trained agent that generates a
program which is executed by a graphics engine to interpret and sample images.
The goal of this agent is to fool a discriminator network that distinguishes
between real and rendered data; the agent is trained with a distributed
reinforcement learning setup without any supervision. A surprising finding is
that using the
discriminator's output as a reward signal is the key to allow the agent to make
meaningful progress at matching the desired output rendering. To the best of
our knowledge, this is the first demonstration of an end-to-end, unsupervised
and adversarial inverse graphics agent on challenging real world (MNIST,
Omniglot, CelebA) and synthetic 3D datasets.
Comment: 12 pages, 13 figures
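For intuition on the reward mechanism: the agent emits a program, a graphics
engine renders it, and the discriminator's score on the rendered image serves
as the reinforcement learning reward. A minimal sketch with hypothetical
renderer and discriminator stand-ins (the distributed RL machinery is omitted):

    # A minimal sketch of discriminator-as-reward: render the agent's program,
    # then use the discriminator's score on the result as the episode reward.
    import numpy as np

    def render(program):                      # stand-in graphics engine
        canvas = np.zeros((8, 8))
        for (x, y) in program:                # program = list of pen positions
            canvas[y, x] = 1.0
        return canvas

    def discriminator_score(image):           # stand-in D(image) in (0, 1)
        return 1.0 / (1.0 + np.exp(-(image.sum() - 4.0)))

    program = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]
    reward = discriminator_score(render(program))   # reward at episode end
    print(round(reward, 3))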
Exploring applications of deep reinforcement learning for real-world autonomous driving systems
Deep Reinforcement Learning (DRL) has become increasingly powerful in recent
years, with notable achievements such as Deepmind's AlphaGo. It has been
successfully deployed in commercial vehicles like Mobileye's path planning
system. However, a vast majority of work on DRL is focused on toy examples in
controlled synthetic car simulator environments such as TORCS and CARLA. In
general, DRL is still in its infancy in terms of usability in real-world
applications. Our goal in this paper is to encourage real-world deployment of
DRL in various autonomous driving (AD) applications. We first provide an
overview of the tasks in autonomous driving systems, reinforcement learning
algorithms and applications of DRL to AD systems. We then discuss the
challenges which must be addressed to enable further progress towards
real-world deployment.
Comment: Accepted for Oral Presentation at VISAPP 2019
Third-Person Imitation Learning
Reinforcement learning (RL) makes it possible to train agents capable of
achieving sophisticated goals in complex and uncertain environments. A key
difficulty in reinforcement learning is specifying a reward function for the
agent to optimize. Traditionally, imitation learning in RL has been used to
overcome this problem. Unfortunately, hitherto imitation learning methods tend
to require that demonstrations are supplied in the first-person: the agent is
provided with a sequence of states and a specification of the actions that it
should have taken. While powerful, this kind of imitation learning is limited
by the relatively hard problem of collecting first-person demonstrations.
Humans address this problem by learning from third-person demonstrations: they
observe other humans perform tasks, infer the task, and accomplish the same
task themselves.
In this paper, we present a method for unsupervised third-person imitation
learning. Here third-person refers to training an agent to correctly achieve a
simple goal in a simple environment when it is provided a demonstration of a
teacher achieving the same goal but from a different viewpoint; and
unsupervised refers to the fact that the agent receives only these third-person
demonstrations, and is not provided a correspondence between teacher states and
student states. Our method's primary insight is that recent advances from domain
confusion can be utilized to yield domain agnostic features which are crucial
during the training process. To validate our approach, we report successful
experiments on learning from third-person demonstrations in a pointmass
domain, a reacher domain, and an inverted pendulum domain.
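For intuition on the domain-confusion ingredient: a gradient reversal layer
lets the same backward pass train a viewpoint classifier while pushing the
shared feature extractor to make teacher and student viewpoints
indistinguishable. A minimal PyTorch sketch of the standard gradient-reversal
construction (the paper's full pipeline is omitted):

    # A minimal gradient reversal layer: the forward pass is the identity,
    # and the backward pass flips the sign of the gradient, so the feature
    # extractor is trained to confuse the domain classifier.
    import torch

    class GradReverse(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x):
            return x.view_as(x)
        @staticmethod
        def backward(ctx, grad_output):
            return -grad_output              # flip the gradient's sign

    features = torch.nn.Linear(10, 4)        # shared feature extractor
    domain_head = torch.nn.Linear(4, 2)      # teacher-vs-student classifier

    x = torch.randn(5, 10)                   # toy batch of observations
    z = features(x)
    logits = domain_head(GradReverse.apply(z))
    loss = torch.nn.functional.cross_entropy(
        logits, torch.zeros(5, dtype=torch.long))
    loss.backward()                          # features get reversed gradients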
RAIL: Risk-Averse Imitation Learning
Imitation learning algorithms learn viable policies by imitating an expert's
behavior when reward signals are not available. Generative Adversarial
Imitation Learning (GAIL) is a state-of-the-art algorithm for learning policies
when the expert's behavior is available as a fixed set of trajectories. We
evaluate the learned policies in terms of the expert's cost function and
observe that the distribution of trajectory costs is often more heavy-tailed
for GAIL agents than for the expert on a number of benchmark continuous-control
tasks. Thus,
high-cost trajectories, corresponding to tail-end events of catastrophic
failure, are more likely to be encountered by GAIL agents than by the expert.
This makes the reliability of GAIL agents questionable when it comes to
deployment in risk-sensitive applications like robotic surgery and autonomous
driving. In this work, we aim to minimize the occurrence of tail-end events by
minimizing tail risk within the GAIL framework. We quantify tail risk by the
Conditional-Value-at-Risk (CVaR) of trajectories and develop the Risk-Averse
Imitation Learning (RAIL) algorithm. We observe that the policies learned with
RAIL show lower tail-end risk than those of vanilla GAIL. Thus, the proposed
RAIL algorithm appears to be a potent alternative to GAIL for improved
reliability in risk-sensitive applications.
Comment: Accepted for presentation at the Deep Reinforcement Learning Symposium
at NIPS 2017
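For intuition on the risk measure: the Conditional-Value-at-Risk at level
alpha is the expected cost of the worst (1 - alpha) fraction of trajectories,
so minimizing it directly targets the heavy tail rather than the mean. A
minimal sketch with illustrative heavy-tailed costs:

    # A minimal empirical CVaR: mean cost of trajectories at or above the
    # alpha-quantile (the Value-at-Risk) of the sampled cost distribution.
    import numpy as np

    def cvar(costs, alpha=0.9):
        costs = np.sort(np.asarray(costs))
        var = np.quantile(costs, alpha)      # Value-at-Risk at level alpha
        tail = costs[costs >= var]           # worst (1 - alpha) fraction
        return tail.mean()

    rng = np.random.default_rng(0)
    costs = rng.lognormal(mean=0.0, sigma=1.0, size=1000)  # heavy-tailed costs
    print(round(float(np.mean(costs)), 3), round(float(cvar(costs, 0.9)), 3))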