Randomized Adversarial Imitation Learning for Autonomous Driving
With the evolution of various advanced driver assistance system (ADAS)
platforms, the design of autonomous driving systems is becoming more complex
and safety-critical. An autonomous driving system activates multiple ADAS
functions simultaneously, so coordinating them properly is essential. This
paper proposes a randomized adversarial imitation learning (RAIL) method that
imitates the coordination behavior of an autonomous vehicle equipped with
advanced sensors. The RAIL policies are trained through derivative-free
optimization for the decision maker that coordinates the appropriate ADAS
functions, e.g., smart cruise control and the lane keeping system. In
particular, the proposed method can also handle LiDAR data and make decisions
in complex multi-lane highway and multi-agent environments.
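The derivative-free policy optimization described above can be illustrated with a basic random-search sketch. Everything below is a toy stand-in, not the paper's RAIL setup: the "episode return" is a made-up quadratic, and all step sizes and targets are hypothetical.

```python
import numpy as np

def random_search(evaluate, theta, step=0.1, noise=0.05, iters=300, seed=1):
    """Derivative-free update: probe a random +/- perturbation of the
    policy parameters and move toward the better-scoring direction."""
    rng = np.random.default_rng(seed)
    for _ in range(iters):
        delta = rng.normal(scale=noise, size=theta.shape)
        r_plus = evaluate(theta + delta)
        r_minus = evaluate(theta - delta)
        theta = theta + step * (r_plus - r_minus) * delta / noise
    return theta

# Toy stand-in for an episode return: highest when the coordination
# weights reach a (made-up) target setting.
target = np.array([0.5, -1.0, 2.0])
episode_return = lambda th: -np.sum((th - target) ** 2)

theta = random_search(episode_return, np.zeros(3))
```

No gradients of the return are ever computed, which is what makes such methods usable when the reward signal comes from a black-box simulator.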
Imitation Attacks and Defenses for Black-box Machine Translation Systems
Adversaries may look to steal or attack black-box NLP systems, either for
financial gain or to exploit model errors. One setting of particular interest
is machine translation (MT), where models have high commercial value and errors
can be costly. We investigate possible exploits of black-box MT systems and
explore a preliminary defense against such threats. We first show that MT
systems can be stolen by querying them with monolingual sentences and training
models to imitate their outputs. Using simulated experiments, we demonstrate
that MT model stealing is possible even when imitation models have different
input data or architectures than their target models. Applying these ideas, we
train imitation models that reach within 0.6 BLEU of three production MT
systems on both high-resource and low-resource language pairs. We then leverage
the similarity of our imitation models to transfer adversarial examples to the
production systems. We use gradient-based attacks that expose inputs which lead
to semantically-incorrect translations, dropped content, and vulgar model
outputs. To mitigate these vulnerabilities, we propose a defense that modifies
translation outputs in order to misdirect the optimization of imitation models.
This defense degrades the adversary's BLEU score and attack success rate at
some cost in the defender's BLEU and inference speed.
Comment: EMNLP 2020
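The query-then-imitate stealing recipe above can be sketched on a toy black box. A random linear classifier stands in for the MT system, and all shapes and data are invented for illustration; the imitation model deliberately has a different form than the victim.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical victim model: we may observe only its outputs, never
# its weights.
W_secret = rng.normal(size=(4, 3))
black_box = lambda x: np.argmax(x @ W_secret, axis=1)

# Step 1: query the victim with unlabeled inputs (the analogue of
# monolingual sentences) and record its outputs.
queries = rng.normal(size=(2000, 4))
outputs = black_box(queries)

# Step 2: fit an imitation model purely from the query/output pairs
# (here, least squares onto one-hot victim outputs).
W_imit = np.linalg.lstsq(queries, np.eye(3)[outputs], rcond=None)[0]

# Agreement with the victim on fresh held-out inputs.
held_out = rng.normal(size=(500, 4))
agreement = np.mean(
    np.argmax(held_out @ W_imit, axis=1) == black_box(held_out))
```

The point of the sketch is that a close functional copy emerges from input/output pairs alone, which is also what makes adversarial examples transfer back to the victim.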
Multimodal Storytelling via Generative Adversarial Imitation Learning
Deriving event storylines is an effective summarization method to succinctly
organize extensive information, which can significantly alleviate the pain of
information overload. The critical challenge is the lack of a widely
recognized definition of a storyline metric. Prior studies have developed
various approaches based on different assumptions about users' interests.
These works can extract interesting patterns, but their assumptions do not
guarantee that the derived patterns will match users' preferences. Moreover,
their exclusive reliance on a single-modality source misses cross-modality
information. This paper proposes a method, multimodal imitation learning via
generative adversarial networks (MIL-GAN), to directly model users' interests
as reflected by various data.
data. In particular, the proposed model addresses the critical challenge by
imitating users' demonstrated storylines. Our proposed model is designed to
learn the reward patterns given user-provided storylines and then applies the
learned policy to unseen data. The proposed approach is demonstrated to be
capable of acquiring the user's implicit intent and outperforming competing
methods by a substantial margin in a user study.
Comment: IJCAI 2017
Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation
We consider the problem of imitation learning from a finite set of expert
trajectories, without access to reinforcement signals. The classical approach
of extracting the expert's reward function via inverse reinforcement learning,
followed by reinforcement learning, is indirect and may be computationally
expensive. Recent generative adversarial methods based on matching the policy
distribution between the expert and the agent could be unstable during
training. We propose a new framework for imitation learning by estimating the
support of the expert policy to compute a fixed reward function, which allows
us to re-frame imitation learning within the standard reinforcement learning
setting. We demonstrate the efficacy of our reward function on both discrete
and continuous domains, achieving comparable or better performance than the
state of the art under different reinforcement learning algorithms.
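The support-estimation idea above can be sketched with random network distillation: a fixed random network serves as a target, a predictor is fit to it only on expert data, and the prediction error then acts as a fixed reward that is low on the expert's support and high off it. Everything here (architectures, data, reward shape) is a simplified stand-in, not the paper's exact method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed, randomly initialized target network; it is never trained.
W1, b1 = rng.normal(size=(2, 16)), rng.normal(size=16)
W2 = rng.normal(size=(16, 4))
f_target = lambda x: np.tanh(x @ W1 + b1) @ W2

# Toy "expert" state-action pairs, clustered in one region.
expert = rng.normal(loc=1.0, scale=0.2, size=(500, 2))

# Fit a simple linear predictor to the target's outputs on expert
# data only; it extrapolates poorly away from that data.
phi = lambda x: np.hstack([x, np.ones((len(x), 1))])
W_pred = np.linalg.lstsq(phi(expert), f_target(expert), rcond=None)[0]

def reward(x):
    """High where the predictor matches the target, i.e. on the
    estimated support of the expert policy; low elsewhere."""
    err = np.sum((f_target(x) - phi(x) @ W_pred) ** 2, axis=1)
    return np.exp(-err)

r_on = reward(rng.normal(loc=1.0, scale=0.2, size=(100, 2))).mean()
r_off = reward(rng.normal(loc=-3.0, scale=0.2, size=(100, 2))).mean()
```

Because the reward is fixed once the predictor is trained, the imitation problem reduces to standard reinforcement learning, avoiding the adversarial min-max training the abstract flags as unstable.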
Reinforcement and Imitation Learning for Diverse Visuomotor Skills
We propose a model-free deep reinforcement learning method that leverages a
small amount of demonstration data to assist a reinforcement learning agent. We
apply this approach to robotic manipulation tasks and train end-to-end
visuomotor policies that map directly from RGB camera inputs to joint
velocities. We demonstrate that our approach can solve a wide variety of
visuomotor tasks, for which engineering a scripted controller would be
laborious. In experiments, our reinforcement and imitation agent achieves
significantly better performance than agents trained with reinforcement
learning or imitation learning alone. We also illustrate that these policies,
trained with large visual and dynamics variations, can achieve preliminary
successes in zero-shot sim2real transfer. A brief visual description of this
work can be viewed at https://youtu.be/EDl8SQUNjj0
Comment: 13 pages, 6 figures, published in RSS 2018
Generative Adversarial Imitation Learning
Consider learning a policy from example expert behavior, without interaction
with the expert or access to reinforcement signal. One approach is to recover
the expert's cost function with inverse reinforcement learning, then extract a
policy from that cost function with reinforcement learning. This approach is
indirect and can be slow. We propose a new general framework for directly
extracting a policy from data, as if it were obtained by reinforcement learning
following inverse reinforcement learning. We show that a certain instantiation
of our framework draws an analogy between imitation learning and generative
adversarial networks, from which we derive a model-free imitation learning
algorithm that obtains significant performance gains over existing model-free
methods in imitating complex behaviors in large, high-dimensional
environments.
Physical Adversarial Textures that Fool Visual Object Tracking
We present a system for generating inconspicuous-looking textures that, when
displayed in the physical world as digital or printed posters, cause visual
object tracking systems to become confused. For instance, as a target being
tracked by a robot's camera moves in front of such a poster, our generated
texture makes the tracker lock onto it and allows the target to evade. This
work aims to fool seldom-targeted regression tasks, and in particular compares
diverse optimization strategies: non-targeted, targeted, and a new family of
guided adversarial losses. While we use the Expectation Over Transformation
(EOT) algorithm to generate physical adversaries that fool tracking models when
imaged under diverse conditions, we compare the impacts of different
conditioning variables, including viewpoint, lighting, and appearances, to find
practical attack setups with high resulting adversarial strength and
convergence speed. We further show that textures optimized solely in
simulated scenes can confuse real-world tracking systems.
Comment: Accepted to the International Conference on Computer Vision (ICCV)
2019
Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow
Adversarial learning methods have been proposed for a wide range of
applications, but the training of adversarial models can be notoriously
unstable. Effectively balancing the performance of the generator and
discriminator is critical, since a discriminator that achieves very high
accuracy will produce relatively uninformative gradients. In this work, we
propose a simple and general technique to constrain information flow in the
discriminator by means of an information bottleneck. By enforcing a constraint
on the mutual information between the observations and the discriminator's
internal representation, we can effectively modulate the discriminator's
accuracy and maintain useful and informative gradients. We demonstrate that our
proposed variational discriminator bottleneck (VDB) leads to significant
improvements across three distinct application areas for adversarial learning
algorithms. Our primary evaluation studies the applicability of the VDB to
imitation learning of dynamic continuous control skills, such as running. We
show that our method can learn such skills directly from \emph{raw} video
demonstrations, substantially outperforming prior adversarial imitation
learning methods. The VDB can also be combined with adversarial inverse
reinforcement learning to learn parsimonious reward functions that can be
transferred and re-optimized in new settings. Finally, we demonstrate that VDB
can train GANs more effectively for image generation, improving upon a number
of prior stabilization methods.
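The bottleneck mechanism above can be sketched numerically: the discriminator's encoder emits a Gaussian over its internal code, the KL divergence to a unit prior measures how much information passes through, and a Lagrange multiplier beta is adjusted by dual ascent toward an information budget Ic. The encoder outputs and constants below are invented placeholders standing in for a trained network.

```python
import numpy as np

def kl_to_standard_normal(mu, log_sigma):
    """KL( N(mu, sigma^2) || N(0, I) ), per sample, summed over dims."""
    return 0.5 * np.sum(
        mu**2 + np.exp(2 * log_sigma) - 2 * log_sigma - 1, axis=1)

# Hypothetical encoder outputs for a batch of two observations.
mu = np.array([[0.5, -0.2], [2.0, 1.0]])
log_sigma = np.array([[0.0, 0.0], [-1.0, -1.0]])

Ic = 0.5               # information budget (assumed value)
beta, lr_beta = 0.1, 0.01

# Dual ascent on beta: tighten the bottleneck when the batch-average
# KL exceeds the budget, relax it otherwise.
avg_kl = kl_to_standard_normal(mu, log_sigma).mean()
beta = max(0.0, beta + lr_beta * (avg_kl - Ic))

# The full discriminator objective would then be:
#   classification_loss + beta * avg_kl
```

Capping the information the discriminator can use keeps it from becoming too accurate too quickly, which preserves informative gradients for the generator or policy.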
Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations
A critical flaw of existing inverse reinforcement learning (IRL) methods is
their inability to significantly outperform the demonstrator. This is because
IRL typically seeks a reward function that makes the demonstrator appear
near-optimal, rather than inferring the underlying intentions of the
demonstrator that may have been poorly executed in practice. In this paper, we
introduce a novel reward-learning-from-observation algorithm, Trajectory-ranked
Reward EXtrapolation (T-REX), that extrapolates beyond a set of (approximately)
ranked demonstrations in order to infer high-quality reward functions from a
set of potentially poor demonstrations. When combined with deep reinforcement
learning, T-REX outperforms state-of-the-art imitation learning and IRL methods
on multiple Atari and MuJoCo benchmark tasks and achieves performance that is
often more than twice the performance of the best demonstration. We also
demonstrate that T-REX is robust to ranking noise and can accurately
extrapolate intention by simply watching a learner noisily improve at a task
over time.
Comment: In proceedings of the Thirty-sixth International Conference on
Machine Learning (ICML 2019)
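The extrapolation idea above rests on a Bradley-Terry ranking loss: a reward network is trained so that the predicted return of a higher-ranked trajectory exceeds that of a lower-ranked one. The sketch below uses a linear reward and synthetic trajectories whose hidden quality grows with the first feature; all of that is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Each trajectory is a bag of 2-D state features; the (hidden) true
# quality grows with the first feature (toy assumption).
def make_traj(quality):
    return rng.normal(loc=[quality, 0.0], scale=0.3, size=(20, 2))

qualities = np.linspace(0.0, 2.0, 8)     # ranked worst to best
trajs = [make_traj(q) for q in qualities]

w = np.zeros(2)                          # linear reward r(s) = s @ w
lr = 0.05
for _ in range(300):
    i, j = sorted(rng.choice(len(trajs), size=2, replace=False))
    R_i = trajs[i].sum(0) @ w            # predicted return, worse traj
    R_j = trajs[j].sum(0) @ w            # predicted return, better traj
    # Bradley-Terry: P(tau_j preferred) = exp(R_j) / (exp(R_i) + exp(R_j));
    # ascend the log-likelihood of the given ranking (j above i).
    p = 1.0 / (1.0 + np.exp(R_i - R_j))
    w += lr * (1.0 - p) * (trajs[j].sum(0) - trajs[i].sum(0))

r_best = trajs[-1].sum(0) @ w
r_worst = trajs[0].sum(0) @ w
```

Because the loss only uses pairwise rankings, the learned reward can assign even higher values to trajectories better than any demonstration, which is what lets a policy trained on it surpass the demonstrator.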
Task-Oriented Hand Motion Retargeting for Dexterous Manipulation Imitation
Human hand actions are quite complex, especially when they involve object
manipulation, mainly due to the high dimensionality of the hand and the vast
action space this entails. Imitating those actions with dexterous hand models
involves different important and challenging steps: acquiring human hand
information, retargeting it to a hand model, and learning a policy from
acquired data. In this work, we capture the hand information by using a
state-of-the-art hand pose estimator. We tackle the retargeting problem from
the hand pose to a 29 DoF hand model by combining inverse kinematics and
particle swarm optimisation (PSO)
with a task objective optimisation. This objective encourages the virtual hand
to accomplish the manipulation task, relieving the effect of the estimator's
noise and the domain gap. Our approach leads to a better success rate in the
grasping task compared to our inverse kinematics baseline, allowing us to
record successful human demonstrations. Furthermore, we used these
demonstrations to learn a policy network using generative adversarial imitation
learning (GAIL) that is able to autonomously grasp an object in the virtual
space.
Comment: ECCV 2018 workshop paper