Adversarial Active Exploration for Inverse Dynamics Model Learning
We present an adversarial active exploration for inverse dynamics model
learning, a simple yet effective learning scheme that incentivizes exploration
in an environment without any human intervention. Our framework consists of a
deep reinforcement learning (DRL) agent and an inverse dynamics model
contesting with each other. The former collects training samples for the
latter, with an objective to maximize the error of the latter. The latter is
trained with samples collected by the former, and generates rewards for the
former when it fails to predict the actual action taken by the former. In such
a competitive setting, the DRL agent learns to generate samples that the
inverse dynamics model fails to predict correctly, while the inverse dynamics
model learns to adapt to the challenging samples. We further propose a reward
structure that ensures the DRL agent collects only moderately hard samples,
rather than overly hard ones that prevent the inverse model from predicting
effectively. We evaluate the effectiveness of our method on several robotic arm
and hand manipulation tasks against multiple baseline models. Experimental
results show that our method is comparable to those directly trained with
expert demonstrations, and superior to the other baselines even without any
human priors.
Comment: Published as a conference paper at CoRL 201
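The competitive reward described above can be sketched as a simple shaping function: the DRL agent is paid by the inverse model's prediction error, but errors beyond a threshold earn nothing, which discourages overly hard samples. The threshold form and names here are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def adversarial_reward(pred_action, true_action, threshold=1.0):
    """Hypothetical reward for the exploring DRL agent: the inverse
    model's prediction error, zeroed out beyond `threshold` so the
    agent favors moderately hard (not impossible) samples."""
    error = float(np.linalg.norm(np.asarray(pred_action) - np.asarray(true_action)))
    return error if error <= threshold else 0.0
```

A moderately wrong prediction yields a positive reward, while a wildly wrong one yields zero, implementing the "moderately hard but not overly hard" objective.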
Inverse reinforcement learning for video games
Deep reinforcement learning achieves superhuman performance in a range of
video game environments, but requires that a designer manually specify a reward
function. It is often easier to provide demonstrations of a target behavior
than to design a reward function describing that behavior. Inverse
reinforcement learning (IRL) algorithms can infer a reward from demonstrations
in low-dimensional continuous control environments, but there has been little
work on applying IRL to high-dimensional video games. In our CNN-AIRL baseline,
we modify the state-of-the-art adversarial IRL (AIRL) algorithm to use CNNs for
the generator and discriminator. To stabilize training, we normalize the reward
and increase the size of the discriminator training dataset. We additionally
learn a low-dimensional state representation using a novel autoencoder
architecture tuned for video game environments. This embedding is used as input
to the reward network, improving the sample efficiency of expert
demonstrations. Our method achieves high-level performance on the simple
Catcher video game, substantially outperforming the CNN-AIRL baseline. We also
score points on the Enduro Atari racing game, but do not match expert
performance, highlighting the need for further work.
Comment: 10 pages, 4 figures. Submitted to NIPS Deep RL Worksho
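The reward-normalization trick mentioned above can be sketched as follows. In AIRL, the learned reward log D - log(1 - D) reduces to the discriminator's logit; normalizing it with running statistics is one plausible stabilization, though the exact scheme used in the paper is an assumption here.

```python
import numpy as np

def normalized_airl_reward(logits, running_mean, running_std, momentum=0.99):
    """AIRL-style reward from discriminator logits, standardized with
    exponentially-averaged statistics (a hypothetical stabilization
    matching the abstract's description)."""
    r = np.asarray(logits, dtype=float)  # log D - log(1-D) == logit
    running_mean = momentum * running_mean + (1 - momentum) * r.mean()
    running_std = momentum * running_std + (1 - momentum) * r.std()
    return (r - running_mean) / (running_std + 1e-8), running_mean, running_std
```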
Active Image Synthesis for Efficient Labeling
The great success achieved by deep neural networks attracts increasing
attention from the manufacturing and healthcare communities. However, the
limited availability of data and high costs of data collection are the major
challenges for the applications in those fields. We propose in this work AISEL,
an active image synthesis method for efficient labeling to improve the
performance of the small-data learning tasks. Specifically, a complementary
AISEL dataset is generated, with labels actively acquired via a physics-based
method to incorporate the underlying physical knowledge at hand. An important
component of our AISEL method is the bidirectional generative invertible
network (GIN), which can extract interpretable features from the training
images and generate physically meaningful virtual images. Our AISEL method then
efficiently samples virtual images that not only further exploit the uncertain
regions, but also explore the entire image space. We then discuss the
interpretability of GIN both theoretically and experimentally, demonstrating
clear visual improvements over the benchmarks. Finally, we demonstrate the
effectiveness of our AISEL framework on an aortic stenosis application, in
which our method lowers the labeling cost while achieving an improvement in
prediction accuracy.
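The exploit-uncertain-regions-while-exploring-the-whole-space sampling can be sketched in the latent space of a generative model. Everything here (function names, the uniform latent prior, the Gaussian jitter) is a hypothetical stand-in for the GIN-based sampling the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latents(uncertain_centers, n, explore_frac=0.5, noise=0.1, dim=2):
    """Hypothetical AISEL-style sampling: draw part of the virtual-image
    latents near known uncertain regions (exploitation) and the rest
    uniformly over the latent space (exploration)."""
    n_explore = int(n * explore_frac)
    explore = rng.uniform(-1.0, 1.0, size=(n_explore, dim))
    idx = rng.integers(len(uncertain_centers), size=n - n_explore)
    exploit = uncertain_centers[idx] + noise * rng.normal(size=(n - n_explore, dim))
    return np.vstack([exploit, explore])
```

The sampled latents would then be decoded by the generative model into virtual images and labeled by the physics-based method.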
Exploring applications of deep reinforcement learning for real-world autonomous driving systems
Deep Reinforcement Learning (DRL) has become increasingly powerful in recent
years, with notable achievements such as Deepmind's AlphaGo. It has been
successfully deployed in commercial vehicles like Mobileye's path planning
system. However, a vast majority of work on DRL is focused on toy examples in
controlled synthetic car simulator environments such as TORCS and CARLA. In
general, DRL is still at its infancy in terms of usability in real-world
applications. Our goal in this paper is to encourage real-world deployment of
DRL in various autonomous driving (AD) applications. We first provide an
overview of the tasks in autonomous driving systems, reinforcement learning
algorithms and applications of DRL to AD systems. We then discuss the
challenges which must be addressed to enable further progress towards
real-world deployment.
Comment: Accepted for Oral Presentation at VISAPP 201
Active Perception in Adversarial Scenarios using Maximum Entropy Deep Reinforcement Learning
We pose an active perception problem where an autonomous agent actively
interacts with a second agent with potentially adversarial behaviors. Given the
uncertainty in the intent of the other agent, the objective is to collect
further evidence to help discriminate potential threats. The main technical
challenges are the partial observability of the agent intent, the adversary
modeling, and the corresponding uncertainty modeling. Note that an adversary
agent may act to mislead the autonomous agent by using a deceptive strategy
that is learned from past experiences. We propose an approach that combines
belief space planning, generative adversary modeling, and maximum entropy
reinforcement learning to obtain a stochastic belief space policy. By
accounting for various adversarial behaviors in the simulation framework and
minimizing the predictability of the autonomous agent's action, the resulting
policy is more robust to unmodeled adversarial strategies. This improved
robustness is empirically demonstrated against an adversary that adapts to and
exploits the autonomous agent's policy, compared with a standard robust
approach based on a Chance-Constrained Partially Observable Markov Decision
Process.
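The "minimizing the predictability of the agent's action" idea rests on the maximum-entropy policy form pi(a) proportional to exp(Q(a)/alpha), which randomizes among near-optimal actions. The sketch below shows only that generic building block, not the paper's full belief-space planner.

```python
import numpy as np

def softmax_policy(q_values, alpha=1.0):
    """Maximum-entropy action distribution pi(a) ~ exp(Q(a)/alpha).
    Larger alpha spreads probability mass, making the agent's actions
    harder for an adaptive adversary to predict and exploit."""
    z = np.asarray(q_values, dtype=float) / alpha
    z -= z.max()                # numerical stability
    p = np.exp(z)
    return p / p.sum()
```

As alpha grows, the policy approaches uniform; as alpha shrinks, it approaches the greedy (fully predictable) policy.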
Security Theater: On the Vulnerability of Classifiers to Exploratory Attacks
The increasing scale and sophistication of cyberattacks have led to the
adoption of machine learning based classification techniques at the core of
cybersecurity systems. These techniques promise scale and accuracy that
traditional rule- or signature-based methods cannot. However, classifiers
operating in adversarial domains are vulnerable to evasion attacks by an
adversary, who is capable of learning the behavior of the system by employing
intelligently crafted probes. Classification accuracy in such domains provides
a false sense of security, as detection can easily be evaded by carefully
perturbing the input samples. In this paper, a generic data-driven framework is
presented to analyze the vulnerability of classification systems to black-box
probing-based attacks. The framework uses an exploration-exploitation
strategy to understand an adversary's point of view of the attack-defense
cycle. The adversary assumes a black box model of the defender's classifier and
can launch indiscriminate attacks on it, without information of the defender's
model type, training data or the domain of application. Experimental evaluation
on 10 real world datasets demonstrates that even models having high perceived
accuracy (>90%), by a defender, can be effectively circumvented with a high
evasion rate (>95%, on average). The detailed attack algorithms, adversarial
model and empirical evaluation, serve.
Comment: Pacific-Asia Workshop on Intelligence and Security Informatics.
Springer, Cham, 201
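A black-box probing attack of the kind analyzed above can be sketched as a query loop: perturb the sample, ask the classifier for a label, and stop once the perturbation evades detection. This is a deliberately simplified random-search sketch, not the paper's attack algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

def probe_and_evade(classifier, x, step=0.1, budget=50):
    """Hypothetical black-box evasion loop: the adversary only observes
    the classifier's label (1 = detected, 0 = evades) and searches for
    a small perturbation that flips the decision within a query budget."""
    x = np.array(x, dtype=float)
    for _ in range(budget):
        if classifier(x) == 0:          # already evades detection
            return x
        candidate = x + step * rng.normal(size=x.shape)
        if classifier(candidate) == 0:  # exploitation: keep a working probe
            return candidate
    return None                         # exploration budget exhausted
```

Real exploratory attacks balance such random exploration with exploitation of previously successful probe directions; the budget models the cost of querying the defender.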
Diversity-Driven Selection of Exploration Strategies in Multi-Armed Bandits
We consider a scenario where an agent has multiple available strategies to
explore an unknown environment. For each new interaction with the environment,
the agent must select which exploration strategy to use. We provide a new
strategy-agnostic method that treats the situation as a Multi-Armed Bandit
problem, where the reward signal is the diversity of effects that each strategy
produces. We test the method empirically on a simulated planar robotic arm, and
establish that the method is able to discriminate between strategies of
dissimilar quality, even when the differences are tenuous, and that the
resulting performance is competitive with the best fixed mixture of strategies.
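Treating strategy selection as a bandit problem can be sketched with a standard UCB1 selector whose reward is a diversity score. The UCB1 choice and the diversity measure itself are assumptions here; the abstract only specifies that diversity of effects is the reward signal.

```python
import math

class StrategyBandit:
    """Sketch of the strategy-agnostic selector: a UCB1 bandit over
    exploration strategies, rewarded by the diversity of effects each
    strategy produced in the environment."""

    def __init__(self, n_strategies):
        self.counts = [0] * n_strategies
        self.values = [0.0] * n_strategies
        self.t = 0

    def select(self):
        self.t += 1
        for i, c in enumerate(self.counts):
            if c == 0:
                return i  # try every strategy at least once
        return max(range(len(self.counts)),
                   key=lambda i: self.values[i]
                   + math.sqrt(2 * math.log(self.t) / self.counts[i]))

    def update(self, i, diversity_reward):
        """Incremental mean of the diversity reward for strategy i."""
        self.counts[i] += 1
        self.values[i] += (diversity_reward - self.values[i]) / self.counts[i]
```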
Mid-Level Visual Representations Improve Generalization and Sample Efficiency for Learning Visuomotor Policies
How much does having visual priors about the world (e.g. the fact that the
world is 3D) assist in learning to perform downstream motor tasks (e.g.
delivering a package)? We study this question by integrating a generic
perceptual skill set (e.g. a distance estimator, an edge detector, etc.) within
a reinforcement learning framework--see Figure 1. This skill set (hereafter
mid-level perception) provides the policy with a more processed state of the
world compared to raw images.
We find that using a mid-level perception confers significant advantages over
training end-to-end from scratch (i.e. not leveraging priors) in
navigation-oriented tasks. Agents are able to generalize to situations where
the from-scratch approach fails and training becomes significantly more sample
efficient. However, we show that realizing these gains requires careful
selection of the mid-level perceptual skills. Therefore, we refine our findings
into an efficient max-coverage feature set that can be adopted in lieu of raw
images. We perform our study in completely separate buildings for training and
testing and compare against visually blind baseline policies and
state-of-the-art feature learning methods.
Comment: See project website, demos, and code at http://perceptual.acto
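The core idea of feeding the policy mid-level perception instead of raw pixels can be sketched as a state-construction step. The feature functions below are toy stand-ins for the pretrained perception modules (distance estimator, edge detector) the abstract mentions, not the actual networks.

```python
import numpy as np

def midlevel_state(image, feature_fns):
    """Sketch: replace raw pixels with a stack of mid-level feature maps,
    which the policy consumes instead of the raw image."""
    feats = [fn(image) for fn in feature_fns]
    return np.stack(feats, axis=0)

# Toy stand-ins for pretrained perception modules (assumptions):
fake_depth = lambda img: img.mean(axis=-1)
fake_edges = lambda img: np.abs(np.diff(img.mean(axis=-1), axis=0, prepend=0.0))
```

Swapping feature sets in and out of `feature_fns` is also how one would study which mid-level skills matter, mirroring the feature-selection question raised above.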
COBRA: Data-Efficient Model-Based RL through Unsupervised Object Discovery and Curiosity-Driven Exploration
Data efficiency and robustness to task-irrelevant perturbations are
long-standing challenges for deep reinforcement learning algorithms. Here we
introduce a modular approach to addressing these challenges in a continuous
control environment, without using hand-crafted or supervised information. Our
Curious Object-Based seaRch Agent (COBRA) uses task-free intrinsically
motivated exploration and unsupervised learning to build object-based models of
its environment and action space. Subsequently, it can learn a variety of tasks
through model-based search in very few steps and excel on structured hold-out
tests of policy robustness.
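Task-free intrinsically motivated exploration is commonly driven by the prediction error of a learned world model, and that generic building block can be sketched as follows. The exact intrinsic reward COBRA uses is not specified in the abstract, so this form is an assumption.

```python
import numpy as np

def curiosity_reward(transition_model, state, action, next_state):
    """Generic curiosity signal: the learned transition model's error on
    the observed step. Well-modeled transitions earn little reward,
    pushing the agent toward parts of the environment it cannot yet
    predict."""
    predicted = transition_model(state, action)
    return float(np.mean((predicted - next_state) ** 2))
```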
Verification for Machine Learning, Autonomy, and Neural Networks Survey
This survey presents an overview of verification techniques for autonomous
systems, with a focus on safety-critical autonomous cyber-physical systems
(CPS) and subcomponents thereof. Autonomy in CPS is enabled by recent advances
in artificial intelligence (AI) and machine learning (ML), through approaches
such as deep neural networks (DNNs) embedded in so-called learning-enabled
components (LECs) that accomplish tasks from classification to control.
Recently, the formal methods and formal verification community has developed
methods to characterize behaviors in these LECs with eventual goals of formally
verifying specifications for LECs, and this article presents a survey of many
of these recent approaches.