Polyphonic Music Generation with Sequence Generative Adversarial Networks
We propose an application of sequence generative adversarial networks
(SeqGAN), which are generative adversarial networks for discrete sequence
generation, for creating polyphonic musical sequences. Instead of the
monophonic melody generation suggested in the original work, we present an
efficient representation of a polyphonic MIDI file that simultaneously captures
chords and melodies with dynamic timings. The proposed method condenses duration, octaves,
and keys of both melodies and chords into a single word vector representation,
and recurrent neural networks learn to predict distributions of sequences from
the embedded musical word space. We experiment with the original objective and
with applying the least-squares objective to the discriminator, which is known
to stabilize the training of GANs. The network creates sequences that are
musically coherent and shows improved quantitative and qualitative measures. We
also report that careful optimization of the model's reinforcement learning
signals is crucial for its general application.
Comment: 8 pages, 3 figures, 3 tables
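The least-squares discriminator objective referred to above can be sketched as follows; the function names and plain-Python form are our own illustration of the LSGAN-style loss, not the paper's code:

```python
def lsgan_discriminator_loss(d_real, d_fake):
    """Least-squares discriminator loss: push D's scores on real data
    toward 1 and on generated data toward 0, replacing the usual log loss."""
    real_term = sum((d - 1.0) ** 2 for d in d_real) / len(d_real)
    fake_term = sum(d ** 2 for d in d_fake) / len(d_fake)
    return 0.5 * real_term + 0.5 * fake_term

def lsgan_generator_loss(d_fake):
    """Generator loss: make the discriminator score fakes close to 1."""
    return 0.5 * sum((d - 1.0) ** 2 for d in d_fake) / len(d_fake)
```

The quadratic penalty keeps gradients informative even for samples the discriminator classifies confidently, which is why it tends to stabilize training.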
Query-Efficient Black-Box Attack Against Sequence-Based Malware Classifiers
In this paper, we present a generic, query-efficient black-box attack against
API call-based machine learning malware classifiers. We generate adversarial
examples by modifying the malware's API call sequences and non-sequential
features (printable strings), and these adversarial examples will be
misclassified by the target malware classifier without affecting the malware's
functionality. In contrast to previous studies, our attack minimizes the number
of malware classifier queries required. In addition, in our attack, the
attacker must only know the class predicted by the malware classifier; attacker
knowledge of the malware classifier's confidence score is optional. We evaluate
the attack effectiveness when attacks are performed against a variety of
malware classifier architectures, including recurrent neural network (RNN)
variants, deep neural networks, support vector machines, and gradient boosted
decision trees. Our attack success rate is around 98% when the classifier's
confidence score is known and 64% when just the classifier's predicted class is
known. We implement four state-of-the-art query-efficient attacks and show that
our attack requires fewer queries and less knowledge about the attacked model's
architecture than other existing query-efficient attacks, making it practical
for attacking cloud-based malware classifiers at a minimal cost.
Comment: Accepted as a conference paper at ACSAC 202
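The label-only setting can be sketched as a simple perturb-and-query loop; `classify`, the API names, and the random-insertion strategy below are illustrative assumptions, not the paper's attack:

```python
import random

def label_only_attack(classify, api_seq, candidate_apis, max_queries=100, seed=0):
    """Sketch of a query-efficient, label-only evasion attack: insert
    semantically neutral API calls at random positions until the
    classifier's predicted label flips, counting every query made."""
    rng = random.Random(seed)
    seq = list(api_seq)
    queries = 1
    label = classify(seq)  # the attacker sees only the label, not a score
    while label == "malicious" and queries < max_queries:
        pos = rng.randrange(len(seq) + 1)
        seq.insert(pos, rng.choice(candidate_apis))
        label = classify(seq)
        queries += 1
    return seq, label, queries
```

Tracking the query count explicitly is the point: a practical attack against a cloud-based classifier must flip the label in as few queries as possible.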
Automatic Goal Generation for Reinforcement Learning Agents
Reinforcement learning is a powerful technique to train an agent to perform a
task. However, an agent that is trained using reinforcement learning is only
capable of achieving the single task that is specified via its reward function.
Such an approach does not scale well to settings in which an agent needs to
perform a diverse set of tasks, such as navigating to varying positions in a
room or moving objects to varying locations. Instead, we propose a method that
allows an agent to automatically discover the range of tasks that it is capable
of performing. We use a generator network to propose tasks for the agent to try
to achieve, specified as goal states. The generator network is optimized using
adversarial training to produce tasks that are always at the appropriate level
of difficulty for the agent. Our method thus automatically produces a
curriculum of tasks for the agent to learn. We show that, by using this
framework, an agent can efficiently and automatically learn to perform a wide
set of tasks without requiring any prior knowledge of its environment. Our
method can also learn to achieve tasks with sparse rewards, which traditionally
pose significant challenges.
Comment: Accepted at ICML 2018, Proceedings of the 35th International
Conference on Machine Learning, Stockholm, Sweden, PMLR 80, 201
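The "appropriate level of difficulty" criterion can be illustrated with a small sketch; the thresholds and names below are our assumptions about how goals of intermediate difficulty are labeled before training the generator:

```python
def goals_of_intermediate_difficulty(success_rates, r_min=0.1, r_max=0.9):
    """Label each candidate goal as appropriately difficult when the
    agent's empirical success rate lies in [r_min, r_max]: neither
    already mastered nor currently unreachable. Such labels can then
    supervise the adversarially trained goal generator."""
    return {goal: r_min <= rate <= r_max
            for goal, rate in success_rates.items()}
```

As the agent improves, previously hard goals drift into the feasible band and easy goals drop out, which is what makes the generated task sequence a curriculum.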
Deep learning for molecular design - a review of the state of the art
In the space of only a few years, deep generative modeling has revolutionized
how we think of artificial creativity, yielding autonomous systems which
produce original images, music, and text. Inspired by these successes,
researchers are now applying deep generative modeling techniques to the
generation and optimization of molecules - in our review we found 45 papers on
the subject published in the past two years. These works point to a future
where such systems will be used to generate lead molecules, greatly reducing
resources spent downstream synthesizing and characterizing bad leads in the
lab. In this review we survey the increasingly complex landscape of models and
representation schemes that have been proposed. The four classes of techniques
we describe are recursive neural networks, autoencoders, generative adversarial
networks, and reinforcement learning. After first discussing some of the
mathematical fundamentals of each technique, we draw high level connections and
comparisons with other techniques and expose the pros and cons of each. Several
important high level themes emerge as a result of this work, including the
shift away from the SMILES string representation of molecules towards more
sophisticated representations such as graph grammars and 3D representations,
the importance of reward function design, the need for better standards for
benchmarking and testing, and the benefits of adversarial training and
reinforcement learning over maximum likelihood based training.
Comment: 24 pages, new title, published in RSC MSD
Randomized Adversarial Imitation Learning for Autonomous Driving
With the evolution of various advanced driver assistance system (ADAS)
platforms, the design of autonomous driving system is becoming more complex and
safety-critical. The autonomous driving system simultaneously activates
multiple ADAS functions, and thus it is essential to coordinate them. This
paper proposes a randomized adversarial imitation learning (RAIL) method that
imitates the coordination behavior of an autonomous vehicle equipped with
advanced sensors. The RAIL policies are trained through derivative-free
optimization for the decision maker that coordinates the proper ADAS functions,
e.g., smart cruise control and the lane keeping system. In particular, the
proposed method is able to handle LIDAR data and make decisions in complex
multi-lane highways and multi-agent environments.
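Derivative-free policy training of this kind can be sketched as a two-sided random search over policy weights; this is a generic illustration of gradient-free optimization, not the paper's exact optimizer:

```python
import random

def random_search_policy(evaluate, dim, iters=300, step=0.5, noise=0.1, seed=0):
    """Derivative-free policy optimization sketch: perturb the policy
    weights in a random direction, evaluate the return on both sides of
    the perturbation, and move the weights along the better direction.
    No gradients of the return are ever computed."""
    rng = random.Random(seed)
    theta = [0.0] * dim
    for _ in range(iters):
        delta = [rng.gauss(0.0, noise) for _ in range(dim)]
        r_plus = evaluate([t + d for t, d in zip(theta, delta)])
        r_minus = evaluate([t - d for t, d in zip(theta, delta)])
        scale = step * (r_plus - r_minus)
        theta = [t + scale * d for t, d in zip(theta, delta)]
    return theta
```

Because only returns are queried, the same loop works when the policy interacts with a black-box simulator of ADAS functions and sensor inputs.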
Detecting Deceptive Reviews using Generative Adversarial Networks
In the past few years, consumer review sites have become the main target of
deceptive opinion spam, where fictitious opinions or reviews are deliberately
written to sound authentic. Most existing work on detecting deceptive reviews
focuses on building supervised classifiers based on syntactic and lexical
patterns of an opinion. With the successful use of neural networks in various
classification applications, in this paper we propose FakeGAN, a system that
for the first time augments and adopts Generative Adversarial Networks (GANs)
for a text classification task, in particular, detecting deceptive reviews.
Unlike standard GAN models which have a single Generator and Discriminator
model, FakeGAN uses two discriminator models and one generative model. The
generator is modeled as a stochastic policy agent in reinforcement learning
(RL), and the discriminators use a Monte Carlo search algorithm to estimate and
pass the intermediate action-value as the RL reward to the generator. Providing
the generator model with two discriminator models avoids the mode collapse
issue by learning from both distributions of truthful and deceptive reviews.
Indeed, our experiments show that using two discriminators gives FakeGAN high
stability, whereas instability is a known issue for GAN architectures. While
FakeGAN is built upon a semi-supervised classifier, which is typically less
accurate, our evaluation
results on a dataset of TripAdvisor hotel reviews show the same performance in
terms of accuracy as of the state-of-the-art approaches that apply supervised
machine learning. These results indicate that GANs can be effective for text
classification tasks. Specifically, FakeGAN is effective at detecting deceptive
reviews.
Comment: accepted at the 1st Deep Learning and Security Workshop, co-located
with the 39th IEEE Symposium on Security and Privacy
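The Monte Carlo estimation of intermediate action-values can be sketched as follows; the names are ours, and FakeGAN additionally combines the scores of its two discriminators rather than using a single one:

```python
def mc_rollout_reward(prefix, rollout_fn, discriminator, n_rollouts=8):
    """Monte Carlo search estimate of the intermediate action-value for
    a partial text sequence: complete the prefix several times with the
    current generator policy (rollout_fn), score each completion with a
    discriminator, and average the scores as the RL reward."""
    total = 0.0
    for _ in range(n_rollouts):
        total += discriminator(rollout_fn(prefix))
    return total / n_rollouts
```

This gives the generator a reward signal at every generation step, even though the discriminator can only score complete reviews.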
RAIL: Risk-Averse Imitation Learning
Imitation learning algorithms learn viable policies by imitating an expert's
behavior when reward signals are not available. Generative Adversarial
Imitation Learning (GAIL) is a state-of-the-art algorithm for learning policies
when the expert's behavior is available as a fixed set of trajectories. We
evaluate the learned policies in terms of the expert's cost function and observe that the
distribution of trajectory-costs is often more heavy-tailed for GAIL-agents
than the expert at a number of benchmark continuous-control tasks. Thus,
high-cost trajectories, corresponding to tail-end events of catastrophic
failure, are more likely to be encountered by the GAIL-agents than the expert.
This makes the reliability of GAIL-agents questionable when it comes to
deployment in risk-sensitive applications like robotic surgery and autonomous
driving. In this work, we aim to minimize the occurrence of tail-end events by
minimizing tail risk within the GAIL framework. We quantify tail risk by the
Conditional-Value-at-Risk (CVaR) of trajectories and develop the Risk-Averse
Imitation Learning (RAIL) algorithm. We observe that the policies learned with
RAIL show lower tail-end risk than those of vanilla GAIL. Thus the proposed
RAIL algorithm appears as a potent alternative to GAIL for improved reliability
in risk-sensitive applications.
Comment: Accepted for presentation in Deep Reinforcement Learning Symposium at
NIPS 201
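The tail-risk quantity can be illustrated with a small sketch; this is the standard empirical CVaR estimate over a batch of trajectory costs, with our own discretization of the tail size:

```python
def cvar(costs, alpha=0.9):
    """Conditional Value-at-Risk at level alpha: the mean of the worst
    (1 - alpha) fraction of trajectory costs. Minimizing this quantity
    penalizes the heavy tail of catastrophic, high-cost trajectories."""
    ordered = sorted(costs)
    n = len(ordered)
    k = max(1, int(round(n * (1.0 - alpha))))  # size of the tail
    tail = ordered[n - k:]
    return sum(tail) / len(tail)
```

Unlike the mean cost, CVaR is unchanged by improvements to already-good trajectories, so optimizing it directly targets the rare failures that matter in risk-sensitive deployment.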
Surprising Negative Results for Generative Adversarial Tree Search
While many recent advances in deep reinforcement learning (RL) rely on
model-free methods, model-based approaches remain an alluring prospect for
their potential to exploit unsupervised data to learn an environment model. In
this work, we provide an extensive study on the design of deep generative
models for RL environments and propose a sample efficient and robust method to
learn the model of Atari environments. We deploy this model and propose
generative adversarial tree search (GATS), a deep RL algorithm that learns the
environment model and implements Monte Carlo tree search (MCTS) on the learned
model for planning. Because MCTS on the learned model is computationally
expensive, GATS, similar to AlphaGo, follows depth-limited MCTS. GATS employs a
deep Q network (DQN) and learns a Q-function to assign values to the leaves of
the tree in MCTS. We theoretically analyze GATS vis-a-vis the bias-variance
trade-off and show GATS is able to mitigate the worst-case error in the
Q-estimate. While we expected GATS to enjoy better sample complexity and
faster convergence to better policies, surprisingly, GATS fails to outperform
DQN. We provide a study showing why depth-limited MCTS fails to perform
desirably.
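The core planning step, a depth-limited search over a learned model with Q-values at the leaves, can be sketched as follows; the exhaustive expansion here stands in for the paper's Monte Carlo tree search, and all names are our own:

```python
def depth_limited_value(state, model, q_function, actions, depth, gamma=0.99):
    """Depth-limited planning in the spirit of GATS: unroll a learned
    environment model to a fixed depth and bootstrap the leaf values
    with a learned Q-function instead of rolling out to episode end."""
    if depth == 0:
        # at the leaves, fall back on the learned Q-estimates
        return max(q_function(state, a) for a in actions)
    best = float("-inf")
    for a in actions:
        next_state, reward = model(state, a)  # learned dynamics + reward
        value = reward + gamma * depth_limited_value(
            next_state, model, q_function, actions, depth - 1, gamma)
        best = max(best, value)
    return best
```

The bias-variance trade-off discussed above lives exactly here: a deeper unroll trusts the learned model more, while a shallower one leans on the learned Q-function.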
Meta-Learning surrogate models for sequential decision making
We introduce a unified probabilistic framework for solving sequential
decision making problems ranging from Bayesian optimisation to contextual
bandits and reinforcement learning. This is accomplished by a probabilistic
model-based approach that explains observed data while capturing predictive
uncertainty during the decision making process. Crucially, this probabilistic
model is chosen to be a Meta-Learning system that allows learning from a
distribution of related problems, allowing data efficient adaptation to a
target task. As a suitable instantiation of this framework, we explore the use
of Neural processes due to statistical and computational desiderata. We apply
our framework to a broad range of problem domains, such as control problems,
recommender systems and adversarial attacks on RL agents, demonstrating an
efficient and general black-box learning approach.
Backplay: "Man muss immer umkehren"
Model-free reinforcement learning (RL) requires a large number of trials to
learn a good policy, especially in environments with sparse rewards. We explore
a method to improve the sample efficiency when we have access to
demonstrations. Our approach, Backplay, uses a single demonstration to
construct a curriculum for a given task. Rather than starting each training
episode in the environment's fixed initial state, we start the agent near the
end of the demonstration and move the starting point backwards during the
course of training until we reach the initial state. Our contributions are that
we analytically characterize the types of environments where Backplay can
improve training speed, demonstrate the effectiveness of Backplay both in large
grid worlds and a complex four player zero-sum game (Pommerman), and show that
Backplay compares favorably to other competitive methods known to improve
sample efficiency. This includes reward shaping, behavioral cloning, and
reverse curriculum generation.
Comment: AAAI-19 Workshop on Reinforcement Learning in Games
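The moving start point can be sketched with a simple schedule; the linear interpolation below is our own simplification of the paper's staged windows:

```python
def backplay_start_index(demo_length, episode, total_episodes):
    """Backplay curriculum sketch: choose the training episode's start
    state from a demonstration, beginning near the demonstration's end
    and sliding the start point back toward the environment's true
    initial state as training progresses."""
    progress = min(1.0, episode / max(1, total_episodes))
    return int(round((demo_length - 1) * (1.0 - progress)))
```

Early episodes thus start one step from the demonstrated goal, where sparse reward is easy to reach, and later episodes recover the full task from the original initial state.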