Polyphonic Music Generation with Sequence Generative Adversarial Networks
We propose an application of sequence generative adversarial networks
(SeqGAN), which are generative adversarial networks for discrete sequence
generation, for creating polyphonic musical sequences. Instead of the
monophonic melody generation suggested in the original work, we present an
efficient representation of a polyphonic MIDI file that simultaneously captures
chords and melodies with dynamic timings. The proposed method condenses duration, octaves,
and keys of both melodies and chords into a single word vector representation,
and recurrent neural networks learn to predict distributions of sequences from
the embedded musical word space. We experiment with the original objective and
with applying the least-squares objective to the discriminator, which is known
to stabilize the training of GANs. The network creates sequences that are
musically coherent and shows improved quantitative and qualitative measures. We
also report that careful optimization of the model's reinforcement learning
signals is crucial for its general application.
Comment: 8 pages, 3 figures, 3 tables
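The least-squares discriminator objective referred to above can be sketched as follows; the function names and plain-Python form are our own illustration of the LSGAN-style loss, not the paper's code:

```python
def lsgan_discriminator_loss(d_real, d_fake):
    """Least-squares discriminator loss: push D's scores on real data
    toward 1 and on generated data toward 0, replacing the usual log loss."""
    real_term = sum((d - 1.0) ** 2 for d in d_real) / len(d_real)
    fake_term = sum(d ** 2 for d in d_fake) / len(d_fake)
    return 0.5 * real_term + 0.5 * fake_term

def lsgan_generator_loss(d_fake):
    """Generator loss: make the discriminator score fakes close to 1."""
    return 0.5 * sum((d - 1.0) ** 2 for d in d_fake) / len(d_fake)
```

The quadratic penalty keeps gradients informative even for samples the discriminator classifies confidently, which is why it tends to stabilize training.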
Query-Efficient Black-Box Attack Against Sequence-Based Malware Classifiers
In this paper, we present a generic, query-efficient black-box attack against
API call-based machine learning malware classifiers. We generate adversarial
examples by modifying the malware's API call sequences and non-sequential
features (printable strings), and these adversarial examples will be
misclassified by the target malware classifier without affecting the malware's
functionality. In contrast to previous studies, our attack minimizes the number
of malware classifier queries required. In addition, in our attack, the
attacker must only know the class predicted by the malware classifier; attacker
knowledge of the malware classifier's confidence score is optional. We evaluate
the attack effectiveness when attacks are performed against a variety of
malware classifier architectures, including recurrent neural network (RNN)
variants, deep neural networks, support vector machines, and gradient boosted
decision trees. Our attack success rate is around 98% when the classifier's
confidence score is known and 64% when just the classifier's predicted class is
known. We implement four state-of-the-art query-efficient attacks and show that
our attack requires fewer queries and less knowledge about the attacked model's
architecture than other existing query-efficient attacks, making it practical
for attacking cloud-based malware classifiers at a minimal cost.
Comment: Accepted as a conference paper at ACSAC 202
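The label-only setting can be sketched as a simple perturb-and-query loop; `classify`, the API names, and the random-insertion strategy below are illustrative assumptions, not the paper's attack:

```python
import random

def label_only_attack(classify, api_seq, candidate_apis, max_queries=100, seed=0):
    """Sketch of a query-efficient, label-only evasion attack: insert
    semantically neutral API calls at random positions until the
    classifier's predicted label flips, counting every query made."""
    rng = random.Random(seed)
    seq = list(api_seq)
    queries = 1
    label = classify(seq)  # the attacker sees only the label, not a score
    while label == "malicious" and queries < max_queries:
        pos = rng.randrange(len(seq) + 1)
        seq.insert(pos, rng.choice(candidate_apis))
        label = classify(seq)
        queries += 1
    return seq, label, queries
```

Tracking the query count explicitly is the point: a practical attack against a cloud-based classifier must flip the label in as few queries as possible.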
Automatic Goal Generation for Reinforcement Learning Agents
Reinforcement learning is a powerful technique to train an agent to perform a
task. However, an agent that is trained using reinforcement learning is only
capable of achieving the single task that is specified via its reward function.
Such an approach does not scale well to settings in which an agent needs to
perform a diverse set of tasks, such as navigating to varying positions in a
room or moving objects to varying locations. Instead, we propose a method that
allows an agent to automatically discover the range of tasks that it is capable
of performing. We use a generator network to propose tasks for the agent to try
to achieve, specified as goal states. The generator network is optimized using
adversarial training to produce tasks that are always at the appropriate level
of difficulty for the agent. Our method thus automatically produces a
curriculum of tasks for the agent to learn. We show that, by using this
framework, an agent can efficiently and automatically learn to perform a wide
set of tasks without requiring any prior knowledge of its environment. Our
method can also learn to achieve tasks with sparse rewards, which traditionally
pose significant challenges.
Comment: Accepted at ICML 2018, Proceedings of the 35th International
Conference on Machine Learning, Stockholm, Sweden, PMLR 80, 201
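The "appropriate level of difficulty" criterion can be illustrated with a small sketch; the thresholds and names below are our assumptions about how goals of intermediate difficulty are labeled before training the generator:

```python
def goals_of_intermediate_difficulty(success_rates, r_min=0.1, r_max=0.9):
    """Label each candidate goal as appropriately difficult when the
    agent's empirical success rate lies in [r_min, r_max]: neither
    already mastered nor currently unreachable. Such labels can then
    supervise the adversarially trained goal generator."""
    return {goal: r_min <= rate <= r_max
            for goal, rate in success_rates.items()}
```

As the agent improves, previously hard goals drift into the feasible band and easy goals drop out, which is what makes the generated task sequence a curriculum.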
Deep learning for molecular design - a review of the state of the art
In the space of only a few years, deep generative modeling has revolutionized
how we think of artificial creativity, yielding autonomous systems which
produce original images, music, and text. Inspired by these successes,
researchers are now applying deep generative modeling techniques to the
generation and optimization of molecules - in our review we found 45 papers on
the subject published in the past two years. These works point to a future
where such systems will be used to generate lead molecules, greatly reducing
resources spent downstream synthesizing and characterizing bad leads in the
lab. In this review we survey the increasingly complex landscape of models and
representation schemes that have been proposed. The four classes of techniques
we describe are recursive neural networks, autoencoders, generative adversarial
networks, and reinforcement learning. After first discussing some of the
mathematical fundamentals of each technique, we draw high level connections and
comparisons with other techniques and expose the pros and cons of each. Several
important high level themes emerge as a result of this work, including the
shift away from the SMILES string representation of molecules towards more
sophisticated representations such as graph grammars and 3D representations,
the importance of reward function design, the need for better standards for
benchmarking and testing, and the benefits of adversarial training and
reinforcement learning over maximum likelihood based training.
Comment: 24 pages, new title, published in RSC MSD
Randomized Adversarial Imitation Learning for Autonomous Driving
With the evolution of various advanced driver assistance system (ADAS)
platforms, the design of autonomous driving system is becoming more complex and
safety-critical. The autonomous driving system simultaneously activates
multiple ADAS functions, and thus it is essential to coordinate them. This
paper proposes a randomized adversarial imitation learning (RAIL) method that
imitates the coordination behavior of an autonomous vehicle equipped with
advanced sensors. The RAIL policies are trained through derivative-free
optimization for the decision maker that coordinates the proper ADAS functions,
e.g., smart cruise control and the lane keeping system. In particular, the
proposed method is able to handle LIDAR data and make decisions in complex
multi-lane highways and multi-agent environments.
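Derivative-free policy training of this kind can be sketched as a two-sided random search over policy weights; this is a generic illustration of gradient-free optimization, not the paper's exact optimizer:

```python
import random

def random_search_policy(evaluate, dim, iters=300, step=0.5, noise=0.1, seed=0):
    """Derivative-free policy optimization sketch: perturb the policy
    weights in a random direction, evaluate the return on both sides of
    the perturbation, and move the weights along the better direction.
    No gradients of the return are ever computed."""
    rng = random.Random(seed)
    theta = [0.0] * dim
    for _ in range(iters):
        delta = [rng.gauss(0.0, noise) for _ in range(dim)]
        r_plus = evaluate([t + d for t, d in zip(theta, delta)])
        r_minus = evaluate([t - d for t, d in zip(theta, delta)])
        scale = step * (r_plus - r_minus)
        theta = [t + scale * d for t, d in zip(theta, delta)]
    return theta
```

Because only returns are queried, the same loop works when the policy interacts with a black-box simulator of ADAS functions and sensor inputs.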
Detecting Deceptive Reviews using Generative Adversarial Networks
In the past few years, consumer review sites have become the main target of
deceptive opinion spam, where fictitious opinions or reviews are deliberately
written to sound authentic. Most existing work on detecting deceptive reviews
focuses on building supervised classifiers based on syntactic and lexical
patterns of an opinion. With the successful use of neural networks in various
classification applications, in this paper we propose FakeGAN, a system that
for the first time augments and adopts Generative Adversarial Networks (GANs)
for a text classification task, in particular, detecting deceptive reviews.
Unlike standard GAN models which have a single Generator and Discriminator
model, FakeGAN uses two discriminator models and one generative model. The
generator is modeled as a stochastic policy agent in reinforcement learning
(RL), and the discriminators use a Monte Carlo search algorithm to estimate and
pass the intermediate action-value as the RL reward to the generator. Providing
the generator model with two discriminator models avoids the mode collapse
issue by learning from both distributions of truthful and deceptive reviews.
Indeed, our experiments show that using two discriminators gives FakeGAN high
stability, whereas instability is a known issue for GAN architectures. While
FakeGAN is built upon a semi-supervised classifier, which is typically less
accurate, our evaluation
results on a dataset of TripAdvisor hotel reviews show the same performance in
terms of accuracy as of the state-of-the-art approaches that apply supervised
machine learning. These results indicate that GANs can be effective for text
classification tasks. Specifically, FakeGAN is effective at detecting deceptive
reviews.
Comment: accepted at the 1st Deep Learning and Security Workshop, co-located
with the 39th IEEE Symposium on Security and Privacy
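The Monte Carlo estimation of intermediate action-values can be sketched as follows; the names are ours, and FakeGAN additionally combines the scores of its two discriminators rather than using a single one:

```python
def mc_rollout_reward(prefix, rollout_fn, discriminator, n_rollouts=8):
    """Monte Carlo search estimate of the intermediate action-value for
    a partial text sequence: complete the prefix several times with the
    current generator policy (rollout_fn), score each completion with a
    discriminator, and average the scores as the RL reward."""
    total = 0.0
    for _ in range(n_rollouts):
        total += discriminator(rollout_fn(prefix))
    return total / n_rollouts
```

This gives the generator a reward signal at every generation step, even though the discriminator can only score complete reviews.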
RAIL: Risk-Averse Imitation Learning
Imitation learning algorithms learn viable policies by imitating an expert's
behavior when reward signals are not available. Generative Adversarial
Imitation Learning (GAIL) is a state-of-the-art algorithm for learning policies
when the expert's behavior is available as a fixed set of trajectories. We
evaluate the learned policies in terms of the expert's cost function and observe that the
distribution of trajectory-costs is often more heavy-tailed for GAIL-agents
than the expert at a number of benchmark continuous-control tasks. Thus,
high-cost trajectories, corresponding to tail-end events of catastrophic
failure, are more likely to be encountered by the GAIL-agents than the expert.
This makes the reliability of GAIL-agents questionable when it comes to
deployment in risk-sensitive applications like robotic surgery and autonomous
driving. In this work, we aim to minimize the occurrence of tail-end events by
minimizing tail risk within the GAIL framework. We quantify tail risk by the
Conditional-Value-at-Risk (CVaR) of trajectories and develop the Risk-Averse
Imitation Learning (RAIL) algorithm. We observe that the policies learned with
RAIL show lower tail-end risk than those of vanilla GAIL. Thus the proposed
RAIL algorithm appears as a potent alternative to GAIL for improved reliability
in risk-sensitive applications.
Comment: Accepted for presentation in Deep Reinforcement Learning Symposium at
NIPS 201
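The tail-risk quantity can be illustrated with a small sketch; this is the standard empirical CVaR estimate over a batch of trajectory costs, with our own discretization of the tail size:

```python
def cvar(costs, alpha=0.9):
    """Conditional Value-at-Risk at level alpha: the mean of the worst
    (1 - alpha) fraction of trajectory costs. Minimizing this quantity
    penalizes the heavy tail of catastrophic, high-cost trajectories."""
    ordered = sorted(costs)
    n = len(ordered)
    k = max(1, int(round(n * (1.0 - alpha))))  # size of the tail
    tail = ordered[n - k:]
    return sum(tail) / len(tail)
```

Unlike the mean cost, CVaR is unchanged by improvements to already-good trajectories, so optimizing it directly targets the rare failures that matter in risk-sensitive deployment.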
Surprising Negative Results for Generative Adversarial Tree Search
While many recent advances in deep reinforcement learning (RL) rely on
model-free methods, model-based approaches remain an alluring prospect for
their potential to exploit unsupervised data to learn an environment model. In
this work, we provide an extensive study on the design of deep generative
models for RL environments and propose a sample efficient and robust method to
learn the model of Atari environments. We deploy this model and propose
generative adversarial tree search (GATS), a deep RL algorithm that learns the
environment model and implements Monte Carlo tree search (MCTS) on the learned
model for planning. Because MCTS on the learned model is computationally
expensive, GATS, similar to AlphaGo, follows depth-limited MCTS. GATS employs a
deep Q network (DQN) and learns a Q-function to assign values to the leaves of
the tree in MCTS. We theoretically analyze GATS vis-a-vis the bias-variance
trade-off and show GATS is able to mitigate the worst-case error in the
Q-estimate. While we expected GATS to enjoy better sample complexity and
faster convergence to better policies, surprisingly, GATS fails to outperform
DQN. We provide a study showing why depth-limited MCTS fails to perform
desirably.
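The core planning step, a depth-limited search over a learned model with Q-values at the leaves, can be sketched as follows; the exhaustive expansion here stands in for the paper's Monte Carlo tree search, and all names are our own:

```python
def depth_limited_value(state, model, q_function, actions, depth, gamma=0.99):
    """Depth-limited planning in the spirit of GATS: unroll a learned
    environment model to a fixed depth and bootstrap the leaf values
    with a learned Q-function instead of rolling out to episode end."""
    if depth == 0:
        # at the leaves, fall back on the learned Q-estimates
        return max(q_function(state, a) for a in actions)
    best = float("-inf")
    for a in actions:
        next_state, reward = model(state, a)  # learned dynamics + reward
        value = reward + gamma * depth_limited_value(
            next_state, model, q_function, actions, depth - 1, gamma)
        best = max(best, value)
    return best
```

The bias-variance trade-off discussed above lives exactly here: a deeper unroll trusts the learned model more, while a shallower one leans on the learned Q-function.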
Meta-Learning surrogate models for sequential decision making
We introduce a unified probabilistic framework for solving sequential
decision making problems ranging from Bayesian optimisation to contextual
bandits and reinforcement learning. This is accomplished by a probabilistic
model-based approach that explains observed data while capturing predictive
uncertainty during the decision making process. Crucially, this probabilistic
model is chosen to be a Meta-Learning system that allows learning from a
distribution of related problems, allowing data efficient adaptation to a
target task. As a suitable instantiation of this framework, we explore the use
of Neural processes due to statistical and computational desiderata. We apply
our framework to a broad range of problem domains, such as control problems,
recommender systems and adversarial attacks on RL agents, demonstrating an
efficient and general black-box learning approach.
Backplay: "Man muss immer umkehren"
Model-free reinforcement learning (RL) requires a large number of trials to
learn a good policy, especially in environments with sparse rewards. We explore
a method to improve the sample efficiency when we have access to
demonstrations. Our approach, Backplay, uses a single demonstration to
construct a curriculum for a given task. Rather than starting each training
episode in the environment's fixed initial state, we start the agent near the
end of the demonstration and move the starting point backwards during the
course of training until we reach the initial state. Our contributions are that
we analytically characterize the types of environments where Backplay can
improve training speed, demonstrate the effectiveness of Backplay both in large
grid worlds and a complex four player zero-sum game (Pommerman), and show that
Backplay compares favorably to other competitive methods known to improve
sample efficiency. This includes reward shaping, behavioral cloning, and
reverse curriculum generation.
Comment: AAAI-19 Workshop on Reinforcement Learning in Games
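The moving start point can be sketched with a simple schedule; the linear interpolation below is our own simplification of the paper's staged windows:

```python
def backplay_start_index(demo_length, episode, total_episodes):
    """Backplay curriculum sketch: choose the training episode's start
    state from a demonstration, beginning near the demonstration's end
    and sliding the start point back toward the environment's true
    initial state as training progresses."""
    progress = min(1.0, episode / max(1, total_episodes))
    return int(round((demo_length - 1) * (1.0 - progress)))
```

Early episodes thus start one step from the demonstrated goal, where sparse reward is easy to reach, and later episodes recover the full task from the original initial state.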