89 research outputs found

    Recurrent policy gradients

    Get PDF
    Reinforcement learning for partially observable Markov decision problems (POMDPs) is a challenge as it requires policies with an internal state. Traditional approaches suffer significantly from this shortcoming and usually make strong assumptions on the problem domain such as perfect system models, state-estimators and a Markovian hidden system. Recurrent neural networks (RNNs) offer a natural framework for dealing with policy learning using hidden state and require only few limiting assumptions. As they can be trained well using gradient descent, they are suited for policy gradient approaches. In this paper, we present a policy gradient method, the Recurrent Policy Gradient which constitutes a model-free reinforcement learning method. It is aimed at training limited-memory stochastic policies on problems which require long-term memories of past observations. The approach involves approximating a policy gradient for a recurrent neural network by backpropagating return-weighted characteristic eligibilities through time. Using a ‘‘Long Short-Term Memory'' RNN architecture, we are able to outperform previous RL methods on three important benchmark tasks. Furthermore, we show that using history-dependent baselines helps reducing estimation variance significantly, thus enabling our approach to tackle more challenging, highly stochastic environment

    DoShiCo Challenge: Domain Shift in Control Prediction

    Full text link
    Training deep neural network policies end-to-end for real-world applications so far requires big demonstration datasets in the real world or big sets consisting of a large variety of realistic and closely related 3D CAD models. These real or virtual data should, moreover, have very similar characteristics to the conditions expected at test time. These stringent requirements and the time consuming data collection processes that they entail, are currently the most important impediment that keeps deep reinforcement learning from being deployed in real-world applications. Therefore, in this work we advocate an alternative approach, where instead of avoiding any domain shift by carefully selecting the training data, the goal is to learn a policy that can cope with it. To this end, we propose the DoShiCo challenge: to train a model in very basic synthetic environments, far from realistic, in a way that it can be applied in more realistic environments as well as take the control decisions on real-world data. In particular, we focus on the task of collision avoidance for drones. We created a set of simulated environments that can be used as benchmark and implemented a baseline method, exploiting depth prediction as an auxiliary task to help overcome the domain shift. Even though the policy is trained in very basic environments, it can learn to fly without collisions in a very different realistic simulated environment. Of course several benchmarks for reinforcement learning already exist - but they never include a large domain shift. On the other hand, several benchmarks in computer vision focus on the domain shift, but they take the form of a static datasets instead of simulated environments. In this work we claim that it is crucial to take the two challenges together in one benchmark.Comment: Published at SIMPAR 2018. Please visit the paper webpage for more information, a movie and code for reproducing results: https://kkelchte.github.io/doshic

    Beyond Monte Carlo Tree Search: Playing Go with Deep Alternative Neural Network and Long-Term Evaluation

    Full text link
    Monte Carlo tree search (MCTS) is extremely popular in computer Go which determines each action by enormous simulations in a broad and deep search tree. However, human experts select most actions by pattern analysis and careful evaluation rather than brute search of millions of future nteractions. In this paper, we propose a computer Go system that follows experts way of thinking and playing. Our system consists of two parts. The first part is a novel deep alternative neural network (DANN) used to generate candidates of next move. Compared with existing deep convolutional neural network (DCNN), DANN inserts recurrent layer after each convolutional layer and stacks them in an alternative manner. We show such setting can preserve more contexts of local features and its evolutions which are beneficial for move prediction. The second part is a long-term evaluation (LTE) module used to provide a reliable evaluation of candidates rather than a single probability from move predictor. This is consistent with human experts nature of playing since they can foresee tens of steps to give an accurate estimation of candidates. In our system, for each candidate, LTE calculates a cumulative reward after several future interactions when local variations are settled. Combining criteria from the two parts, our system determines the optimal choice of next move. For more comprehensive experiments, we introduce a new professional Go dataset (PGD), consisting of 253233 professional records. Experiments on GoGoD and PGD datasets show the DANN can substantially improve performance of move prediction over pure DCNN. When combining LTE, our system outperforms most relevant approaches and open engines based on MCTS.Comment: AAAI 201

    Imitating Driver Behavior with Generative Adversarial Networks

    Full text link
    The ability to accurately predict and simulate human driving behavior is critical for the development of intelligent transportation systems. Traditional modeling methods have employed simple parametric models and behavioral cloning. This paper adopts a method for overcoming the problem of cascading errors inherent in prior approaches, resulting in realistic behavior that is robust to trajectory perturbations. We extend Generative Adversarial Imitation Learning to the training of recurrent policies, and we demonstrate that our model outperforms rule-based controllers and maximum likelihood models in realistic highway simulations. Our model both reproduces emergent behavior of human drivers, such as lane change rate, while maintaining realistic control over long time horizons.Comment: 8 pages, 6 figure

    Recurrent Models of Visual Attention

    Full text link
    Applying convolutional neural networks to large images is computationally expensive because the amount of computation scales linearly with the number of image pixels. We present a novel recurrent neural network model that is capable of extracting information from an image or video by adaptively selecting a sequence of regions or locations and only processing the selected regions at high resolution. Like convolutional neural networks, the proposed model has a degree of translation invariance built-in, but the amount of computation it performs can be controlled independently of the input image size. While the model is non-differentiable, it can be trained using reinforcement learning methods to learn task-specific policies. We evaluate our model on several image classification tasks, where it significantly outperforms a convolutional neural network baseline on cluttered images, and on a dynamic visual control problem, where it learns to track a simple object without an explicit training signal for doing so
    • …
    corecore