85,600 research outputs found

    Analysis of reinforcement learning strategies for predation in a mimic-model prey environment

    Get PDF
    In this paper we propose a mathematical learning model for a stochastic automaton simulating the behaviour of a predator operating in a random environment occupied by two types of prey: palatable mimics and unpalatable models. Specifically, a well-known linear reinforcement learning algorithm is used to update the probabilities of the two actions, eat prey or ignore prey, at every random encounter. Each action elicits a probabilistic response from the environment that can be either favourable or unfavourable. We analyse both fixed and varying stochastic responses for the system. The basic concept of mimicry is defined and a short review of relevant previous approaches in the literature is given. Finally, the conditions for continuous predator performance improvement are explicitly formulated, and precise definitions of predatory efficiency and mimicry efficiency are also provided.
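    As a minimal sketch of the linear reinforcement update described in this abstract, the snippet below implements a standard two-action linear reward-penalty scheme for the eat/ignore automaton. The learning rates, the mimic/model mix, and the mapping from outcomes to favourable responses are illustrative assumptions, not the paper's parameters.

```python
import random

# Two-action stochastic automaton with a linear reward-penalty update.
# All numeric values here are illustrative assumptions.
ACTIONS = ["eat", "ignore"]

def update(probs, action, favourable, a=0.1, b=0.05):
    """Linear reinforcement: a favourable response shifts probability mass
    toward the chosen action; an unfavourable one shifts it away."""
    i = ACTIONS.index(action)
    if favourable:
        probs[i] += a * (1.0 - probs[i])   # reward: reinforce chosen action
    else:
        probs[i] -= b * probs[i]           # penalty: weaken chosen action
    probs[1 - i] = 1.0 - probs[i]          # keep the two probabilities summing to 1
    return probs

# One simulated encounter. Assumed outcome mapping: eating a palatable mimic
# or ignoring an unpalatable model is favourable; the other cases are not.
probs = [0.5, 0.5]                         # P(eat), P(ignore)
prey_is_mimic = random.random() < 0.6      # assumed mimic/model mix
action = random.choices(ACTIONS, weights=probs)[0]
favourable = (action == "eat") == prey_is_mimic
probs = update(probs, action, favourable)
print(probs)
```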

    Automating Vehicles by Deep Reinforcement Learning using Task Separation with Hill Climbing

    Full text link
    Within the context of autonomous driving, a model-based reinforcement learning algorithm is proposed for the design of neural-network-parameterized controllers. Classical model-based control methods, which include sampling- and lattice-based algorithms and model predictive control, suffer from a trade-off between model complexity and the computational burden of solving expensive optimization or search problems online at every short sampling time. To circumvent this trade-off, a two-step procedure is motivated: a controller is first learned offline from an arbitrarily complex mathematical system model, and the trained controller is then evaluated online as a fast feedforward mapping. The contribution of this paper is a simple gradient-free, model-based algorithm for deep reinforcement learning using task separation with hill climbing (TSHC). In particular, it advocates (i) simultaneous training on separate deterministic tasks so as to encode many motion primitives in a neural network, and (ii) the use of maximally sparse rewards in combination with virtual velocity constraints (VVCs) in setpoint proximity.
    Comment: 10 pages, 6 figures, 1 table
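    To give a rough feel for the gradient-free idea, the sketch below hill-climbs the parameters of a tiny neural-network controller on a toy double-integrator task. The network size, perturbation scale, dynamics, and dense toy reward are assumptions for illustration; the paper's task separation, sparse rewards, and velocity constraints are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def controller(theta, x):
    """One-hidden-layer tanh controller mapping state x to a scalar control."""
    W1, b1, W2 = theta
    return np.tanh(x @ W1 + b1) @ W2

def episode_return(theta):
    """Toy stand-in for an episode reward: drive a 1-D double integrator
    toward the origin over a short rollout (closer is better)."""
    x, total = np.array([1.0, 0.0]), 0.0
    for _ in range(50):
        u = controller(theta, x)
        x = x + 0.1 * np.array([x[1], float(u)])  # Euler double-integrator step
        total -= float(x @ x)
    return total

theta = [rng.normal(0, 0.1, (2, 8)), np.zeros(8), rng.normal(0, 0.1, 8)]
best = episode_return(theta)
for _ in range(200):                              # hill-climbing loop
    cand = [p + 0.02 * rng.normal(size=p.shape) for p in theta]
    r = episode_return(cand)
    if r > best:                                  # keep perturbations only if they improve
        theta, best = cand, r
print(best)
```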

    Deep Predictive Models for Collision Risk Assessment in Autonomous Driving

    Full text link
    In this paper, we investigate a predictive approach for collision risk assessment in autonomous and assisted driving. A deep predictive model is trained to anticipate imminent accidents from traditional video streams. In particular, the model learns to identify cues in RGB images that are predictive of hazardous upcoming situations. In contrast to previous work, our approach incorporates (a) temporal information during decision making, (b) multi-modal information about the environment, as well as the proprioceptive state and steering actions of the controlled vehicle, and (c) information about the uncertainty inherent to the task. To this end, we discuss Deep Predictive Models and present an implementation using a Bayesian Convolutional LSTM. Experiments in a simple simulation environment show that the approach can learn to predict impending accidents with reasonable accuracy, especially when multiple cameras are used as input sources.
    Comment: 8 pages, 4 figures
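    A hedged sketch of the uncertainty-aware ingredient: a small convolutional encoder feeding an LSTM, with Monte Carlo dropout left active at prediction time as a common approximation to Bayesian inference. The layer sizes, frame shape, dropout rate, and binary accident/no-accident head are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class RiskNet(nn.Module):
    """Per-frame conv encoder + LSTM over time; dropout kept stochastic
    at test time to sample an approximate predictive distribution."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.drop = nn.Dropout(p=0.2)          # sampled at test time too
        self.lstm = nn.LSTM(16, 32, batch_first=True)
        self.head = nn.Linear(32, 2)           # assumed accident / no-accident head

    def forward(self, frames):                 # frames: (B, T, 3, H, W)
        B, T = frames.shape[:2]
        feats = self.enc(frames.flatten(0, 1)).view(B, T, -1)
        out, _ = self.lstm(self.drop(feats))
        return self.head(out[:, -1])           # logits from last time step

model = RiskNet()
model.train()                                  # keep dropout active (MC dropout)
clip = torch.randn(1, 10, 3, 64, 64)           # dummy 10-frame RGB clip
with torch.no_grad():
    probs = torch.stack([model(clip).softmax(-1) for _ in range(20)])
mean, std = probs.mean(0), probs.std(0)        # predictive mean and spread
print(mean, std)
```

    The spread of the sampled predictions gives a simple per-clip uncertainty signal alongside the risk estimate itself.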

    Feedback learning particle swarm optimization

    Get PDF
    This is the author's version of a work that was accepted for publication in Applied Soft Computing. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published and is available at the link below. Copyright @ Elsevier 2011.
    In this paper, a feedback learning particle swarm optimization algorithm with quadratic inertia weight (FLPSO-QIW) is developed to solve optimization problems. The proposed FLPSO-QIW consists of four steps. Firstly, the inertia weight is calculated by a designed quadratic function instead of the conventional linearly decreasing function. Secondly, acceleration coefficients are determined not only by the generation number but also by the search environment, described by each particle's history best fitness information. Thirdly, the feedback fitness information of each particle is used to automatically design the learning probabilities. Fourthly, an elite stochastic learning (ELS) method is used to refine the solution. The FLPSO-QIW has been comprehensively evaluated on 18 unimodal, multimodal and composite benchmark functions, with and without rotation. Compared with various state-of-the-art PSO algorithms, the performance of FLPSO-QIW is promising and competitive. The effects of parameter adaptation, parameter sensitivity and the proposed mechanism are discussed in detail.
    This research was partially supported by the National Natural Science Foundation of PR China (Grant No 60874113), the Research Fund for the Doctoral Program of Higher Education (Grant No 200802550007), the Key Creative Project of Shanghai Education Community (Grant No 09ZZ66), the Key Foundation Project of Shanghai (Grant No 09JC1400700), the International Science and Technology Cooperation Project of China under Grant 2009DFA32050, and the Alexander von Humboldt Foundation of Germany.
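    To make the first step concrete, the sketch below contrasts the conventional linearly decreasing inertia weight with a quadratic schedule. The endpoints w_start = 0.9 and w_end = 0.4 and the exact quadratic form are common PSO conventions assumed for illustration; the paper's coefficients are not reproduced here.

```python
# Inertia-weight schedules over generations t = 0..t_max.
# Endpoint values are assumed conventions, not the paper's settings.

def linear_weight(t, t_max, w_start=0.9, w_end=0.4):
    """Conventional schedule: inertia decays linearly with generation t."""
    return w_start - (w_start - w_end) * t / t_max

def quadratic_weight(t, t_max, w_start=0.9, w_end=0.4):
    """Quadratic schedule: decays slowly early, faster late in the run."""
    return w_start - (w_start - w_end) * (t / t_max) ** 2

for t in (0, 250, 500, 750, 1000):
    print(t, round(linear_weight(t, 1000), 3), round(quadratic_weight(t, 1000), 3))
```

    With the squared term the weight stays near w_start for longer, so early generations explore more broadly before the decay accelerates toward w_end.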