85,600 research outputs found

    Analysis of reinforcement learning strategies for predation in a mimic-model prey environment

    Get PDF
    In this paper we propose a mathematical learning model for a stochastic automaton simulating the behaviour of a predator operating in a random environment occupied by two types of prey: palatable mimics and unpalatable models. Specifically, a well-known linear reinforcement learning algorithm is used to update the probabilities of the two actions, eat prey or ignore prey, at every random encounter. Each action elicits a probabilistic response from the environment that can be either favourable or unfavourable. We analyse both fixed and varying stochastic responses for the system. The basic concept of mimicry is defined and a short review of relevant previous approaches in the literature is given. Finally, the conditions for continuous predator performance improvement are explicitly formulated, and precise definitions of predatory efficiency and mimicry efficiency are also provided.
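    As a minimal sketch of the linear reinforcement update described in this abstract, the snippet below implements a standard two-action linear reward-penalty scheme for the eat/ignore automaton. The learning rates, the mimic/model mix, and the mapping from outcomes to favourable responses are illustrative assumptions, not the paper's parameters.

```python
import random

# Two-action stochastic automaton with a linear reward-penalty update.
# All numeric values here are illustrative assumptions.
ACTIONS = ["eat", "ignore"]

def update(probs, action, favourable, a=0.1, b=0.05):
    """Linear reinforcement: a favourable response shifts probability mass
    toward the chosen action; an unfavourable one shifts it away."""
    i = ACTIONS.index(action)
    if favourable:
        probs[i] += a * (1.0 - probs[i])   # reward: reinforce chosen action
    else:
        probs[i] -= b * probs[i]           # penalty: weaken chosen action
    probs[1 - i] = 1.0 - probs[i]          # keep the two probabilities summing to 1
    return probs

# One simulated encounter. Assumed outcome mapping: eating a palatable mimic
# or ignoring an unpalatable model is favourable; the other cases are not.
probs = [0.5, 0.5]                         # P(eat), P(ignore)
prey_is_mimic = random.random() < 0.6      # assumed mimic/model mix
action = random.choices(ACTIONS, weights=probs)[0]
favourable = (action == "eat") == prey_is_mimic
probs = update(probs, action, favourable)
print(probs)
```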

    Automating Vehicles by Deep Reinforcement Learning using Task Separation with Hill Climbing

    Full text link
    Within the context of autonomous driving, a model-based reinforcement learning algorithm is proposed for the design of neural-network-parameterized controllers. Classical model-based control methods, which include sampling- and lattice-based algorithms and model predictive control, suffer from a trade-off between model complexity and the computational burden of solving expensive optimization or search problems online at every short sampling time. To circumvent this trade-off, a two-step procedure is motivated: a controller is first learned offline from an arbitrarily complex mathematical system model, and the trained controller is then evaluated online as a fast feedforward mapping. The contribution of this paper is a simple gradient-free, model-based algorithm for deep reinforcement learning using task separation with hill climbing (TSHC). In particular, it advocates (i) simultaneous training on separate deterministic tasks so as to encode many motion primitives in a neural network, and (ii) the use of maximally sparse rewards in combination with virtual velocity constraints (VVCs) in setpoint proximity.
    Comment: 10 pages, 6 figures, 1 table
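    To give a rough feel for the gradient-free idea, the sketch below hill-climbs the parameters of a tiny neural-network controller on a toy double-integrator task. The network size, perturbation scale, dynamics, and dense toy reward are assumptions for illustration; the paper's task separation, sparse rewards, and velocity constraints are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def controller(theta, x):
    """One-hidden-layer tanh controller mapping state x to a scalar control."""
    W1, b1, W2 = theta
    return np.tanh(x @ W1 + b1) @ W2

def episode_return(theta):
    """Toy stand-in for an episode reward: drive a 1-D double integrator
    toward the origin over a short rollout (closer is better)."""
    x, total = np.array([1.0, 0.0]), 0.0
    for _ in range(50):
        u = controller(theta, x)
        x = x + 0.1 * np.array([x[1], float(u)])  # Euler double-integrator step
        total -= float(x @ x)
    return total

theta = [rng.normal(0, 0.1, (2, 8)), np.zeros(8), rng.normal(0, 0.1, 8)]
best = episode_return(theta)
for _ in range(200):                              # hill-climbing loop
    cand = [p + 0.02 * rng.normal(size=p.shape) for p in theta]
    r = episode_return(cand)
    if r > best:                                  # keep perturbations only if they improve
        theta, best = cand, r
print(best)
```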

    Deep Predictive Models for Collision Risk Assessment in Autonomous Driving

    Full text link
    In this paper, we investigate a predictive approach for collision risk assessment in autonomous and assisted driving. A deep predictive model is trained to anticipate imminent accidents from traditional video streams. In particular, the model learns to identify cues in RGB images that are predictive of hazardous upcoming situations. In contrast to previous work, our approach incorporates (a) temporal information during decision making, (b) multi-modal information about the environment, as well as the proprioceptive state and steering actions of the controlled vehicle, and (c) information about the uncertainty inherent to the task. To this end, we discuss Deep Predictive Models and present an implementation using a Bayesian Convolutional LSTM. Experiments in a simple simulation environment show that the approach can learn to predict impending accidents with reasonable accuracy, especially when multiple cameras are used as input sources.
    Comment: 8 pages, 4 figures
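    A hedged sketch of the uncertainty-aware ingredient: a small convolutional encoder feeding an LSTM, with Monte Carlo dropout left active at prediction time as a common approximation to Bayesian inference. The layer sizes, frame shape, dropout rate, and binary accident/no-accident head are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class RiskNet(nn.Module):
    """Per-frame conv encoder + LSTM over time; dropout kept stochastic
    at test time to sample an approximate predictive distribution."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.drop = nn.Dropout(p=0.2)          # sampled at test time too
        self.lstm = nn.LSTM(16, 32, batch_first=True)
        self.head = nn.Linear(32, 2)           # assumed accident / no-accident head

    def forward(self, frames):                 # frames: (B, T, 3, H, W)
        B, T = frames.shape[:2]
        feats = self.enc(frames.flatten(0, 1)).view(B, T, -1)
        out, _ = self.lstm(self.drop(feats))
        return self.head(out[:, -1])           # logits from last time step

model = RiskNet()
model.train()                                  # keep dropout active (MC dropout)
clip = torch.randn(1, 10, 3, 64, 64)           # dummy 10-frame RGB clip
with torch.no_grad():
    probs = torch.stack([model(clip).softmax(-1) for _ in range(20)])
mean, std = probs.mean(0), probs.std(0)        # predictive mean and spread
print(mean, std)
```

    The spread of the sampled predictions gives a simple per-clip uncertainty signal alongside the risk estimate itself.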

    Feedback learning particle swarm optimization

    Get PDF
    This is the author's version of a work that was accepted for publication in Applied Soft Computing. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published and is available at the link below. Copyright @ Elsevier 2011.
    In this paper, a feedback learning particle swarm optimization algorithm with quadratic inertia weight (FLPSO-QIW) is developed to solve optimization problems. The proposed FLPSO-QIW consists of four steps. Firstly, the inertia weight is calculated by a designed quadratic function instead of the conventional linearly decreasing function. Secondly, acceleration coefficients are determined not only by the generation number but also by the search environment, described by each particle's history best fitness information. Thirdly, the feedback fitness information of each particle is used to automatically design the learning probabilities. Fourthly, an elite stochastic learning (ELS) method is used to refine the solution. The FLPSO-QIW has been comprehensively evaluated on 18 unimodal, multimodal and composite benchmark functions, with and without rotation. Compared with various state-of-the-art PSO algorithms, the performance of FLPSO-QIW is promising and competitive. The effects of parameter adaptation, parameter sensitivity and the proposed mechanism are discussed in detail.
    This research was partially supported by the National Natural Science Foundation of PR China (Grant No 60874113), the Research Fund for the Doctoral Program of Higher Education (Grant No 200802550007), the Key Creative Project of Shanghai Education Community (Grant No 09ZZ66), the Key Foundation Project of Shanghai (Grant No 09JC1400700), the International Science and Technology Cooperation Project of China under Grant 2009DFA32050, and the Alexander von Humboldt Foundation of Germany.
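    To make the first step concrete, the sketch below contrasts the conventional linearly decreasing inertia weight with a quadratic schedule. The endpoints w_start = 0.9 and w_end = 0.4 and the exact quadratic form are common PSO conventions assumed for illustration; the paper's coefficients are not reproduced here.

```python
# Inertia-weight schedules over generations t = 0..t_max.
# Endpoint values are assumed conventions, not the paper's settings.

def linear_weight(t, t_max, w_start=0.9, w_end=0.4):
    """Conventional schedule: inertia decays linearly with generation t."""
    return w_start - (w_start - w_end) * t / t_max

def quadratic_weight(t, t_max, w_start=0.9, w_end=0.4):
    """Quadratic schedule: decays slowly early, faster late in the run."""
    return w_start - (w_start - w_end) * (t / t_max) ** 2

for t in (0, 250, 500, 750, 1000):
    print(t, round(linear_weight(t, 1000), 3), round(quadratic_weight(t, 1000), 3))
```

    With the squared term the weight stays near w_start for longer, so early generations explore more broadly before the decay accelerates toward w_end.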