
    Evolution of Neural Networks for Helicopter Control: Why Modularity Matters

    The problem of the automatic development of controllers for vehicles whose exact characteristics are not known is considered in the context of miniature helicopter flocking. A methodology is proposed in which neural-network-based controllers are evolved in a simulation using a dynamic model qualitatively similar to the physical helicopter. Several network architectures and evolutionary sequences are investigated, and two approaches are found that can evolve very competitive controllers. The division of the neural network into modules, and of the task into incremental steps, seems to be a precondition for success, and we analyse why this might be so.
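    The style of approach described, evolving the weights of a modular neural controller against a simulator, can be sketched as below. This is a minimal illustration, not the authors' setup: the (mu + lambda) evolution strategy, the two-module tanh network, and the placeholder dynamics standing in for the helicopter simulator are all assumptions.

    # Minimal sketch (not the authors' code): evolving the weights of a small
    # modular controller with a (mu + lambda) evolution strategy. The simulator,
    # fitness function, and module split are hypothetical placeholders.
    import numpy as np

    rng = np.random.default_rng(0)

    def module(x, W, b):
        """One network module: a single tanh layer."""
        return np.tanh(W @ x + b)

    def controller(state, params):
        """Two modules chained together (an assumed split of the task)."""
        W1, b1, W2, b2 = params
        h = module(state, W1, b1)      # first module: inner stabilisation
        return module(h, W2, b2)       # second module: output commands

    def fitness(params, episodes=5, horizon=200):
        """Negative accumulated position error in placeholder toy dynamics."""
        total = 0.0
        for _ in range(episodes):
            s = rng.normal(0.0, 0.1, size=6)  # toy 6-D vehicle state
            for _ in range(horizon):
                u = controller(s, params)     # 3-D control output
                s = 0.95 * s + 0.1 * np.concatenate([u, u])  # toy dynamics
                total -= np.sum(s[:3] ** 2)   # penalise position error
        return total / episodes

    def random_params():
        return [rng.normal(0, 0.5, (4, 6)), rng.normal(0, 0.5, 4),
                rng.normal(0, 0.5, (3, 4)), rng.normal(0, 0.5, 3)]

    def mutate(params, sigma=0.05):
        return [p + rng.normal(0, sigma, p.shape) for p in params]

    # (mu + lambda) loop: keep the best mu parents, refill with mutants.
    mu, lam, generations = 5, 20, 30
    population = [random_params() for _ in range(mu + lam)]
    for g in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[:mu]
        population = parents + [mutate(parents[i % mu]) for i in range(lam)]

    Incremental evolution, as the abstract suggests, would correspond to swapping in progressively harder fitness functions across generations rather than optimising the full task from the start.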

    UltraSwarm: A Further Step Towards a Flock of Miniature Helicopters

    We describe further progress towards the development of a MAV (micro aerial vehicle) designed as an enabling tool to investigate aerial flocking. Our research focuses on the use of low-cost, off-the-shelf vehicles and sensors to enable fast prototyping and to reduce development costs. Details on the design of the embedded electronics and the modification of the chosen toy helicopter are presented, and the technique used for state estimation is described. The fusion of inertial data through an unscented Kalman filter is used to estimate the helicopter’s state, and this forms the main input to the control system. Since no detailed dynamic model of the helicopter in use is available, a method is proposed for automated system identification and for subsequent controller design based on artificial evolution. Preliminary results obtained with a dynamic simulator of a helicopter are reported, along with some encouraging results for tackling the problem of flocking.
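    The general shape of UKF-based state estimation can be sketched with the filterpy library, as below. This is an assumption for illustration, not the paper's implementation: the constant-velocity altitude model, the single altitude sensor, and the noise values are placeholders for the helicopter's actual inertial fusion setup.

    # Minimal UKF sketch using filterpy (assumed; not the authors' code).
    import numpy as np
    from filterpy.kalman import UnscentedKalmanFilter, MerweScaledSigmaPoints

    dt = 0.01  # 100 Hz update rate (assumed)

    def fx(x, dt):
        """Process model: constant-velocity motion in altitude."""
        return np.array([x[0] + dt * x[1], x[1]])

    def hx(x):
        """Measurement model: we observe altitude only."""
        return np.array([x[0]])

    points = MerweScaledSigmaPoints(n=2, alpha=1e-3, beta=2.0, kappa=0.0)
    ukf = UnscentedKalmanFilter(dim_x=2, dim_z=1, dt=dt,
                                fx=fx, hx=hx, points=points)
    ukf.x = np.array([0.0, 0.0])   # initial altitude and climb rate
    ukf.P *= 0.1                   # initial state uncertainty
    ukf.R = np.diag([0.05])        # measurement noise (assumed variance)
    ukf.Q = np.eye(2) * 1e-4       # process noise (tuning parameter)

    rng = np.random.default_rng(1)
    for k in range(100):
        true_alt = 0.5 * np.sin(0.05 * k)              # synthetic trajectory
        z = np.array([true_alt + rng.normal(0, 0.2)])  # noisy altitude reading
        ukf.predict()
        ukf.update(z)
    # ukf.x now holds the filtered [altitude, climb rate] estimate,
    # the kind of state estimate that would feed the control system.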

    SwarMAV: A Swarm of Miniature Aerial Vehicles

    As the MAV (Micro or Miniature Aerial Vehicle) field matures, we expect that the platform's degree of autonomy, the information exchange, and the coordination with other manned and unmanned actors will become at least as crucial as its aerodynamic design. The project described in this paper explores some aspects of a particularly exciting avenue of development: an autonomous swarm of MAVs that exploits its inherent reliability (through redundancy) and its ability to exchange information among its members in order to cope with a dynamically changing environment and achieve its mission. We describe the successful realization of a prototype experimental platform weighing only 75 g, and outline a strategy for the automatic design of a suitable controller.

    Black-Box Data-efficient Policy Search for Robotics

    The most data-efficient algorithms for reinforcement learning (RL) in robotics are based on uncertain dynamical models: after each episode, they first learn a dynamical model of the robot, then they use an optimization algorithm to find a policy that maximizes the expected return given the model and its uncertainties. It is often believed that this optimization can be tractable only if analytical, gradient-based algorithms are used; however, these algorithms require using specific families of reward functions and policies, which greatly limits the flexibility of the overall approach. In this paper, we introduce a novel model-based RL algorithm, called Black-DROPS (Black-box Data-efficient RObot Policy Search), that (1) does not impose any constraint on the reward function or the policy (they are treated as black boxes), (2) is as data-efficient as the state-of-the-art algorithm for data-efficient RL in robotics, and (3) is as fast (or faster) than analytical approaches when several cores are available. The key idea is to replace the gradient-based optimization algorithm with a parallel, black-box algorithm that takes into account the model uncertainties. We demonstrate the performance of our new algorithm on two standard control benchmark problems (in simulation) and a low-cost robotic manipulator (with a real robot).
    Comment: Accepted at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2017; code at http://github.com/resibots/blackdrops; video at http://youtu.be/kTEyYiIFGP
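    The key idea can be sketched as follows: estimate a policy's expected return by Monte Carlo rollouts through a model whose parameters are sampled to reflect its uncertainty, then optimise the policy parameters with a gradient-free black-box optimiser. The sketch below is an assumption for illustration, not the released code linked above: the toy sampled-parameter dynamics stand in for a learned probabilistic model, the linear policy is arbitrary, and CMA-ES is used via the pycma package.

    # Sketch of the Black-DROPS idea: Monte Carlo policy evaluation under an
    # uncertain model + black-box optimisation (toy stand-ins throughout).
    import numpy as np
    import cma

    rng = np.random.default_rng(0)

    def sample_model():
        """Stand-in for a learned probabilistic model: dynamics parameters
        drawn from an assumed posterior (here, fixed Gaussians)."""
        a = rng.normal(0.9, 0.02)   # uncertain 'drag' coefficient
        b = rng.normal(0.1, 0.01)   # uncertain control gain
        return a, b

    def rollout(theta, horizon=50):
        """One rollout through a sampled model; returns the accumulated cost."""
        a, b = sample_model()
        s, cost = np.array([1.0, 0.0]), 0.0
        K = theta.reshape(1, 2)                     # linear feedback policy
        for _ in range(horizon):
            u = float(np.clip((K @ s)[0], -1.0, 1.0))
            s = np.array([a * s[0] + b * u, s[1] + 0.1 * s[0]])
            cost += s[0] ** 2 + 0.01 * u ** 2       # arbitrary reward: no
        return cost                                 # structure is imposed on it

    def expected_cost(theta, n_rollouts=10):
        """Monte Carlo estimate of expected cost under model uncertainty."""
        return float(np.mean([rollout(theta) for _ in range(n_rollouts)]))

    # Black-box optimisation of the policy parameters; no gradients needed,
    # so any reward function or policy class could be plugged in unchanged.
    es = cma.CMAEvolutionStrategy([0.0, 0.0], 0.5,
                                  {'maxiter': 50, 'verbose': -9})
    while not es.stop():
        thetas = es.ask()
        es.tell(thetas, [expected_cost(np.asarray(t)) for t in thetas])
    best_theta = es.result.xbest

    Because each rollout is independent, the Monte Carlo evaluations parallelise trivially across cores, which is what makes the black-box approach competitive in wall-clock time with analytical methods.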

    Probabilistic policy reuse for safe reinforcement learning

    This work introduces Policy Reuse for Safe Reinforcement Learning, an algorithm that combines Probabilistic Policy Reuse and teacher advice for safe exploration in dangerous, continuous state and action reinforcement learning problems in which the dynamic behavior is reasonably smooth and the space is Euclidean. The algorithm uses a continuously increasing monotonic risk function that allows for the identification of the probability of ending up in failure from a given state. Such a risk function is defined in terms of how far that state is from the state space known by the learning agent. Probabilistic Policy Reuse is used to safely balance the exploitation of actual learned knowledge, the exploration of new actions, and the request of teacher advice in parts of the state space considered dangerous. Specifically, the π-reuse exploration strategy is used. Using experiments in the helicopter hover task and a business management problem, we show that the π-reuse exploration strategy can be used to completely avoid visiting undesirable situations while maintaining the performance (in terms of the classical long-term accumulated reward) of the final policy achieved.
    This paper has been partially supported by the Spanish Ministerio de Economía y Competitividad TIN2015-65686-C5-1-R and the European Union’s Horizon 2020 Research and Innovation programme under Grant Agreement No. 730086 (ERGO). Javier García is partially supported by the Comunidad de Madrid (Spain) funds under the project 2016-T2/TIC-1712.
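    The π-reuse exploration strategy named in the abstract can be sketched as below: at each step the agent follows the teacher's (past) policy with probability ψ and otherwise acts ε-greedily on its own value estimates, with ψ decaying over the episode so control gradually shifts to the learned policy. This is a minimal tabular illustration under assumed placeholders (env_step, teacher_policy), not the paper's continuous-space implementation or its risk function.

    # Minimal sketch of pi-reuse exploration driving Q-learning (assumption).
    import numpy as np

    rng = np.random.default_rng(0)

    def pi_reuse_action(state, Q, teacher_policy, psi, epsilon):
        """Choose one action under pi-reuse."""
        n_actions = Q.shape[1]
        if rng.random() < psi:
            return teacher_policy(state)      # reuse the safe teacher policy
        if rng.random() < epsilon:
            return int(rng.integers(n_actions))  # explore
        return int(np.argmax(Q[state]))       # exploit current knowledge

    def run_episode(Q, teacher_policy, env_step, start_state,
                    psi0=1.0, decay=0.95, epsilon=0.1,
                    alpha=0.1, gamma=0.99, horizon=100):
        """One Q-learning episode driven by pi-reuse exploration.
        env_step(s, a) -> (next_state, reward, done) is a placeholder."""
        s, psi = start_state, psi0
        for _ in range(horizon):
            a = pi_reuse_action(s, Q, teacher_policy, psi, epsilon)
            s2, r, done = env_step(s, a)
            Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])
            s, psi = s2, psi * decay          # psi decays each step
            if done:
                break
        return Q

    In the paper's setting, ψ would additionally be driven up in regions the risk function flags as dangerous, so that teacher advice dominates exactly where exploration is unsafe.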