Improving the energy efficiency of autonomous underwater vehicles by learning to model disturbances
Energy efficiency is one of the main challenges for the long-term autonomy of AUVs (Autonomous Underwater Vehicles). We propose a novel approach for improving the energy efficiency of AUV controllers based on the ability to learn which external disturbances can safely be ignored. The proposed learning approach uses adaptive oscillators that are able to learn online the frequency, amplitude, and phase of zero-mean periodic external disturbances. Such disturbances occur naturally in open water due to waves, currents, and gravity, but can also be caused by the dynamics and hydrodynamics of the AUV itself. We formulate the theoretical basis of the approach and demonstrate its abilities on a number of input signals. Further experimental evaluation is conducted using a dynamic model of the Girona 500 AUV in simulation on two important underwater scenarios: hovering and trajectory tracking. The proposed approach shows significant energy-saving capabilities while at the same time maintaining high controller gains. The approach is generic and applicable not only to AUV control, but also to other types of control where periodic disturbances exist and can be accounted for by the controller. © 2013 IEEE
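The abstract does not spell out the oscillator dynamics, so the following is a minimal sketch of one standard adaptive-frequency-oscillator formulation, in which phase, frequency, and amplitude adapt online to the estimation error. The single-oscillator structure, the gains, and all names are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def adaptive_oscillator(signal, dt, k=2.0, eta=0.5, omega0=1.0):
    """Track a zero-mean periodic signal with an adaptive phase oscillator.

    A sketch of a standard adaptive-frequency-oscillator formulation:
    the tracking error drives adaptation of phase, frequency, and
    amplitude. Gains k and eta are illustrative, not from the paper.
    """
    phi, omega, alpha = 0.0, omega0, 0.0
    estimates = []
    for y in signal:
        y_hat = alpha * np.cos(phi)                  # current estimate of the disturbance
        e = y - y_hat                                # error drives all adaptation
        phi += dt * (omega - k * e * np.sin(phi))    # phase locks onto the signal
        omega += dt * (-k * e * np.sin(phi))         # frequency follows the phase error
        alpha += dt * (eta * e * np.cos(phi))        # amplitude follows the in-phase error
        estimates.append(y_hat)
    return np.array(estimates), omega, alpha

# Example: learn a 0.5 Hz sinusoid online.
dt = 0.01
t = np.arange(0.0, 60.0, dt)
y = 0.8 * np.sin(2 * np.pi * 0.5 * t)
y_hat, omega, alpha = adaptive_oscillator(y, dt, omega0=2.0)
print(f"learned frequency ~{omega:.2f} rad/s, amplitude ~{alpha:.2f}")
```

Once the oscillator has locked on, the predicted periodic component is known to the controller, which matches the abstract's idea of learning which external disturbances can safely be ignored.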
Eligibility Propagation to Speed up Time Hopping for Reinforcement Learning
A mechanism called Eligibility Propagation is proposed to speed up the Time Hopping technique used for faster Reinforcement Learning in simulations. Eligibility Propagation provides Time Hopping with abilities similar to those that eligibility traces provide for conventional Reinforcement Learning: it propagates values from one state to all of its temporal predecessors using a state-transition graph. Experiments on a simulated biped crawling robot confirm that Eligibility Propagation accelerates the learning process by more than 3 times.
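The abstract gives the mechanism only in words; below is a minimal sketch of propagating an updated value to all temporal predecessors through a state-transition graph, close in spirit to prioritized sweeping. The one-step backup rule, the improvement-only test, and all names are illustrative assumptions, not the paper's exact algorithm.

```python
from collections import defaultdict

def propagate_value(V, predecessors, start, gamma=0.95, tol=1e-6):
    """Propagate an updated value at `start` to all temporal predecessors.

    `predecessors[s]` holds (s_prev, reward) pairs recorded from simulated
    transitions s_prev -> s. The backup here is a plain one-step value
    backup; the paper's exact update rule may differ (this is a sketch).
    """
    frontier = [start]
    while frontier:
        s = frontier.pop()
        for s_prev, reward in predecessors[s]:
            backed_up = reward + gamma * V[s]
            if backed_up > V[s_prev] + tol:      # propagate only improvements
                V[s_prev] = backed_up
                frontier.append(s_prev)          # keep pushing further back

# Tiny example: chain 0 -> 1 -> 2 with reward 1 on reaching state 2.
V = defaultdict(float)
predecessors = defaultdict(list)
predecessors[2].append((1, 1.0))
predecessors[1].append((0, 0.0))
propagate_value(V, predecessors, 2)
print(V[1], V[0])   # 1.0 and gamma * 1.0
```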
Simultaneous discovery of multiple alternative optimal policies by reinforcement learning
Conventional reinforcement learning algorithms for direct policy search are limited to finding only a single optimal policy. This is caused by their local-search nature, which allows them to converge only to a single local optimum in policy space and makes them heavily dependent on the policy initialization. In this paper, we propose a novel reinforcement learning algorithm for direct policy search, which is capable of simultaneously finding multiple alternative optimal policies. The algorithm is based on particle filtering and performs global search in policy space, therefore eliminating the dependency on the policy initialization and having the ability to find the globally optimal policy. We validate the approach on one- and two-dimensional problems with multiple optima, and compare its performance to a global random sampling method and a state-of-the-art Expectation-Maximization-based reinforcement learning algorithm. © 2012 IEEE
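As a sketch of the core idea, policy parameter vectors can be treated as particles, weighted by the return they achieve, and resampled with perturbation so the population can settle on several optima at once. The softmax weighting, Gaussian perturbation, and two-peak test function below are assumed details for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_policy_search(returns_fn, n_particles=200, n_iters=50,
                           bounds=(-5.0, 5.0), noise=0.3, temperature=1.0):
    """Global direct policy search with a particle-filter-style loop.

    `returns_fn` maps a policy parameter vector to its (estimated) return.
    Particles are importance-weighted by return and resampled with Gaussian
    perturbation, so the population can concentrate around multiple optima.
    This is a sketch of the idea, not the paper's exact algorithm.
    """
    lo, hi = bounds
    particles = rng.uniform(lo, hi, size=(n_particles, 2))   # global init in policy space
    for _ in range(n_iters):
        returns = np.array([returns_fn(p) for p in particles])
        w = np.exp((returns - returns.max()) / temperature)  # softmax-style weights
        w /= w.sum()
        idx = rng.choice(n_particles, size=n_particles, p=w)  # resample by weight
        particles = particles[idx] + rng.normal(0.0, noise, particles.shape)
        particles = np.clip(particles, lo, hi)
    return particles

# A 2-D return landscape with two optima; survivors cluster around both.
two_peaks = lambda p: np.exp(-np.sum((p - 2) ** 2)) + np.exp(-np.sum((p + 2) ** 2))
final = particle_policy_search(two_peaks)
print("near +2:", np.sum(np.linalg.norm(final - 2, axis=1) < 1.0),
      "near -2:", np.sum(np.linalg.norm(final + 2, axis=1) < 1.0))
```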
Towards improved AUV control through learning of periodic signals
Designing a high-performance controller for an Autonomous Underwater Vehicle (AUV) is a challenging task. There are often numerous requirements, sometimes contradictory, such as speed, precision, robustness, and energy efficiency. In this paper, we propose a theoretical concept for improving the performance of AUV controllers based on the ability to learn periodic signals. The proposed learning approach is based on adaptive oscillators that are able to learn online the frequency, amplitude, and phase of zero-mean periodic signals. Such signals occur naturally in open water due to waves, currents, and gravity, but can also be caused by the dynamics and hydrodynamics of the AUV itself. We formulate the theoretical basis of the approach, and demonstrate its abilities on synthetic input signals. Further evaluation is conducted in simulation with a dynamic model of the Girona 500 AUV on a hovering task.
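The abstract leaves the controller structure unspecified. One plausible way a learned periodic signal improves control, offered here purely as an assumption rather than the paper's stated design, is as a feedforward term that cancels the predicted component so the feedback path no longer fights it:

```python
def control_step(error, d_error, disturbance_estimate, kp=8.0, kd=2.0):
    """PD feedback plus a feedforward term cancelling the learned periodic
    disturbance. Gains and structure are illustrative assumptions; the
    paper's controller may differ.
    """
    u_feedback = kp * error + kd * d_error   # standard PD action
    u_feedforward = -disturbance_estimate    # cancel what the oscillator predicts
    return u_feedback + u_feedforward
```

With the periodic part handled in feedforward, the feedback gains need not be detuned to tolerate it, which is one way to reconcile the competing precision and energy-efficiency requirements the abstract lists.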
Direct policy search reinforcement learning based on particle filtering
We reveal a link between particle filtering methods and direct policy search reinforcement learning, and propose a novel reinforcement learning algorithm based heavily on ideas borrowed from particle filters. A major advantage of the proposed algorithm is its ability to perform global search in policy space and thus find the globally optimal policy. We validate the approach on one- and two-dimensional problems with multiple optima, and compare its performance to a global random sampling method and a state-of-the-art Expectation-Maximization-based reinforcement learning algorithm.
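The main ingredient borrowed from particle filters is the weighted-resampling step. Below is a standard low-variance (systematic) resampler as used in generic particle filtering; whether the paper uses this particular variant is an assumption.

```python
import numpy as np

def systematic_resample(weights, rng=np.random.default_rng()):
    """Low-variance systematic resampling, a standard particle-filter step.

    Returns indices of the particles to keep: high-weight particles are
    duplicated, low-weight ones tend to be dropped.
    """
    w = np.asarray(weights, dtype=float)
    w /= w.sum()                                     # normalize to a distribution
    n = len(w)
    positions = (rng.random() + np.arange(n)) / n    # one random offset, evenly spaced
    return np.searchsorted(np.cumsum(w), positions)
```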
Exploring Restart Distributions
We consider the generic approach of using an experience memory to help exploration by adapting a restart distribution. That is, given the capacity to reset the state to one corresponding to the agent's past observations, we help exploration by promoting faster state-space coverage via restarting the agent from a more diverse set of initial states, as well as allowing it to restart in states associated with significant past experiences. This approach is compatible with both on-policy and off-policy methods. However, a caveat is that altering the distribution of initial states could change the optimal policies when searching within a restricted class of policies. To reduce this unsought learning bias, we evaluate our approach in deep reinforcement learning, which benefits from the high representational capacity of deep neural networks. We instantiate three variants of our approach, each inspired by an idea in the context of experience replay. Using these variants, we show that performance gains can be achieved, especially in hard exploration problems.
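A minimal sketch of the generic scheme: record visited states in a memory, and at episode start sometimes reset to a stored state rather than the environment's default initial state. The env.set_state interface and uniform sampling over the memory are assumptions; the paper's three variants instead prioritize restart states using ideas from experience replay.

```python
import random

class RestartDistribution:
    """Experience memory used as a restart distribution (a sketch).

    With probability `p_restart`, an episode begins from a stored past
    state rather than the environment's default start state. Uniform
    sampling over the memory is just one choice; prioritized schemes
    borrowed from experience replay are another.
    """
    def __init__(self, capacity=10_000, p_restart=0.5):
        self.memory, self.capacity, self.p_restart = [], capacity, p_restart

    def observe(self, state):
        if len(self.memory) >= self.capacity:
            self.memory.pop(random.randrange(len(self.memory)))  # random eviction
        self.memory.append(state)

    def reset(self, env):
        if self.memory and random.random() < self.p_restart:
            return env.set_state(random.choice(self.memory))  # assumes a resettable env
        return env.reset()
```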