    Fast reinforcement learning for vision-guided mobile robots

    This paper presents a new reinforcement learning algorithm for accelerating the acquisition of new skills by real mobile robots, without requiring simulation. It speeds up Q-learning by applying memory-based sweeping and enforcing the “adjoining property”, a technique that exploits the natural ordering of sensory state spaces in many robotic applications by allowing transitions only between neighbouring states. The algorithm is tested within an image-based visual servoing framework on a docking task, in which the robot has to position its gripper at a desired configuration relative to an object on a table. In experiments, we compare the performance of the new algorithm with that of a hand-designed linear controller and with a scheme using the linear controller as a bias to further accelerate learning. Through analysis of controllability and docking time, we show that the biased learner can improve on the performance of the linear controller while requiring substantially less training time than unbiased learning (less than 1 hour on the real robot).
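
    As a rough, hedged illustration of the mechanism described above, the sketch below combines tabular Q-learning with memory-based (prioritized) sweeping and a simple neighbourhood test standing in for the “adjoining property”. The grid-cell state representation, the adjoining test, and all constants are assumptions for illustration, not the paper's actual docking setup.

```python
import heapq
import itertools
from collections import defaultdict

GAMMA = 0.95          # discount factor (assumed value)
THETA = 1e-3          # minimum priority worth queueing
SWEEPS_PER_STEP = 50  # model backups performed per real-world step

Q = defaultdict(float)        # Q[(state, action)] -> estimated return
model = {}                    # (state, action) -> (next_state, reward)
preds = defaultdict(set)      # next_state -> {(state, action)} leading to it
queue, tie = [], itertools.count()

def adjoining(s, s2):
    # Assumed neighbourhood test: states are integer grid cells, and a
    # transition may move at most one cell per axis (the "adjoining property").
    return all(abs(a - b) <= 1 for a, b in zip(s, s2))

def td_error(s, a, r, s2, actions):
    return r + GAMMA * max(Q[(s2, a2)] for a2 in actions) - Q[(s, a)]

def observe(s, a, r, s2, actions):
    """Record one real transition, then sweep the learned model backwards."""
    if not adjoining(s, s2):
        return                # filter out transitions that skip over states
    model[(s, a)] = (s2, r)
    preds[s2].add((s, a))
    p = abs(td_error(s, a, r, s2, actions))
    if p > THETA:
        heapq.heappush(queue, (-p, next(tie), s, a))
    for _ in range(SWEEPS_PER_STEP):
        if not queue:
            break
        _, _, st, ac = heapq.heappop(queue)
        s2_, r_ = model[(st, ac)]
        Q[(st, ac)] += td_error(st, ac, r_, s2_, actions)   # full backup
        for sp, ap in preds[st]:  # re-queue predecessors whose error grew
            s2p, rp = model[(sp, ap)]
            pp = abs(td_error(sp, ap, rp, s2p, actions))
            if pp > THETA:
                heapq.heappush(queue, (-pp, next(tie), sp, ap))
```

    Each real transition triggers a bounded number of model backups, concentrating computation on the states whose value estimates are still changing rather than replaying the whole table.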

    Vision-based reinforcement learning using approximate policy iteration

    A major issue for reinforcement learning (RL) applied to robotics is the time required to learn a new skill. While RL has been used to learn mobile robot control in many simulated domains, applications involving learning on real robots are still relatively rare. In this paper, the Least-Squares Policy Iteration (LSPI) reinforcement learning algorithm and a new model-based algorithm, Least-Squares Policy Iteration with Prioritized Sweeping (LSPI+), are implemented on a mobile robot to acquire new skills quickly and efficiently. LSPI+ combines the benefits of LSPI and prioritized sweeping, using all previous experience to focus computational effort on the most “interesting” or dynamic parts of the state space. The proposed algorithms are tested on a household vacuum-cleaner robot learning a docking task with vision as the only sensor modality. In experiments, these algorithms are compared to other model-based and model-free RL algorithms. The results show that the number of trials required to learn the docking task is significantly reduced using LSPI compared to the other RL algorithms investigated, and that LSPI+ further improves on the performance of LSPI.
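
    For readers unfamiliar with LSPI, here is a minimal sketch of its core loop in the style of Lagoudakis and Parr's formulation: LSTD-Q evaluates a fixed policy from a batch of stored transitions, and an outer loop alternates evaluation with greedy policy improvement. The feature map phi and the sample set are placeholders, and the prioritized-sweeping extension that distinguishes LSPI+ is deliberately omitted.

```python
import numpy as np

GAMMA = 0.95  # discount factor (assumed value)

def lstdq(samples, phi, policy, k):
    """Estimate Q-function weights for a fixed policy from stored samples.

    samples: list of (s, a, r, s2) transitions gathered on the robot
    phi:     feature map phi(s, a) -> length-k numpy vector
    policy:  greedy policy induced by the previous weight vector
    """
    A = np.eye(k) * 1e-6               # small ridge term keeps A invertible
    b = np.zeros(k)
    for s, a, r, s2 in samples:
        f = phi(s, a)
        f2 = phi(s2, policy(s2))
        A += np.outer(f, f - GAMMA * f2)
        b += f * r
    return np.linalg.solve(A, b)

def lspi(samples, phi, actions, k, iters=20, tol=1e-4):
    """Alternate policy evaluation (LSTD-Q) and greedy improvement."""
    w = np.zeros(k)
    for _ in range(iters):
        policy = lambda s, w=w: max(actions, key=lambda a: float(phi(s, a) @ w))
        w_new = lstdq(samples, phi, policy, k)
        if np.linalg.norm(w_new - w) < tol:
            break
        w = w_new
    return w
```

    Because LSPI reuses the same batch of samples at every iteration, each real-robot trial contributes to every subsequent policy update, which is what makes it sample-efficient enough for learning directly on hardware.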

    Unsupervised navigation using an economy principle

    We describe robot navigation learning based on self-selection of privileged vectors through the environment in accordance with a built-in economy metric. This provides the opportunity both for progressive behavioural adaptation and for adaptive derivations, leading, through situated activity, to “representations” of the environment that are both economically attained and inherently meaningful to the agent.
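
    The abstract does not specify the economy metric concretely; purely as a toy illustration, one might imagine candidate movement vectors being ranked by a cost that trades path length against turning effort, with the cheapest vector self-selected as privileged. The cost terms and weight below are assumptions, not taken from the paper.

```python
def economy_cost(vector, turn_weight=0.5):
    """Assumed economy metric: shorter, straighter routes are cheaper."""
    distance, turn_angle = vector          # (metres, radians) - illustrative
    return distance + turn_weight * abs(turn_angle)

def select_privileged_vector(candidates):
    """Self-select the candidate route vector minimising the economy metric."""
    return min(candidates, key=economy_cost)

# e.g. select_privileged_vector([(2.0, 0.3), (1.5, 1.2), (1.8, 0.1)])
```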

    Learning visual docking for non-holonomic autonomous vehicles

    This paper presents a new method of learning visual docking skills for non-holonomic vehicles by direct interaction with the environment. The method is based on a reinforcement learning algorithm that speeds up Q-learning by applying memory-based sweeping and enforcing the “adjoining property”, a filtering mechanism that allows transitions only between states within a fixed distance of each other. The method overcomes some limitations of reinforcement learning techniques when they are employed in applications with continuous non-linear systems, such as car-like vehicles. In particular, a good approximation to the optimal behaviour is obtained with a small look-up table. The algorithm is tested within an image-based visual servoing framework on a docking task; training time was less than 1 hour on the real vehicle. In experiments, we show the satisfactory performance of the algorithm.
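
    A hedged sketch of the small look-up table the abstract alludes to: continuous visual features are binned into a coarse grid, and the “adjoining property” is enforced as a fixed maximum bin distance per transition. Bin counts, feature ranges, and the distance threshold are illustrative assumptions, not values from the paper.

```python
import numpy as np

BINS = (10, 10, 8)                    # assumed bins: image x, image y, bearing
LOW = np.array([0.0, 0.0, -np.pi])    # assumed lower bounds of the features
HIGH = np.array([640.0, 480.0, np.pi])
MAX_STEP = 1                          # max bin distance allowed per transition

def discretise(features):
    """Map continuous visual features to one cell of the look-up table."""
    frac = (np.asarray(features, dtype=float) - LOW) / (HIGH - LOW)
    idx = np.floor(frac * np.array(BINS)).astype(int)
    return tuple(np.clip(idx, 0, np.array(BINS) - 1))

def adjoining(cell, cell2):
    """Accept a transition only if it moves at most MAX_STEP bins per axis."""
    return all(abs(i - j) <= MAX_STEP for i, j in zip(cell, cell2))
```

    With 10 x 10 x 8 = 800 cells and a handful of actions, the resulting table stays small enough to learn directly on the vehicle, which is consistent with the sub-hour training time reported above.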