Search CORE

69 research outputs found

Q-learning for Robots

Author: Touzet Claude
Publication venue: MIT Press - Journals
Publication date: 01/01/2003

Robot learning is a challenging, and somewhat unique, research domain. If a robot behavior is defined as a mapping between situations that occur in the real world and actions to be accomplished, then the supervised learning of a robot behavior requires a set of representative examples (situation, desired action). To gather such a learning base, the human operator must have a deep understanding of the robot-world interaction (i.e., a model). However, there are many application domains where such models cannot be obtained, either because detailed knowledge of the robot's world is unavailable (e.g., space or underwater exploration, nuclear or toxic waste management), or because it would be too costly. In this context, the automatic synthesis of a representative learning base is an important issue. It can be sought using reinforcement learning techniques, in particular Q-learning, which does not require a model of the robot-world interaction. Whereas supervised learning examples are pairs (situation, desired action), Q-learning examples are triplets (situation, action, Q value), where the Q value is the utility of executing the action in the situation. The supervised learning base is obtained by recruiting the triplets with the highest utility.
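A minimal Python sketch of the idea, under stated assumptions: the corridor world, its reward, the epsilon-greedy exploration and the parameter values (`alpha`, `gamma`, `epsilon`, `episodes`) are all hypothetical choices not taken from the abstract. The `step` function only simulates the robot so the example can run; the Q-learning loop uses nothing but sampled transitions, never a model. The final step keeps, for each situation, the (situation, action, Q value) triplet with the highest utility, mirroring how the abstract describes recruiting the supervised learning base.

```python
import random
from collections import defaultdict

# Hypothetical toy task: a 1-D corridor of N_CELLS situations. The robot
# starts in cell 0 and gets reward 1.0 on reaching the last cell.
N_CELLS = 6
ACTIONS = (-1, +1)  # move left, move right


def step(state, action):
    """Simulated environment; Q-learning below only sees sampled transitions."""
    next_state = min(max(state + action, 0), N_CELLS - 1)
    done = next_state == N_CELLS - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def greedy(q, state, rng):
    """Action with the highest Q value; ties broken at random."""
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # (situation, action) -> Q value, initialised to 0
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy exploration (an assumed strategy, not from the source).
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(q, state, rng)
            nxt, reward, done = step(state, action)
            # One-step Q-learning target: r + gamma * max_a' Q(s', a').
            target = reward if done else reward + gamma * max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
    return q


def supervised_base(q):
    """For each non-terminal situation, keep the triplet with the highest utility."""
    base = []
    for s in range(N_CELLS - 1):
        best = max(ACTIONS, key=lambda a: q[(s, a)])
        base.append((s, best, q[(s, best)]))
    return base


if __name__ == "__main__":
    for situation, action, value in supervised_base(q_learning()):
        print(situation, action, round(value, 3))
```

The resulting list pairs each situation with its highest-utility action and can serve as training examples for a supervised learner; in this toy corridor every situation maps to +1 (move right).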

HAL AMU

HAL Descartes

Hal-Diderot

Building Internal Maps of a Mobile Robot

Authors: Andrej Dobnikar, Branko Ster
Publication venue: IntechOpen
Publication date: 01/06/2008

IntechOpen

Crossref

Automatic tuning of the reinforcement function

Authors: Santos Juan Miguel, Touzet Claude
Publication venue: Oak Ridge National Laboratory
Publication date: 31/12/1997

The aim of this work is to present a method that helps tune the reinforcement function parameters in a reinforcement learning approach. Since the proposal of neural-based implementations of the reinforcement learning paradigm (which reduced learning time and memory requirements to realistic values), reinforcement functions have become the critical components. Using a general definition for reinforcement functions, the authors solve, in a particular case, the so-called exploration versus exploitation dilemma through careful computation of the reinforcement function (RF) parameter values. They propose an algorithm to compute, during the exploration part of the learning phase, an estimate of the parameter values. Experiments with the mobile robot Nomad 200 validate their proposals.

UNT Digital Library

Decision-making in brains and robots - the case for an interdisciplinary approach

Authors: Lee Sang Wan, Seymour Benjamin
Publication venue: Current Opinion in Behavioral Sciences
Publication date: 01/01/2019

Reinforcement learning describes a general method for trial-and-error learning and has emerged as a dominant framework both for optimal control in autonomous robots and for understanding decision-making in the brain. Despite their common roots, however, these two fields have evolved largely independently. In this perspective, we consider how each now faces problems that could potentially be addressed by insights from the other, and argue that an interdisciplinary approach could greatly accelerate progress in both.

Apollo (Cambridge)

Modular reinforcement learning : a case study in a robot domain

Authors: Kalmár Zsolt, Lőrincz András, Szepesvári Csaba
Publication date: 01/01/2000

The behaviour of reinforcement learning (RL) algorithms is best understood in completely observable, finite state- and action-space, discrete-time controlled Markov chains. Robot-learning domains, on the other hand, are inherently infinite both in time and space, and moreover they are only partially observable. In this article we suggest a systematic design method motivated by the desire to transform the task to be solved into a finite-state, discrete-time, "approximately" Markovian task that is also completely observable. The key idea is to break up the problem into subtasks and design controllers for each of the subtasks. Operating conditions are then attached to the controllers (a controller together with its operating conditions is called a module), and, where needed, additional features are designed to facilitate observability. A new discrete time-counter is introduced at the "module level" that ticks only when a change in the value of one of the features is observed. The approach was tried out on a real-life robot. Several RL algorithms were compared, and a model-based approach was found to work best. The learnt switching strategy performed as well as a handcrafted version. Moreover, the learnt strategy seemed to exploit certain properties of the environment that could not have been foreseen, which suggests that a learnt controller might outperform a handcrafted switching strategy in the future.
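A minimal Python sketch of the module-level clock described above, assuming the robot's observations have already been reduced to a tuple of discrete feature values per raw time step. `Module`, `module_level_steps`, `available_modules` and the three example modules are hypothetical names; the abstract does not give the article's actual features or controllers, and the learnt (model-based) switching policy itself is not shown. The clock ticks at the first observation and then only when some feature value changes, and at each tick only modules whose operating condition holds are candidates for the switching policy.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# A feature vector: one discrete value per hand-designed feature (assumption).
Features = Tuple[int, ...]


@dataclass
class Module:
    """A subtask controller together with its operating condition (hypothetical)."""
    name: str
    operating_condition: Callable[[Features], bool]


def module_level_steps(stream: Sequence[Features]) -> List[Tuple[int, Features]]:
    """Return (raw_time, features) at each tick of the module-level clock.

    The clock ticks at the first observation and afterwards only when
    at least one feature value differs from the previous observation.
    """
    ticks: List[Tuple[int, Features]] = []
    previous: Optional[Features] = None
    for t, features in enumerate(stream):
        if features != previous:
            ticks.append((t, features))
            previous = features
    return ticks


def available_modules(modules: Sequence[Module], features: Features) -> List[str]:
    """Names of the modules whose operating condition holds for these features."""
    return [m.name for m in modules if m.operating_condition(features)]


# Hypothetical modules over two binary features:
# features[0] = 1 when an obstacle is close, features[1] = 1 when the goal is visible.
MODULES = [
    Module("avoid_obstacle", lambda f: f[0] == 1),
    Module("explore", lambda f: f[1] == 0),
    Module("go_to_goal", lambda f: f[1] == 1),
]

if __name__ == "__main__":
    raw = [(0, 0), (0, 0), (0, 1), (0, 1), (0, 1), (1, 1), (1, 1)]
    for t, f in module_level_steps(raw):
        # A learnt switching policy would choose among these modules at each tick.
        print(t, f, available_modules(MODULES, f))
```

Seven raw observations collapse to three module-level decision points (raw times 0, 2 and 5), which is what makes the resulting decision problem small and finite enough for standard RL algorithms.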

University of Szeged

The Allocation of Time and Location Information to Activity-Travel Sequence Data by Means of Reinforcement Learning

Author: Wets Janssens
Publication venue: IntechOpen
Publication date: 01/01/2008

IntechOpen