9 research outputs found

    On-line case-based policy learning for automated planning in probabilistic environments

    Many robotic control architectures perform a continuous cycle of sensing, reasoning and acting, where that reasoning can be carried out in a reactive or deliberative form. Reactive methods are fast and provide the robot with high interaction and response capabilities. Deliberative reasoning is particularly suitable in robotic systems because it employs some form of forward projection (reasoning in depth about goals, pre-conditions, resources and timing constraints) and provides the robot with reasonable responses in situations unforeseen by the designer. However, this reasoning, typically conducted using Artificial Intelligence techniques like Automated Planning (AP), is not effective for controlling autonomous agents that operate in complex and dynamic environments. Deliberative planning, although feasible in stable situations, takes too long in unexpected or changing situations which require re-planning. Therefore, planning cannot be done on-line in many complex robotic problems, where quick responses are frequently required. In this paper, we propose an alternative approach based on case-based policy learning which integrates deliberative reasoning through AP and reactive response time through reactive planning policies. The method is based on learning planning knowledge from actual experiences to obtain a case-based policy. The contribution of this paper is twofold. First, it is shown that the learned case-based policy produces reasonable and timely responses in complex environments. Second, it is also shown how one case-based policy that solves a particular problem can be reused to solve a similar but more complex problem in a transfer learning scope. This paper has been partially supported by the Spanish Ministerio de Economía y Competitividad TIN2015-65686-C5-1-R and the European Union's Horizon 2020 Research and Innovation programme under Grant Agreement No. 730086 (ERGO).
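    The retrieval-and-reuse idea behind a case-based policy can be shown with a minimal sketch, assuming cases are (state, action) pairs harvested from earlier planning episodes: at execution time the policy reuses the action of the nearest stored case and falls back to the deliberative planner when nothing close enough is known. The class name, distance measure and threshold below are illustrative assumptions, not the paper's implementation.

        # Minimal sketch of a case-based policy with a deliberative fallback.
        # CaseBasedPolicy, the Euclidean distance and the threshold are
        # assumptions for illustration only.
        import math

        class CaseBasedPolicy:
            def __init__(self, planner, threshold=1.0):
                self.cases = []          # stored (state_vector, action) pairs
                self.planner = planner   # deliberative AP planner used as fallback
                self.threshold = threshold

            def _distance(self, a, b):
                return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

            def act(self, state):
                # Reactive path: reuse the action of the nearest stored case.
                if self.cases:
                    nearest = min(self.cases, key=lambda c: self._distance(c[0], state))
                    if self._distance(nearest[0], state) <= self.threshold:
                        return nearest[1]
                # Deliberative path: plan from scratch and keep the new experience.
                action = self.planner(state)
                self.cases.append((state, action))
                return action

    As the case base grows, more states are answered reactively without re-planning, which is the source of the timely responses claimed above.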

    Policy gradient with value function approximation for collective multiagent planning

    National Research Foundation (NRF) Singapore under the Corp Lab @ University scheme; Fujitsu Ltd.

    Fitted Q-iteration in continuous action-space MDPs

    We consider continuous state, continuous action batch reinforcement learning, where the goal is to learn a good policy from a sufficiently rich trajectory generated by some policy. We study a variant of fitted Q-iteration in which the greedy action selection is replaced by searching for a policy in a restricted set of candidate policies that maximizes the average action values. We provide a rigorous analysis of this algorithm, proving what we believe is the first finite-time bound for value-function-based algorithms for continuous state and action problems.
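    As a rough illustration of this variant, the sketch below runs fitted Q-iteration on a batch of transitions but replaces greedy action maximization with a search over a small restricted set of candidate policies, picking the one that maximizes the average action value over the batch. The tree-based regressor, the candidate-policy interface and all names are assumptions for illustration, not the paper's construction or the function class it analyzes.

        # Sketch of fitted Q-iteration with policy search over a restricted
        # candidate set (illustrative; regressor and interfaces are assumptions).
        import numpy as np
        from sklearn.ensemble import ExtraTreesRegressor

        def fitted_q_iteration(batch, candidate_policies, gamma=0.99, iters=50):
            """batch: list of (state, action, reward, next_state) arrays;
            candidate_policies: callables mapping a state matrix to an action matrix."""
            s  = np.array([b[0] for b in batch])
            a  = np.array([b[1] for b in batch])
            r  = np.array([b[2] for b in batch])
            s2 = np.array([b[3] for b in batch])

            q = ExtraTreesRegressor(n_estimators=50)
            q.fit(np.hstack([s, a]), r)   # first iterate approximates immediate reward

            for _ in range(iters):
                # Replace greedy maximization by a search over candidate policies:
                # keep the policy whose actions maximize the *average* Q-value.
                best = max(candidate_policies,
                           key=lambda pi: q.predict(np.hstack([s2, pi(s2)])).mean())
                targets = r + gamma * q.predict(np.hstack([s2, best(s2)]))
                q = ExtraTreesRegressor(n_estimators=50)
                q.fit(np.hstack([s, a]), targets)
            return q, best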

    On knowledge representation and decision making under uncertainty

    Designing systems with the ability to make optimal decisions under uncertainty is one of the goals of artificial intelligence. However, in many applications the design of optimal planners is complicated by imprecise inputs and uncertain outputs resulting from stochastic dynamics. Partially Observable Markov Decision Processes (POMDPs) provide a rich mathematical framework to model these kinds of problems. However, the high computational demand of solution methods for POMDPs is a drawback for applying them in practice. In this thesis, we present a two-fold approach for improving the tractability of POMDP planning. First, we focus on designing good heuristics for POMDP approximation algorithms. We aim to scale up the efficiency of a class of POMDP approximations called point-based planning methods by designing a good planning space. We study the effect of three properties of reachable belief state points that may influence the performance of point-based approximation methods. Second, we investigate approaches to designing good controllers using an alternative representation of systems with partial observability called Predictive State Representation (PSR). This part of the thesis advocates the usefulness and practicality of PSRs in planning under uncertainty. We also attempt to move some useful characteristics of the PSR model, which has a predictive view of the world, to the POMDP model, which has a probabilistic view of the hidden states of the world. We propose a planning algorithm motivated by the connections between the two models.
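    The point-based planning methods mentioned above can be illustrated with a generic PBVI-style backup over a fixed set of belief points, where the value function is kept as a set of alpha-vectors. The sketch below is a textbook-style example of that class of methods under assumed matrix conventions; it is not the thesis's heuristics for choosing the planning space.

        # Generic point-based (PBVI-style) backup over fixed belief points.
        # Matrix conventions and names are assumptions for illustration.
        import numpy as np

        def point_based_backup(beliefs, alphas, T, O, R, gamma=0.95):
            """beliefs: (n_b, n_s); alphas: list of (n_s,) vectors;
            T[a]: (n_s, n_s) transitions, O[a]: (n_s, n_o) observations, R: (n_s, n_a)."""
            n_s, n_a = R.shape
            n_o = O[0].shape[1]
            new_alphas = []
            for b in beliefs:
                best_val, best_vec = -np.inf, None
                for a in range(n_a):
                    # For each observation keep the alpha-vector whose
                    # back-projection scores highest at this belief point.
                    vec = R[:, a].copy()
                    for o in range(n_o):
                        proj = [gamma * T[a] @ (O[a][:, o] * al) for al in alphas]
                        vec = vec + max(proj, key=lambda g: b @ g)
                    if b @ vec > best_val:
                        best_val, best_vec = b @ vec, vec
                new_alphas.append(best_vec)
            return new_alphas

    One such backup per belief point is repeated until the values at the points stabilize; which reachable belief points to include is exactly the design question studied in the thesis.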

    Interactive Learning of Probabilistic Decision Making by Service Robots with Multiple Skill Domains

    This thesis contributes to autonomous service robots, centered around two aspects. The first is modeling decision making in the face of incomplete information on top of the diverse basic skills of a service robot. The second is investigating, based on such a model, how to transfer complex decision-making knowledge into the system. Interactive learning, both from demonstrations by human teachers and through interaction with objects, yields decision-making models that the robot can apply.

    Programmation dynamique avec approximation de la fonction valeur (Dynamic programming with value function approximation)

    The use of tools for approximating the value function is essential in order to tackle large sequential decision-making problems. The dynamic programming (DP) and reinforcement learning (RL) methods introduced in Chapters 1 and 2 assume that the value function can be represented (stored) by assigning a value to each state (whose number is assumed finite), for example in the form of a table. These so-called exact solution methods make it possible to determine the optimal solution of the problem under consideration (or at least to converge to that optimal solution). However, they often apply only to toy problems, because for most interesting applications the number of possible states is so large (or even infinite in the case of continuous spaces) that an exact representation of the function cannot be stored in full. It then becomes necessary to represent the value function approximately, using a moderate number of coefficients, and to redefine and analyze so-called approximate solution methods for DP and RL, in order to account for the consequences of using such approximations in sequential decision-making problems.
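    The chapter's theme can be made concrete with a minimal sketch of approximate value iteration: the value function is represented by a moderate number of coefficients as a linear combination of features, and each Bellman backup over a sample of states is followed by a least-squares projection onto the feature space. The feature map, sampled states and data layout below are assumptions made for illustration.

        # Sketch of approximate value iteration with a linear value function
        # V(s) ~ phi(s) @ w; feature map and data layout are assumptions.
        import numpy as np

        def approximate_value_iteration(states, phi, transitions, rewards,
                                        gamma=0.9, iters=100):
            """states: sampled states; phi(s): feature vector of length d;
            transitions[s][a]: list of (prob, next_state); rewards[s][a]: float."""
            Phi = np.array([phi(s) for s in states])      # (n, d) feature matrix
            w = np.zeros(Phi.shape[1])
            for _ in range(iters):
                targets = []
                for s in states:
                    # One-step Bellman backup using the current approximation.
                    q = [rewards[s][a] + gamma * sum(p * (phi(s2) @ w) for p, s2 in succ)
                         for a, succ in enumerate(transitions[s])]
                    targets.append(max(q))
                # Project the backed-up values onto the span of the features.
                w, *_ = np.linalg.lstsq(Phi, np.array(targets), rcond=None)
            return w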

    Policy-gradient methods for planning
