Search CORE

8 research outputs found

Temporal Concatenation for Markov Decision Processes

Author: Song Ruiyang
Xu Kuang
Publication date: 13/06/2020

We propose and analyze a Temporal Concatenation (TC) heuristic for solving large-scale finite-horizon Markov decision processes (MDP). The Temporal Concatenation divides a finite-horizon MDP into smaller sub-problems along the time horizon, and generates an overall solution by simply concatenating the optimal solutions from these sub-problems. As a "black box" architecture, Temporal Concatenation works with a wide range of existing MDP algorithms with the potential of substantial speed-up at the expense of minor performance degradation. Our main results characterize the regret of Temporal Concatenation, defined as the gap between the expected rewards from Temporal Concatenation's solution and that from the optimal solution. We provide upper bounds that show, when the underlying MDP satisfies a bounded-diameter criterion, the regret of Temporal Concatenation is bounded by a constant independent of the length of the horizon. Conversely, we provide matching lower bounds that demonstrate that, for any finite diameter, there exist MDP instances for which the regret upper bound is tight. We further contextualize the theoretical results in an illustrative example of dynamic energy management with storage, and provide simulation results to assess Temporal Concatenation's average-case regret within a family of MDPs related to graph traversal.
Comment: Added references and updated the theoretical result in Section
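The splitting-and-concatenating idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a small tabular, stationary MDP, uses plain backward induction as the "black box" solver, and assumes every sub-problem is solved with a zero terminal value. The names `backward_induction`, `evaluate`, `temporal_concatenation` and `random_mdp` are hypothetical and exist only for this example.

```python
import random

def backward_induction(P, R, horizon):
    """Optimal decision rules for a finite horizon, assuming zero terminal value.

    P[a][s][t] is the probability of moving s -> t under action a and
    R[s][a] is the immediate reward. Returns rules[k][s] = action at step k.
    """
    n_states, n_actions = len(R), len(R[0])
    V = [0.0] * n_states
    rules = []
    for _ in range(horizon):
        rule, new_V = [], []
        for s in range(n_states):
            q = [R[s][a] + sum(p * v for p, v in zip(P[a][s], V))
                 for a in range(n_actions)]
            best = max(range(n_actions), key=q.__getitem__)
            rule.append(best)
            new_V.append(q[best])
        rules.append(rule)
        V = new_V
    rules.reverse()  # rules[0] applies at the first time step
    return rules

def evaluate(P, R, rules):
    """Expected total reward of time-dependent rules, for every start state."""
    V = [0.0] * len(R)
    for rule in reversed(rules):
        V = [R[s][a] + sum(p * v for p, v in zip(P[a][s], V))
             for s, a in enumerate(rule)]
    return V

def temporal_concatenation(P, R, horizon, n_parts):
    """Solve n_parts shorter sub-problems independently and join their rules."""
    lengths = [horizon // n_parts + (i < horizon % n_parts)
               for i in range(n_parts)]
    rules = []
    for length in lengths:
        rules.extend(backward_induction(P, R, length))
    return rules

def random_mdp(n_states, n_actions, seed=0):
    """Hypothetical test instance: random transitions, rewards in [0, 1)."""
    rng = random.Random(seed)
    P = []
    for _ in range(n_actions):
        rows = []
        for _ in range(n_states):
            w = [rng.random() for _ in range(n_states)]
            total = sum(w)
            rows.append([x / total for x in w])
        P.append(rows)
    R = [[rng.random() for _ in range(n_actions)] for _ in range(n_states)]
    return P, R

# Regret per start state: optimal value minus the value of the concatenation.
P, R = random_mdp(n_states=5, n_actions=3)
optimal = evaluate(P, R, backward_induction(P, R, 12))
concatenated = evaluate(P, R, temporal_concatenation(P, R, 12, 3))
regret = [o - c for o, c in zip(optimal, concatenated)]
```

In this toy setting each sub-problem ignores what happens after its own segment ends, which is where the regret comes from; with `n_parts=1` the sketch reduces to ordinary backward induction and the regret is zero.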

arXiv.org e-Print Archive

Reverse Iterative Deepening for Finite-Horizon MDPs with Large Branching Factors

Author: Dai Peng
Kolobov Andrey
Mausam Mausam
Weld Daniel
Publication venue: Association for the Advancement of Artificial Intelligence
Publication date: 14/05/2012

In contrast to previous competitions, where the problems were goal-based, the 2011 International Probabilistic Planning Competition (IPPC-2011) emphasized finite-horizon reward maximization problems with large branching factors. These MDPs modeled more realistic planning scenarios and presented challenges to the previous state-of-the-art planners (e.g., those from IPPC-2008), which were primarily based on domain determinization, a technique more suited to goal-oriented MDPs with small branching factors. Moreover, large branching factors render the existing implementations of RTDP- and LAO-style algorithms inefficient as well. In this paper we present GLUTTON, our planner at IPPC-2011 that performed well on these challenging MDPs. The main algorithm used by GLUTTON is LR2TDP, an LRTDP-based optimal algorithm for finite-horizon problems centered around the novel idea of reverse iterative deepening. We detail LR2TDP itself as well as a series of optimizations included in GLUTTON that help LR2TDP achieve competitive performance on difficult problems with large branching factors: subsampling the transition function, separating out natural dynamics, caching transition function samples, and others. Experiments show that GLUTTON and PROST, the IPPC-2011 winner, have complementary strengths, with GLUTTON demonstrating superior performance on problems with few high-reward terminal states.

Association for the Advancement of Artificial Intelligence: AAAI Publications

Relational reinforcement learning for planning with exogenous effects

Author: Alenyà Ribas Guillem
Inoue Katsumi
Martinez Martinez David
Ribeiro Tony
Torras Carme
Publication date: 01/01/2017

Probabilistic planners have improved recently to the point that they can solve difficult tasks with complex and expressive models. In contrast, learners cannot yet tackle the expressive models that planners do, which forces complex models to be mostly handcrafted. We propose a new learning approach that can learn relational probabilistic models with both action effects and exogenous effects. The proposed learning approach combines a multi-valued variant of inductive logic programming for the generation of candidate models, with an optimization method to select the best set of planning operators to model a problem. We also show how to combine this learner with reinforcement learning algorithms to solve complete problems. Finally, experimental validation is provided that shows improvements over previous work in both simulation and a robotic task. The robotic task involves a dynamic scenario with several agents where a manipulator robot has to clear the tableware on a table. We show that the exogenous effects learned by our approach allowed the robot to clear the table in a more efficient way.
Peer reviewed. Postprint (published version).

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

HAL Descartes

Digital.CSIC

Learning relational models with human interaction for planning in robotics

Author: Martínez Martínez David
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2017

Automated planning has proven useful for solving problems in which an agent has to maximize a reward function by executing actions. As planners have improved to solve more expressive and difficult problems, there is increasing interest in using planning to improve efficiency in robotic tasks. However, planners rely on a domain model, which has to be either handcrafted or learned. Although learning domain models can be very costly, recent approaches provide generalization capabilities and integrate human feedback to reduce the number of experiences required to learn. In this thesis we propose new methods that allow an agent with no previous knowledge to solve certain problems more efficiently by using task planning. First, we show how to apply probabilistic planning to improve robot performance in manipulation tasks (such as cleaning dirt or clearing the tableware on a table). Planners obtain sequences of actions that get the best result in the long term, beating reactive strategies. Second, we introduce new reinforcement learning algorithms where the agent can actively request demonstrations from a teacher to learn new actions and speed up the learning process. In particular, we propose an algorithm that allows the user to set the minimum quality to be achieved, where a better quality also implies that a larger number of demonstrations will be requested. Moreover, the learned model is analyzed to extract the unlearned or problematic parts of the model. This information allows the agent to provide guidance to the teacher when a demonstration is requested, and to avoid irrecoverable errors. Finally, a new domain model learner is introduced that, in addition to relational probabilistic action models, can also learn exogenous effects. This learner can be integrated with existing planners and reinforcement learning algorithms to solve a wide range of problems. In summary, we improve the use of learning and task planning to solve unknown tasks. The improvements allow an agent to obtain a larger benefit from planners, learn faster, balance the number of action executions and teacher demonstrations, avoid irrecoverable errors, interact with a teacher to solve difficult problems, and adapt to the behavior of other agents by learning their dynamics. All the proposed methods were compared with state-of-the-art approaches, and were also demonstrated in different scenarios, including challenging robotic tasks.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Adapting robot behavior to user preferences in assistive scenarios

Author: Canal Camprodon Gerard
Publication date: 09/03/2020

Robotic assistants have inspired numerous books and science fiction movies. In the real world, these kinds of devices are a growing need in a rapidly aging society that will continue to require more assistance. While life expectancy is increasing, quality of life is not necessarily doing so. Thus, we may find ourselves and our loved ones being dependent and needing another person to perform the most basic tasks, which has a strong psychological impact. Accordingly, assistive robots may be the definitive tool to improve quality of life by empowering dependent people and extending their independent living. Assisting users to perform daily activities requires adapting to them and their needs, as they might not be able to adapt to the robot. This thesis tackles adaptation and personalization issues through user preferences. We focus on physical tasks that involve close contact, as these present interesting challenges and are of great importance for the user. Therefore, three tasks are mainly used throughout the thesis: assistive feeding, shoe fitting, and jacket dressing. We first describe a framework for robot behavior adaptation that illustrates how robots should be personalized for and by end-users or their assistants. Using this framework, non-technical users determine how the robot should behave. Then, we define the concept of preference for assistive robotics scenarios and establish a taxonomy, which includes hierarchies and groups of preferences, grounding definitions and concepts. We then show how the preferences in the taxonomy are used with AI planning systems to adapt the robot behavior to the preferences of the user obtained from simple questions. Our algorithms allow for long-term adaptations as well as coping with misinformed user models. We further integrate the methods with low-level motion primitives that provide a more robust adaptation and behavior while lowering the number of needed actions and demonstrations. Moreover, we perform a deeper analysis of planning and preferences with the introduction of new algorithms to provide preference suggestions in planning domains. The thesis then concludes with a user study that evaluates the use of the preferences in the three real assistive robotics scenarios. The experiments show a clear understanding of the preferences by users, who were able to assess the impact of their preferences on the behavior of the robot. In summary, we provide tools and algorithms to design the robotic assistants of the future: assistants that should be able to adapt to the needs and preferences of the assisted user, just as human assistants do today.

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Adapting robot behavior to user preferences in assistive scenarios

Author: Canal Camprodon Gerard
Publication date: 09/03/2020

Embargo applied from the defense date until 24 July 2020.
Postprint (published version).

UPCommons. Portal del coneixement obert de la UPC

Hybrid Mission Planning with Coalition Formation

Author: Dukeman Anton Leo
Publication venue: VANDERBILT