Search CORE

107 research outputs found

Hierarchical Imitation Learning with Vector Quantized Models

Author: Ilin Alexander
Kujanpää Kalle
Pajarinen Joni
Publication venue
Publication date: 29/05/2023
Field of study

The ability to plan actions on multiple levels of abstraction enables intelligent agents to solve complex tasks effectively. However, learning the models for both low and high-level planning from demonstrations has proven challenging, especially with higher-dimensional inputs. To address this issue, we propose to use reinforcement learning to identify subgoals in expert trajectories by associating the magnitude of the rewards with the predictability of low-level actions given the state and the chosen subgoal. We build a vector-quantized generative model for the identified subgoals to perform subgoal-level planning. In experiments, the algorithm excels at solving complex, long-horizon decision-making problems outperforming state-of-the-art. Because of its ability to plan, our algorithm can find better trajectories than the ones in the training setComment: To appear at ICML 202

arXiv.org e-Print Archive

Workshop on Rich Representations for Reinforcement Learning:Held in conjunction with the 22nd International Conference on Machine Learning, August 7, 2005, Bonn, Germany

Author: Driessens Kurt
Fern Alan
van Otterlo Martijn
Publication venue: University of Bonn
Publication date: 01/01/2005
Field of study

University of Twente Research Information

Hierarchical Reinforcement Learning in Behavior and the Brain

Author: Fernandes José Joaquim Fonseca Ribas
Publication venue: Universidade Nova de Lisboa. Instituto de Tecnologia química e Biológica.
Publication date: 01/11/2013
Field of study

Dissertation presented to obtain the Ph.D degree in Biology, NeuroscienceReinforcement learning (RL) has provided key insights to the neurobiology of learning and decision making. The pivotal nding is that the phasic activity of dopaminergic cells in the ventral tegmental area during learning conforms to a reward prediction error (RPE), as speci ed in the temporal-di erence learning algorithm (TD). This has provided insights to conditioning, the distinction between habitual and goal-directed behavior, working memory, cognitive control and error monitoring. It has also advanced the understanding of cognitive de cits in Parkinson's disease, depression, ADHD and of personality traits such as impulsivity.(...

Repositório da Universidade Nova de Lisboa

Learning and planning in videogames via task decomposition

Author: Dann M
Publication venue: RMIT University
Publication date
Field of study

Artificial intelligence (AI) methods have come a long way in tabletop games, with computer programs having now surpassed human experts in the challenging games of chess, Go and heads-up no-limit Texas hold'em. However, a significant simplifying factor in these games is that individual decisions have a relatively large impact on the state of the game. The real world, however, is granular. Human beings are continually presented with new information and are faced with making a multitude of tiny decisions every second. Viewed in these terms, feedback is often sparse, meaning that it only arrives after one has made a great number of decisions. Moreover, in many real-world problems there is a continuous range of actions to choose from, and attaining meaningful feedback from the environment often requires a strong degree of action coordination. Videogames, in which players must likewise contend with granular time scales and continuous action spaces, are in this sense a better proxy for real-world problems, and have thus become regarded by many as the new frontier in games AI. Seemingly, the way in which human players approach granular decision-making in videogames is by decomposing complex tasks into high-level subproblems, thereby allowing them to focus on the &quot;big picture&quot;. For example, in Super Mario World, human players seem to look ahead in extended steps, such as climbing a vine or jumping over a pit, rather than planning one frame at a time. Currently though, this type of reasoning does not come easily to machines, leaving many open research problems related to task decomposition. This thesis focuses on three such problems in particular: (1) The challenge of learning subgoals autonomously, so as to lessen the issue of sparse feedback. (2) The challenge of combining discrete planning techniques with extended actions whose durations and effects on the environment are uncertain. (3) The questions of when and why it is beneficial to reason over high-level continuous control variables, such as the velocity of a player-controlled ship, rather than over the most low-level actions available. We address these problems via new algorithms and novel experimental design, demonstrating empirically that our algorithms are more efficient than strong baselines that do not leverage task decomposition, and yielding insight into the types of environment where task decomposition is likely to be beneficial

RMIT Research Repository

Hybrid Search for Efficient Planning with Completeness Guarantees

Author: Ilin Alexander
Kujanpää Kalle
Pajarinen Joni
Publication venue
Publication date: 28/11/2023
Field of study

Solving complex planning problems has been a long-standing challenge in computer science. Learning-based subgoal search methods have shown promise in tackling these problems, but they often suffer from a lack of completeness guarantees, meaning that they may fail to find a solution even if one exists. In this paper, we propose an efficient approach to augment a subgoal search method to achieve completeness in discrete action spaces. Specifically, we augment the high-level search with low-level actions to execute a multi-level (hybrid) search, which we call complete subgoal search. This solution achieves the best of both worlds: the practical efficiency of high-level search and the completeness of low-level search. We apply the proposed search method to a recently proposed subgoal search algorithm and evaluate the algorithm trained on offline data on complex planning problems. We demonstrate that our complete subgoal search not only guarantees completeness but can even improve performance in terms of search expansions for instances that the high-level could solve without low-level augmentations. Our approach makes it possible to apply subgoal-level planning for systems where completeness is a critical requirement.Comment: NeurIPS 2023 Poste

arXiv.org e-Print Archive

Symbol acquisition for probabilistic high-level planning

Author: Kaelbling Leslie P
Konidaris George
Lozano-Perez Tomas
Publication venue: AAAI Press / International Joint Conferences on Artificial Intelligence
Publication date: 01/07/2015
Field of study

We introduce a framework that enables an agent to autonomously learn its own symbolic representation of a low-level, continuous environment. Propositional symbols are formalized as names for probability distributions, providing a natural means of dealing with uncertain representations and probabilistic plans. We determine the symbols that are sufficient for computing the probability with which a plan will succeed, and demonstrate the acquisition of a symbolic representation in a computer game domain.National Science Foundation (U.S.) (grant 1420927)United States. Office of Naval Research (grant N00014-14-1-0486)United States. Air Force. Office of Scientific Research (grant FA23861014135)United States. Army Research Office (grant W911NF1410433)MIT Intelligence Initiativ

DSpace@MIT

Learning Long Chain of Actions through Hierarchical Reinforcement Learning

Author: Anca Mihai
Publication venue
Publication date: 01/01/2024
Field of study

Explore Bristol Research