2,424 research outputs found

    Combining Subgoal Graphs with Reinforcement Learning to Build a Rational Pathfinder

    Full text link
    In this paper, we present a hierarchical path planning framework called SG-RL (subgoal graphs-reinforcement learning), to plan rational paths for agents maneuvering in continuous and uncertain environments. By "rational", we mean (1) efficient path planning to eliminate first-move lags; (2) collision-free and smooth for agents with kinematic constraints satisfied. SG-RL works in a two-level manner. At the first level, SG-RL uses a geometric path-planning method, i.e., Simple Subgoal Graphs (SSG), to efficiently find optimal abstract paths, also called subgoal sequences. At the second level, SG-RL uses an RL method, i.e., Least-Squares Policy Iteration (LSPI), to learn near-optimal motion-planning policies which can generate kinematically feasible and collision-free trajectories between adjacent subgoals. The first advantage of the proposed method is that SSG can solve the limitations of sparse reward and local minima trap for RL agents; thus, LSPI can be used to generate paths in complex environments. The second advantage is that, when the environment changes slightly (i.e., unexpected obstacles appearing), SG-RL does not need to reconstruct subgoal graphs and replan subgoal sequences using SSG, since LSPI can deal with uncertainties by exploiting its generalization ability to handle changes in environments. Simulation experiments in representative scenarios demonstrate that, compared with existing methods, SG-RL can work well on large-scale maps with relatively low action-switching frequencies and shorter path lengths, and SG-RL can deal with small changes in environments. We further demonstrate that the design of reward functions and the types of training environments are important factors for learning feasible policies.Comment: 20 page

    Probabilistic inference for determining options in reinforcement learning

    Get PDF
    Tasks that require many sequential decisions or complex solutions are hard to solve using conventional reinforcement learning algorithms. Based on the semi Markov decision process setting (SMDP) and the option framework, we propose a model which aims to alleviate these concerns. Instead of learning a single monolithic policy, the agent learns a set of simpler sub-policies as well as the initiation and termination probabilities for each of those sub-policies. While existing option learning algorithms frequently require manual specification of components such as the sub-policies, we present an algorithm which infers all relevant components of the option framework from data. Furthermore, the proposed approach is based on parametric option representations and works well in combination with current policy search methods, which are particularly well suited for continuous real-world tasks. We present results on SMDPs with discrete as well as continuous state-action spaces. The results show that the presented algorithm can combine simple sub-policies to solve complex tasks and can improve learning performance on simpler tasks

    Using Relative Novelty to Identify Useful Temporal Abstractions in Reinforcement Learning

    Get PDF
    We present a new method for automatically creating useful temporal abstractions in reinforcement learning. We argue that states that allow the agent to transition to a different region of the state space are useful subgoals, and propose a method for identifying them using the concept of relative novelty. When such a state is identified, a temporallyextended activity (e.g., an option) is generated that takes the agent efficiently to this state. We illustrate the utility of the method in a number of tasks

    Learning the Structure of Continuous Markov Decision Processes

    Get PDF
    There is growing interest in artificial, intelligent agents which can operate autonomously for an extended period of time in complex environments and fulfill a variety of different tasks. Such agents will face different problems during their lifetime which may not be foreseeable at the time of their deployment. Thus, the capacity for lifelong learning of new behaviors is an essential prerequisite for this kind of agents as it enables them to deal with unforeseen situations. However, learning every complex behavior anew from scratch would be cumbersome for the agent. It is more plausible to consider behavior to be modular and let the agent acquire a set of reusable building blocks for behavior, the so-called skills. These skills might, once acquired, facilitate fast learning and adaptation of behavior to new situations. This work focuses on computational approaches for skill acquisition, namely which kind of skills shall be acquired and how to acquire them. The former is commonly denoted as skill discovery and the latter as skill learning . The main contribution of this thesis is a novel incremental skill acquisition approach which is suited for lifelong learning. In this approach, the agent learns incrementally a graph-based representation of a domain and exploits certain properties of this graph such as its bottlenecks for skill discovery. This thesis proposes a novel approach for learning a graph-based representation of continuous domains based on formalizing the problem as a probabilistic generative model. Furthermore, a new incremental agglomerative clustering approach for identifying bottlenecks of such graphs is presented. Thereupon, the thesis proposes a novel intrinsic motivation system which enables an agent to intelligently allocate time between skill discovery and skill learning in developmental settings, where the agent is not constrained by external tasks. The results of this thesis show that the resulting skill acquisition approach is suited for continuous domains and can deal with domain stochasticity and different explorative behavior of the agent. The acquired skills are reusable and versatile and can be used in multi-task and lifelong learning settings in high-dimensional problems

    Humans decompose tasks by trading off utility and computational cost

    Full text link
    Human behavior emerges from planning over elaborate decompositions of tasks into goals, subgoals, and low-level actions. How are these decompositions created and used? Here, we propose and evaluate a normative framework for task decomposition based on the simple idea that people decompose tasks to reduce the overall cost of planning while maintaining task performance. Analyzing 11,117 distinct graph-structured planning tasks, we find that our framework justifies several existing heuristics for task decomposition and makes predictions that can be distinguished from two alternative normative accounts. We report a behavioral study of task decomposition (N=806N=806) that uses 30 randomly sampled graphs, a larger and more diverse set than that of any previous behavioral study on this topic. We find that human responses are more consistent with our framework for task decomposition than alternative normative accounts and are most consistent with a heuristic -- betweenness centrality -- that is justified by our approach. Taken together, our results provide new theoretical insight into the computational principles underlying the intelligent structuring of goal-directed behavior
    • …
    corecore