
    Model Learning for Look-ahead Exploration in Continuous Control

    We propose an exploration method that incorporates look-ahead search over basic learnt skills and their dynamics, and use it for reinforcement learning (RL) of manipulation policies. Our skills are multi-goal policies learned in isolation in simpler environments using existing multi-goal RL formulations, analogous to options or macro-actions. Coarse skill dynamics, i.e., the state transition caused by a (complete) skill execution, are learnt and are unrolled forward during look-ahead search. Policy search benefits from temporal abstraction during exploration, though it itself operates over low-level primitive actions; thus, the resulting policies do not suffer from the suboptimality and inflexibility caused by coarse skill chaining. We show that the proposed exploration strategy results in effective learning of complex manipulation policies faster than current state-of-the-art RL methods, and converges to better policies than methods that use options or parameterized skills as building blocks of the policy itself, as opposed to guiding exploration.
    Comment: This is a pre-print of our paper which is accepted in AAAI 201
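
    As a concrete illustration of the look-ahead step, the sketch below unrolls learned coarse skill dynamics to pick the most promising first skill. The names skill_models (one learned transition model per skill) and score (a heuristic value of a predicted state) are illustrative assumptions, not the paper's actual interfaces.

```python
# A minimal sketch of look-ahead exploration over coarse skill dynamics,
# assuming one learned transition model per skill
# (`skill_models[k](s) -> predicted post-skill state`) and a heuristic
# `score(s)`; both names are illustrative, not the paper's.
import itertools

def lookahead_best_skill(state, skill_models, score, depth=2):
    """Unroll every skill sequence up to `depth` with the coarse models
    and return the index of the first skill of the best-scoring sequence."""
    best_value, best_first = float("-inf"), None
    for seq in itertools.product(range(len(skill_models)), repeat=depth):
        s = state
        for k in seq:
            s = skill_models[k](s)          # predicted post-skill state
        if score(s) > best_value:
            best_value, best_first = score(s), seq[0]
    return best_first
```

    The selected skill only steers exploration; as the abstract emphasizes, the policy being learned still acts over low-level primitive actions.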

    Cellular Automata Applications in Shortest Path Problem

    Cellular Automata (CAs) are computational models that can capture the essential features of systems in which global behavior emerges from the collective effect of simple components that interact locally. During the last decades, CAs have been extensively used to mimic several natural processes and systems, and to find good solutions to many complex, hard-to-solve computer science and engineering problems. Among them, the shortest path problem is one of the most pronounced and highly studied problems that scientists have tried to tackle with a plethora of methodologies, and even unconventional approaches. The proposed solutions are mainly justified by their ability to provide a correct solution with better time complexity than the renowned Dijkstra's algorithm. Although the suggested algorithms vary widely in algorithmic complexity, spanning from simplistic graph-traversal algorithms to complex nature-inspired and bio-mimicking algorithms, in this chapter we focus on the successful application of CAs to the shortest path problem as found in diverse disciplines such as computer science, swarm robotics, computer networks, decision science, and the biomimicking of biological organisms' behavior. In particular, the first CA-based algorithm tackling the shortest path problem is introduced in detail. After a short presentation of shortest path algorithms arising from the relaxation of CA principles, the application of the CA-based shortest path definition to the coordinated motion of swarm robotics is also introduced. Moreover, the CA-based application of shortest path finding in computer networks is presented in brief. Finally, a CA that models exactly the behavior of a biological organism, namely Physarum, as it finds the minimum-length path between two points in a labyrinth, is given.
    Comment: To appear in the book: Adamatzky, A. (Ed.) Shortest path solvers. From software to wetware. Springer, 201
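
    The local update rule at the heart of such CA formulations can be made concrete with a small sketch: every cell synchronously recomputes its distance from its neighbours until the field stabilizes, and tracing descending values from the start then yields a shortest path. The grid encoding and rule below are a generic wavefront illustration, not the specific algorithm of the chapter.

```python
# A minimal sketch of a CA-style shortest-path solver on a grid: each cell
# repeatedly applies the same local rule (1 + min over free neighbours)
# until the distance field stops changing. Encoding is illustrative.
import math

def ca_distance_field(grid, goal):
    """grid[r][c] == 1 marks an obstacle; returns a field of distances to `goal`."""
    rows, cols = len(grid), len(grid[0])
    dist = [[math.inf] * cols for _ in range(rows)]
    dist[goal[0]][goal[1]] = 0
    changed = True
    while changed:                      # synchronous CA update sweep
        changed = False
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] == 1 or (r, c) == goal:
                    continue
                best = min(
                    (dist[nr][nc]
                     for nr, nc in ((r-1, c), (r+1, c), (r, c-1), (r, c+1))
                     if 0 <= nr < rows and 0 <= nc < cols),
                    default=math.inf,
                )
                if best + 1 < dist[r][c]:   # local rule: relax from neighbours
                    dist[r][c] = best + 1
                    changed = True
    return dist
```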

    Sampling-Based Methods for Factored Task and Motion Planning

    This paper presents a general-purpose formulation of a large class of discrete-time planning problems, with hybrid state and control spaces, as factored transition systems. Factoring allows state transitions to be described as the intersection of several constraints, each affecting a subset of the state and control variables. Robotic manipulation problems with many movable objects involve constraints that affect only a few variables at a time and therefore exhibit large amounts of factoring. We develop a theoretical framework for solving factored transition systems with sampling-based algorithms. The framework characterizes conditions on the submanifold in which solutions lie, leading to a characterization of robust feasibility that incorporates dimensionality-reducing constraints. It then connects those conditions to corresponding conditional samplers that can be composed to produce values on this submanifold. We present two domain-independent, probabilistically complete planning algorithms that take, as input, a set of conditional samplers. We demonstrate the empirical efficiency of these algorithms on a set of challenging task and motion planning problems involving picking, placing, and pushing.
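
    A rough sketch of how conditional samplers might compose, under assumed interfaces: each sampler is a generator that, given the variable values fixed so far, yields values for its own output variables satisfying one constraint. The sampler names (grasp_sampler, ik_sampler) and binding keys below are hypothetical stand-ins for the paper's domain-specific samplers.

```python
# A minimal sketch of composing conditional samplers via depth-first
# backtracking. All samplers and variable names here are hypothetical.
def compose(samplers, bindings):
    """Extend `bindings` one sampler at a time, yielding full assignments."""
    if not samplers:
        yield dict(bindings)
        return
    sampler, rest = samplers[0], samplers[1:]
    for outputs in sampler(bindings):        # values on the constraint manifold
        bindings.update(outputs)
        yield from compose(rest, bindings)
        for key in outputs:                  # backtrack
            del bindings[key]

# Hypothetical usage: sample a grasp given an object pose, then an arm
# configuration given that grasp.
def grasp_sampler(b):
    yield {"grasp": ("top", b["pose"])}                # toy placeholder value

def ik_sampler(b):
    yield {"q": ("ik_solution_for", b["grasp"])}       # toy placeholder value

solution = next(compose([grasp_sampler, ik_sampler], {"pose": (0.3, 0.1)}))
```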

    Combining Subgoal Graphs with Reinforcement Learning to Build a Rational Pathfinder

    In this paper, we present a hierarchical path planning framework called SG-RL (subgoal graphs-reinforcement learning) to plan rational paths for agents maneuvering in continuous and uncertain environments. By "rational", we mean (1) efficient path planning that eliminates first-move lags; (2) collision-free and smooth paths that satisfy agents' kinematic constraints. SG-RL works in a two-level manner. At the first level, SG-RL uses a geometric path-planning method, i.e., Simple Subgoal Graphs (SSG), to efficiently find optimal abstract paths, also called subgoal sequences. At the second level, SG-RL uses an RL method, i.e., Least-Squares Policy Iteration (LSPI), to learn near-optimal motion-planning policies that generate kinematically feasible and collision-free trajectories between adjacent subgoals. The first advantage of the proposed method is that SSG overcomes the sparse-reward and local-minimum limitations faced by RL agents, so LSPI can be used to generate paths in complex environments. The second advantage is that, when the environment changes slightly (e.g., unexpected obstacles appear), SG-RL does not need to reconstruct subgoal graphs and replan subgoal sequences with SSG, since LSPI can deal with uncertainties by exploiting its generalization ability to handle changes in environments. Simulation experiments in representative scenarios demonstrate that, compared with existing methods, SG-RL works well on large-scale maps with relatively low action-switching frequencies and shorter path lengths, and that SG-RL can deal with small changes in environments. We further demonstrate that the design of reward functions and the types of training environments are important factors for learning feasible policies.
    Comment: 20 page
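
    The two-level control loop can be sketched as follows, assuming a plan_subgoals function standing in for SSG and a learned policy(state, subgoal) standing in for the LSPI policy; env, tol, and the (x, y, ...) state layout are likewise illustrative, not the paper's actual interfaces.

```python
# A minimal sketch of the two-level SG-RL execution loop. `plan_subgoals`
# stands in for Simple Subgoal Graphs and `policy` for the learned LSPI
# policy; `env` and the state layout are illustrative assumptions.
def distance(state, subgoal):
    # Euclidean distance on the (x, y) components of the state.
    return ((state[0] - subgoal[0]) ** 2 + (state[1] - subgoal[1]) ** 2) ** 0.5

def sg_rl_execute(env, start, goal, plan_subgoals, policy, tol=0.5):
    subgoals = plan_subgoals(start, goal)      # level 1: abstract subgoal sequence
    state = env.reset(start)
    for sg in subgoals:                        # level 2: learned local controller
        while distance(state, sg) > tol:
            action = policy(state, sg)         # kinematically feasible action
            state = env.step(action)
    return state
```

    Because the low-level policy generalizes across states, small environment changes only perturb the inner loop; the subgoal sequence from level 1 can be reused unchanged, which is the replanning advantage the abstract describes.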