3,622 research outputs found

    Reinforcement Learning and Planning for Preference Balancing Tasks

    Get PDF
    Robots are often highly non-linear dynamical systems with many degrees of freedom, making solving motion problems computationally challenging. One solution has been reinforcement learning (RL), which learns through experimentation to automatically perform the near-optimal motions that complete a task. However, high-dimensional problems and task formulation often prove challenging for RL. We address these problems with PrEference Appraisal Reinforcement Learning (PEARL), which solves Preference Balancing Tasks (PBTs). PBTs define a problem as a set of preferences that the system must balance to achieve a goal. The method is appropriate for acceleration-controlled systems with continuous state-space and either discrete or continuous action spaces with unknown system dynamics. We show that PEARL learns a sub-optimal policy on a subset of states and actions, and transfers the policy to the expanded domain to produce a more refined plan on a class of robotic problems. We establish convergence to task goal conditions, and even when preconditions are not verifiable, show that this is a valuable method to use before other more expensive approaches. Evaluation is done on several robotic problems, such as Aerial Cargo Delivery, Multi-Agent Pursuit, Rendezvous, and Inverted Flying Pendulum both in simulation and experimentally. Additionally, PEARL is leveraged outside of robotics as an array sorting agent. The results demonstrate high accuracy and fast learning times on a large set of practical applications

    PRM-RL: Long-range Robotic Navigation Tasks by Combining Reinforcement Learning and Sampling-based Planning

    Full text link
    We present PRM-RL, a hierarchical method for long-range navigation task completion that combines sampling based path planning with reinforcement learning (RL). The RL agents learn short-range, point-to-point navigation policies that capture robot dynamics and task constraints without knowledge of the large-scale topology. Next, the sampling-based planners provide roadmaps which connect robot configurations that can be successfully navigated by the RL agent. The same RL agents are used to control the robot under the direction of the planning, enabling long-range navigation. We use the Probabilistic Roadmaps (PRMs) for the sampling-based planner. The RL agents are constructed using feature-based and deep neural net policies in continuous state and action spaces. We evaluate PRM-RL, both in simulation and on-robot, on two navigation tasks with non-trivial robot dynamics: end-to-end differential drive indoor navigation in office environments, and aerial cargo delivery in urban environments with load displacement constraints. Our results show improvement in task completion over both RL agents on their own and traditional sampling-based planners. In the indoor navigation task, PRM-RL successfully completes up to 215 m long trajectories under noisy sensor conditions, and the aerial cargo delivery completes flights over 1000 m without violating the task constraints in an environment 63 million times larger than used in training.Comment: 9 pages, 7 figure

    Robust online motion planning with reachable sets

    Get PDF
    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013.Cataloged from PDF version of thesis.Includes bibliographical references (p. 51-55).In this thesis we consider the problem of generating motion plans for a nonlinear dynamical system that are guaranteed to succeed despite uncertainty in the environment, parametric model uncertainty, disturbances, and/or errors in state estimation. Furthermore, we consider the case where these plans must be generated online, because constraints such as obstacles in the environment may not be known until they are perceived (with a noisy sensor) at runtime. Previous work on feedback motion planning for nonlinear systems was limited to offline planning due to the computational cost of safety verification. Here we augment the traditional trajectory library approach by designing locally stabilizing controllers for each nominal trajectory in the library and providing guarantees on the resulting closed loop systems. We leverage sums-of-squares programming to design these locally stabilizing controllers by explicitly attempting to minimize the size of the worst case reachable set of the closed-loop system subjected to bounded disturbances and uncertainty. The reachable sets associated with each trajectory in the library can be thought of as "funnels" that the system is guaranteed to remain within. The resulting funnel library is then used to sequentially compose motion plans at runtime while ensuring the safety of the robot. A major advantage of the work presented here is that by explicitly taking into account the effect of uncertainty, the robot can evaluate motion plans based on how vulnerable they are to disturbances. We demonstrate our method on a simulation of a plane flying through a two dimensional forest of polygonal trees with parametric uncertainty and disturbances in the form of a bounded "cross-wind". We further validate our approach by carefully evaluating the guarantees on invariance provided by funnels on two challenging underactuated systems (the "Acrobot" and a small-sized airplane).by Anirudha Majumdar.S.M

    Motor Learning Characterized by Changing Lévy Distributions

    Get PDF
    The probability distributions for changes in transverse plane fingertip speed are Lévy distributed in human pole balancing. Six subjects learned to balance a pole on their index finger over three sessions while sitting and standing. The Lévy or decay exponent decreased as a function of learning, showing reduced decay in the probability for large speed steps and was significantly smaller in the sitting condition. However, the probability distribution for changes in fingertip speed was truncated so that the probability for large steps was reduced in this condition. These results show a learning-induced tolerance for large speed step sizes and demonstrate that motor learning in continuous tasks may be characterized by changing distributions that reflect sensorimotor skill acquisition

    Graceful Navigation for Mobile Robots in Dynamic and Uncertain Environments.

    Full text link
    The ability to navigate in everyday environments is a fundamental and necessary skill for any autonomous mobile agent that is intended to work with human users. The presence of pedestrians and other dynamic objects, however, makes the environment inherently dynamic and uncertain. To navigate in such environments, an agent must reason about the near future and make an optimal decision at each time step so that it can move safely toward the goal. Furthermore, for any application intended to carry passengers, it also must be able to move smoothly and comfortably, and the robot behavior needs to be customizable to match the preference of the individual users. Despite decades of progress in the field of motion planning and control, this remains a difficult challenge with existing methods. In this dissertation, we show that safe, comfortable, and customizable mobile robot navigation in dynamic and uncertain environments can be achieved via stochastic model predictive control. We view the problem of navigation in dynamic and uncertain environments as a continuous decision making process, where an agent with short-term predictive capability reasons about its situation and makes an informed decision at each time step. The problem of robot navigation in dynamic and uncertain environments is formulated as an on-line, finite-horizon policy and trajectory optimization problem under uncertainty. With our formulation, planning and control becomes fully integrated, which allows direct optimization of the performance measure. Furthermore, with our approach the problem becomes easy to solve, which allows our algorithm to run in real time on a single core of a typical laptop with off-the-shelf optimization packages. The work presented in this thesis extends the state-of-the-art in analytic control of mobile robots, sampling-based optimal path planning, and stochastic model predictive control. We believe that our work is a significant step toward safe and reliable autonomous navigation that is acceptable to human users.PhDMechanical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/120760/1/jongjinp_1.pd
    • …
    corecore