Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning
Reinforcement learning (RL) algorithms for real-world robotic applications
need a data-efficient learning process and the ability to handle complex,
unknown dynamical systems. These requirements are handled well by model-based
and model-free RL approaches, respectively. In this work, we aim to combine the
advantages of these two types of methods in a principled manner. By focusing on
time-varying linear-Gaussian policies, we enable a model-based algorithm based
on the linear quadratic regulator (LQR) that can be integrated into the
model-free framework of path integral policy improvement (PI2). We can further
combine our method with guided policy search (GPS) to train arbitrary
parameterized policies such as deep neural networks. Our simulation and
real-world experiments demonstrate that this method can solve challenging
manipulation tasks with comparable or better performance than model-free
methods while maintaining the sample efficiency of model-based methods. A video
presenting our results is available at
https://sites.google.com/site/icml17pilqr
Comment: Paper accepted to the International Conference on Machine Learning (ICML) 201
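The model-based component named in this abstract is an LQR-style update for time-varying linear-Gaussian policies. As a rough illustration of that ingredient (not the authors' actual algorithm), the following is a minimal sketch of a finite-horizon discrete-time LQR backward pass; all variable names and the double-integrator example are assumptions for illustration:

```python
import numpy as np

def lqr_backward_pass(A, B, Q, R, horizon):
    """Finite-horizon discrete-time LQR: compute time-varying feedback
    gains K_t for dynamics x_{t+1} = A x_t + B u_t and quadratic cost
    sum_t (x_t' Q x_t + u_t' R u_t).  Illustrative sketch only."""
    P = Q.copy()                # value matrix, initialized at the terminal cost
    gains = []
    for _ in range(horizon):
        # one step of the Riccati recursion, iterated backward in time
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()             # gains[t] is the feedback gain at step t
    return gains

# usage: a double integrator discretized with dt = 0.1
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
gains = lqr_backward_pass(A, B, np.eye(2), np.array([[1.0]]), horizon=50)
```

Over a long horizon the early gains approach the infinite-horizon solution, so the closed-loop system at the first step is stable.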
Topology-Guided Path Integral Approach for Stochastic Optimal Control in Cluttered Environment
This paper addresses planning and control of robot motion under uncertainty
that is formulated as a continuous-time, continuous-space stochastic optimal
control problem, by developing a topology-guided path integral control method.
The path integral control framework, which forms the backbone of the proposed
method, re-writes the Hamilton-Jacobi-Bellman equation as a statistical
inference problem; the resulting inference problem is solved by a sampling
procedure that computes the distribution of controlled trajectories around the
trajectory induced by the passive dynamics. For motion control of robots in a highly
cluttered environment, however, this sampling can easily be trapped in a local
minimum unless the sample size is very large, since the global optimality of
local minima depends on the degree of uncertainty. Thus, a homology-embedded
sampling-based planner that identifies many (potentially) local-minimum
trajectories in different homology classes is developed to aid the sampling
process. In combination with a receding-horizon implementation of the optimal
control, the proposed method produces dynamically feasible and collision-free
motion plans without being trapped in a local minimum. Numerical examples on a
synthetic toy problem and on quadrotor control in a complex obstacle field
demonstrate the validity of the proposed method.
Comment: arXiv admin note: text overlap with arXiv:1510.0534
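The path integral control framework underlying this paper evaluates sampled rollouts and reweights them exponentially by trajectory cost. As a hedged sketch of that generic sampling step (not the paper's topology-guided variant), the function below performs one reweighting update; the function name, the temperature `lam`, and the array shapes are all illustrative assumptions:

```python
import numpy as np

def path_integral_update(u_nom, sample_costs, noise, lam=1.0):
    """One generic path-integral control update: given K noisy rollouts
    around a nominal control sequence u_nom (T x m), their trajectory
    costs sample_costs (K,), and the injected noise (K x T x m),
    reweight the samples by exp(-cost / lam) and average."""
    costs = np.asarray(sample_costs, dtype=float)
    costs -= costs.min()             # shift for numerical stability
    w = np.exp(-costs / lam)
    w /= w.sum()                     # softmax weights over rollouts
    # cost-weighted average of the injected noise, added to the nominal control
    return u_nom + np.einsum('k,ktm->tm', w, noise)
```

When one rollout is far cheaper than the rest, its weight dominates and the update essentially adopts that rollout's perturbation, which is exactly the local-minimum behavior the paper's homology-class sampling is designed to escape.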
Path integral policy improvement with differential dynamic programming
Path Integral Policy Improvement with Covariance Matrix Adaptation (PI2-CMA) is a step-based, model-free reinforcement learning approach that combines statistical estimation techniques with fundamental results from stochastic optimal control. In essence, a policy distribution is improved iteratively using reward-weighted averaging of the corresponding rollouts. It had been assumed that PI2-CMA somehow exploited gradient information contained in the reward-weighted statistics. To our knowledge, we are the first to expose the principle of this gradient extraction rigorously. Our findings reveal that PI2-CMA essentially obtains gradient information similar to the forward and backward passes in the Differential Dynamic Programming (DDP) method. It is then straightforward to extend the analogy with DDP by introducing a feedback term in the policy update. This suggests a novel algorithm, which we coin Path Integral Policy Improvement with Differential Dynamic Programming (PI2-DDP). The resulting algorithm is similar to the previously proposed Sampled Differential Dynamic Programming (SaDDP), but we derive the method independently as a generalization of the PI2-CMA framework. Our derivations suggest small variations to SaDDP that increase performance. We validated our claims on a robot trajectory learning task.
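The reward-weighted averaging that this abstract describes updates both the mean and the covariance of a Gaussian policy distribution from sampled rollouts. A minimal sketch of that generic PI2-CMA-style statistic (not the paper's PI2-DDP feedback extension), with the function name and `temperature` parameter as illustrative assumptions:

```python
import numpy as np

def reward_weighted_update(theta_samples, rewards, temperature=1.0):
    """Generic reward-weighted Gaussian update: the new mean and
    covariance of the policy distribution are the exponentially
    reward-weighted statistics of the sampled parameter vectors
    theta_samples (K x d).  Illustrative sketch only."""
    r = np.asarray(rewards, dtype=float)
    w = np.exp((r - r.max()) / temperature)   # shifted for numerical stability
    w /= w.sum()                              # normalized sample weights
    mean = w @ theta_samples                  # reward-weighted mean, shape (d,)
    diff = theta_samples - mean
    cov = np.einsum('k,ki,kj->ij', w, diff, diff)  # reward-weighted covariance
    return mean, cov
```

With equal rewards the update reduces to the ordinary sample mean and (weighted) sample covariance, which is one way to see why gradient-like information must be hidden in the weighting rather than in the averaging itself.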
Prescribed Performance Control Guided Policy Improvement for Satisfying Signal Temporal Logic Tasks
Signal temporal logic (STL) provides a user-friendly interface for defining
complex tasks for robotic systems. Recent efforts aim at designing control laws
or using reinforcement learning methods to find policies which guarantee
satisfaction of these tasks. While the former suffer from the trade-off between
task specification and computational complexity, the latter encounter
difficulties in exploration as the tasks become more complex and challenging to
satisfy. This paper proposes to combine the benefits of the two approaches and
use an efficient prescribed performance control (PPC) law to guide
exploration within the reinforcement learning algorithm. The potential of the
method is demonstrated in a simulated environment through two sample
navigational tasks.
Comment: This is the extended version of the paper accepted to the 2019
American Control Conference (ACC), Philadelphia (to be published)
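Prescribed performance control constrains the tracking error to a funnel that shrinks over time and feeds a transformed, unconstrained error to the controller. As a hedged sketch of that standard PPC ingredient (the specific funnel parameters, function name, and logarithmic transformation below are illustrative assumptions, not this paper's design):

```python
import numpy as np

def ppc_transformed_error(e, t, rho0=1.0, rho_inf=0.1, decay=1.0):
    """Prescribed performance control sketch: the tracking error e(t)
    must stay inside a shrinking funnel |e| < rho(t), with
    rho(t) = (rho0 - rho_inf) * exp(-decay * t) + rho_inf.
    The controller acts on a transformed, unconstrained error that
    grows without bound as e approaches the funnel boundary."""
    rho = (rho0 - rho_inf) * np.exp(-decay * t) + rho_inf
    z = e / rho                            # normalized error in (-1, 1)
    return np.log((1.0 + z) / (1.0 - z))   # blows up near the funnel edge
```

The steep gradient near the funnel boundary is what gives a PPC law strong corrective action, and is one intuition for why it can serve as an exploration guide for a reinforcement learning agent, as the abstract proposes.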
Collective Robot Reinforcement Learning with Distributed Asynchronous Guided Policy Search
In principle, reinforcement learning and policy search methods can enable
robots to learn highly complex and general skills that may allow them to
function amid the complexity and diversity of the real world. However, training
a policy that generalizes well across a wide range of real-world conditions
requires far greater quantity and diversity of experience than is practical to
collect with a single robot. Fortunately, it is possible for multiple robots to
share their experience with one another and thereby learn a policy
collectively. In this work, we explore distributed and asynchronous policy
learning as a means to achieve generalization and improved training times on
challenging, real-world manipulation tasks. We propose a distributed and
asynchronous version of Guided Policy Search and use it to demonstrate
collective policy learning on a vision-based door opening task using four
robots. We show that it achieves better generalization, utilization, and
training times than the single-robot alternative.
Comment: Submitted to the IEEE International Conference on Robotics and Automation 201
Perception is Everything: Repairing the Image of American Drone Warfare
This thesis traces the United States’ development of unmanned warfare from its initial use in the World Wars through the Cold War to its maturation in the War on Terror. The examination provides a summary of unmanned warfare’s history, its gradual adoption, and concerns regarding the proliferation of drone use, in order to understand the emphasis on unmanned weapons in the American military. In each phase of development, a single program is examined in depth to highlight special areas of interest in the modern day. Finally, the discussion of the modern era of unmanned systems focuses on the growing integration of new weapon systems that no longer fill niche roles in the armory but act as fully vetted frontline combatants. Brought together, this examination shows that drones have earned their place as integral tools in the American military inventory as faithful defenders of democracy.