Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning
Reinforcement learning (RL) algorithms for real-world robotic applications
need a data-efficient learning process and the ability to handle complex,
unknown dynamical systems. These requirements are handled well by model-based
and model-free RL approaches, respectively. In this work, we aim to combine the
advantages of these two types of methods in a principled manner. By focusing on
time-varying linear-Gaussian policies, we enable a model-based algorithm based
on the linear quadratic regulator (LQR) that can be integrated into the
model-free framework of path integral policy improvement (PI2). We can further
combine our method with guided policy search (GPS) to train arbitrary
parameterized policies such as deep neural networks. Our simulation and
real-world experiments demonstrate that this method can solve challenging
manipulation tasks with comparable or better performance than model-free
methods while maintaining the sample efficiency of model-based methods. A video
presenting our results is available at
https://sites.google.com/site/icml17pilqr
Comment: Paper accepted to the International Conference on Machine Learning (ICML) 201
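The model-based component named in this abstract is an LQR-style update for time-varying linear-Gaussian policies. As a rough illustration of that ingredient (not the authors' actual algorithm), the following is a minimal sketch of a finite-horizon discrete-time LQR backward pass; all variable names and the double-integrator example are assumptions for illustration:

```python
import numpy as np

def lqr_backward_pass(A, B, Q, R, horizon):
    """Finite-horizon discrete-time LQR: compute time-varying feedback
    gains K_t for dynamics x_{t+1} = A x_t + B u_t and quadratic cost
    sum_t (x_t' Q x_t + u_t' R u_t).  Illustrative sketch only."""
    P = Q.copy()                # value matrix, initialized at the terminal cost
    gains = []
    for _ in range(horizon):
        # one step of the Riccati recursion, iterated backward in time
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()             # gains[t] is the feedback gain at step t
    return gains

# usage: a double integrator discretized with dt = 0.1
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
gains = lqr_backward_pass(A, B, np.eye(2), np.array([[1.0]]), horizon=50)
```

Over a long horizon the early gains approach the infinite-horizon solution, so the closed-loop system at the first step is stable.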
Topology-Guided Path Integral Approach for Stochastic Optimal Control in Cluttered Environment
This paper addresses planning and control of robot motion under uncertainty
that is formulated as a continuous-time, continuous-space stochastic optimal
control problem, by developing a topology-guided path integral control method.
The path integral control framework, which forms the backbone of the proposed
method, re-writes the Hamilton-Jacobi-Bellman equation as a statistical
inference problem; the resulting inference problem is solved by a sampling
procedure that computes the distribution of controlled trajectories around the
trajectory induced by the passive dynamics. For motion control of robots in a highly
cluttered environment, however, this sampling can easily be trapped in a local
minimum unless the sample size is very large, since the global optimality of
local minima depends on the degree of uncertainty. Thus, a homology-embedded
sampling-based planner that identifies many (potentially) local-minimum
trajectories in different homology classes is developed to aid the sampling
process. In combination with a receding-horizon implementation of the optimal
control, the proposed method produces dynamically feasible and collision-free
motion plans without being trapped in a local minimum. Numerical examples on a
synthetic toy problem and on quadrotor control in a complex obstacle field
demonstrate the validity of the proposed method.
Comment: arXiv admin note: text overlap with arXiv:1510.0534
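The path integral control framework underlying this paper evaluates sampled rollouts and reweights them exponentially by trajectory cost. As a hedged sketch of that generic sampling step (not the paper's topology-guided variant), the function below performs one reweighting update; the function name, the temperature `lam`, and the array shapes are all illustrative assumptions:

```python
import numpy as np

def path_integral_update(u_nom, sample_costs, noise, lam=1.0):
    """One generic path-integral control update: given K noisy rollouts
    around a nominal control sequence u_nom (T x m), their trajectory
    costs sample_costs (K,), and the injected noise (K x T x m),
    reweight the samples by exp(-cost / lam) and average."""
    costs = np.asarray(sample_costs, dtype=float)
    costs -= costs.min()             # shift for numerical stability
    w = np.exp(-costs / lam)
    w /= w.sum()                     # softmax weights over rollouts
    # cost-weighted average of the injected noise, added to the nominal control
    return u_nom + np.einsum('k,ktm->tm', w, noise)
```

When one rollout is far cheaper than the rest, its weight dominates and the update essentially adopts that rollout's perturbation, which is exactly the local-minimum behavior the paper's homology-class sampling is designed to escape.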
Path integral policy improvement with differential dynamic programming
Path Integral Policy Improvement with Covariance Matrix Adaptation (PI2-CMA) is a step-based, model-free reinforcement learning approach that combines statistical estimation techniques with fundamental results from stochastic optimal control. In essence, a policy distribution is improved iteratively using reward-weighted averaging of the corresponding rollouts. It had been assumed that PI2-CMA somehow exploited gradient information contained in the reward-weighted statistics. To our knowledge, we are the first to expose the principle of this gradient extraction rigorously. Our findings reveal that PI2-CMA essentially obtains gradient information similar to the forward and backward passes in the Differential Dynamic Programming (DDP) method. It is then straightforward to extend the analogy with DDP by introducing a feedback term in the policy update. This suggests a novel algorithm, which we coin Path Integral Policy Improvement with Differential Dynamic Programming (PI2-DDP). The resulting algorithm is similar to the previously proposed Sampled Differential Dynamic Programming (SaDDP), but we derive the method independently as a generalization of the PI2-CMA framework. Our derivations suggest small variations to SaDDP that increase performance. We validated our claims on a robot trajectory learning task.
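The reward-weighted averaging that this abstract describes updates both the mean and the covariance of a Gaussian policy distribution from sampled rollouts. A minimal sketch of that generic PI2-CMA-style statistic (not the paper's PI2-DDP feedback extension), with the function name and `temperature` parameter as illustrative assumptions:

```python
import numpy as np

def reward_weighted_update(theta_samples, rewards, temperature=1.0):
    """Generic reward-weighted Gaussian update: the new mean and
    covariance of the policy distribution are the exponentially
    reward-weighted statistics of the sampled parameter vectors
    theta_samples (K x d).  Illustrative sketch only."""
    r = np.asarray(rewards, dtype=float)
    w = np.exp((r - r.max()) / temperature)   # shifted for numerical stability
    w /= w.sum()                              # normalized sample weights
    mean = w @ theta_samples                  # reward-weighted mean, shape (d,)
    diff = theta_samples - mean
    cov = np.einsum('k,ki,kj->ij', w, diff, diff)  # reward-weighted covariance
    return mean, cov
```

With equal rewards the update reduces to the ordinary sample mean and (weighted) sample covariance, which is one way to see why gradient-like information must be hidden in the weighting rather than in the averaging itself.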
Prescribed Performance Control Guided Policy Improvement for Satisfying Signal Temporal Logic Tasks
Signal temporal logic (STL) provides a user-friendly interface for defining
complex tasks for robotic systems. Recent efforts aim at designing control laws
or using reinforcement learning methods to find policies which guarantee
satisfaction of these tasks. While the former suffer from the trade-off between
task specification and computational complexity, the latter encounter
difficulties in exploration as the tasks become more complex and challenging to
satisfy. This paper proposes to combine the benefits of the two approaches and
use an efficient prescribed performance control (PPC) law to guide
exploration within the reinforcement learning algorithm. The potential of the
method is demonstrated in a simulated environment through two sample
navigational tasks.
Comment: This is the extended version of the paper accepted to the 2019
American Control Conference (ACC), Philadelphia (to be published)
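Prescribed performance control constrains the tracking error to a funnel that shrinks over time and feeds a transformed, unconstrained error to the controller. As a hedged sketch of that standard PPC ingredient (the specific funnel parameters, function name, and logarithmic transformation below are illustrative assumptions, not this paper's design):

```python
import numpy as np

def ppc_transformed_error(e, t, rho0=1.0, rho_inf=0.1, decay=1.0):
    """Prescribed performance control sketch: the tracking error e(t)
    must stay inside a shrinking funnel |e| < rho(t), with
    rho(t) = (rho0 - rho_inf) * exp(-decay * t) + rho_inf.
    The controller acts on a transformed, unconstrained error that
    grows without bound as e approaches the funnel boundary."""
    rho = (rho0 - rho_inf) * np.exp(-decay * t) + rho_inf
    z = e / rho                            # normalized error in (-1, 1)
    return np.log((1.0 + z) / (1.0 - z))   # blows up near the funnel edge
```

The steep gradient near the funnel boundary is what gives a PPC law strong corrective action, and is one intuition for why it can serve as an exploration guide for a reinforcement learning agent, as the abstract proposes.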
Collective Robot Reinforcement Learning with Distributed Asynchronous Guided Policy Search
In principle, reinforcement learning and policy search methods can enable
robots to learn highly complex and general skills that may allow them to
function amid the complexity and diversity of the real world. However, training
a policy that generalizes well across a wide range of real-world conditions
requires far greater quantity and diversity of experience than is practical to
collect with a single robot. Fortunately, it is possible for multiple robots to
share their experience with one another and thereby learn a policy
collectively. In this work, we explore distributed and asynchronous policy
learning as a means to achieve generalization and improved training times on
challenging, real-world manipulation tasks. We propose a distributed and
asynchronous version of Guided Policy Search and use it to demonstrate
collective policy learning on a vision-based door opening task using four
robots. We show that it achieves better generalization, utilization, and
training times than the single-robot alternative.
Comment: Submitted to the IEEE International Conference on Robotics and Automation 201
Perception is Everything: Repairing the Image of American Drone Warfare
This thesis traces the United States’ development of unmanned warfare from its initial use in the World Wars through the Cold War to its maturation in the War on Terror. The examination provides a summary of unmanned warfare’s history, its gradual adoption, and concerns regarding the proliferation of drone use, in order to understand the emphasis on unmanned weapons in the American military. In each phase of development, a single program is examined in depth to highlight special areas of interest in the modern day. Finally, the discussion of the modern era of unmanned systems focuses on the growing integration of new weapon systems that no longer fill niche roles in the armory but act as fully vetted frontline combatants. Brought together, this examination shows that drones have earned their place as integral tools in the American military inventory as faithful defenders of democracy.