
    Asymmetric Actor Critic for Image-Based Robot Learning

    Deep reinforcement learning (RL) has proven a powerful technique in many sequential decision-making domains. However, robotics poses many challenges for RL; most notably, training on a physical system can be expensive and dangerous, which has sparked significant interest in learning control policies using a physics simulator. While several recent works have shown promising results in transferring policies trained in simulation to the real world, they often do not fully exploit the advantages of working with a simulator. In this work, we exploit full state observability in the simulator to train better policies that take only partial observations (RGBD images) as input. We do this by employing an actor-critic training algorithm in which the critic is trained on full states while the actor (or policy) receives rendered images as input. We show experimentally on a range of simulated tasks that using these asymmetric inputs significantly improves performance. Finally, we combine this method with domain randomization and show real-robot experiments for several tasks such as picking, pushing, and moving a block. We achieve this simulation-to-real-world transfer without training on any real-world data. Comment: Videos of experiments can be found at http://www.goo.gl/b57WT
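The asymmetric input split described in this abstract can be illustrated with a minimal structural sketch. The linear function approximators and all variable names below are illustrative assumptions, not the paper's actual architecture; the point is only that the critic consumes the privileged full state while the actor sees only image-derived features.

```python
# Structural sketch of asymmetric actor-critic inputs (names and the
# linear models are illustrative assumptions, not the paper's code).

def critic_value(full_state, w):
    """Critic: scores a state using the FULL simulator state
    (e.g., exact object poses and joint angles)."""
    return sum(wi * si for wi, si in zip(w, full_state))

def actor_action(image_features, theta):
    """Actor: picks an action from PARTIAL observations only
    (features extracted from rendered RGBD images)."""
    return [sum(t * f for t, f in zip(row, image_features)) for row in theta]

# During simulated training, both signals are available:
full_state = [0.5, -0.2, 1.0]      # privileged simulator state
image_features = [0.4, 0.1]        # what a camera-based policy would see

w = [1.0, 2.0, 0.5]                # critic weights
theta = [[0.3, 0.7], [0.2, -0.1]]  # actor weights (2 action dims)

v = critic_value(full_state, w)          # critic uses the full state
a = actor_action(image_features, theta)  # actor never sees the full state
```

At deployment time only the actor is needed, so the policy runs from camera input alone even though training exploited the simulator's full state.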

    Toward Robust Long Range Policy Transfer

    Humans can master a new task within a few trials by drawing on skills acquired through prior experience. To mimic this capability, hierarchical models that combine primitive policies learned from prior tasks have been proposed. However, these methods fall short of the human range of transferability. We propose a method that leverages the hierarchical structure to alternately train the combination function and adapt the set of diverse primitive policies, efficiently producing a range of complex behaviors on challenging new tasks. We also design two regularization terms to improve the diversity and utilization rate of the primitives in the pre-training phase. We demonstrate that our method outperforms other recent policy transfer methods by combining and adapting these reusable primitives in tasks with continuous action spaces. The experimental results further show that our approach provides a broader transfer range. An ablation study shows that the regularization terms are critical for long-range policy transfer. Finally, we show that our method consistently outperforms other methods when the quality of the primitives varies. Comment: Accepted by AAAI 202
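The combination idea can be sketched as a gating function that produces state-dependent weights over a set of primitive policies, with the executed action being the weighted sum. The softmax gate, the linear primitives, and all numbers below are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch: a learned combination function mixes reusable primitives.
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def combined_action(state, primitives, gate_weights):
    # Each primitive maps a state to an action (here: 1-D, linear).
    actions = [sum(p * s for p, s in zip(prim, state)) for prim in primitives]
    # The combination function scores each primitive for this state.
    logits = [sum(g * s for g, s in zip(row, state)) for row in gate_weights]
    weights = softmax(logits)
    return sum(w * a for w, a in zip(weights, actions)), weights

state = [1.0, 0.5]
primitives = [[1.0, 0.0], [0.0, 2.0]]    # two reusable primitive policies
gate_weights = [[2.0, 0.0], [0.0, 2.0]]  # learned combination function

action, weights = combined_action(state, primitives, gate_weights)
```

Training would alternate between updating `gate_weights` (the combination function) and adapting the primitives themselves, as the abstract describes.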

    Optimization And Learning For Rough Terrain Legged Locomotion

    We present a novel approach to legged locomotion over rough terrain that is thoroughly rooted in optimization. This approach relies on a hierarchy of fast, anytime algorithms to plan a set of footholds, along with the dynamic body motions required to execute them. Components within the planning framework coordinate to exchange plans, cost-to-go estimates, and 'certificates' that ensure the output of an abstract high-level planner can be realized by lower layers of the hierarchy. The burden of carefully engineering cost functions to achieve the desired performance is substantially mitigated by a simple inverse optimal control technique. Robustness is achieved by real-time re-planning of the full trajectory, augmented by reflexes and feedback control. We demonstrate the successful application of our approach in guiding the LittleDog quadruped robot over a variety of types of rough terrain. Other novel aspects of our past research efforts include a variety of pioneering inverse optimal control techniques as well as a system for planning using arbitrary pre-recorded robot behavior.
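The interplay between an abstract high-level foothold planner and a lower layer that 'certifies' executability can be shown with a toy sketch. The 1-D terrain model, the step-height limit, and the greedy planner are all illustrative assumptions standing in for the paper's full hierarchy.

```python
# Toy sketch: a high-level planner proposes footholds, and a lower layer
# certifies only steps whose height change is executable. All numbers
# and the 1-D terrain model are illustrative assumptions.

MAX_STEP_HEIGHT = 0.25  # assumed executability limit for one step

def certify(terrain, i, j):
    """Lower layer: a step from cell i to cell j is realizable only if
    the height change is within the robot's capability."""
    return abs(terrain[j] - terrain[i]) <= MAX_STEP_HEIGHT

def plan_footholds(terrain, start, goal):
    """High-level planner: greedily step toward the goal, preferring the
    farthest certified foothold within reach (reach = 2 cells)."""
    path = [start]
    pos = start
    while pos < goal:
        for step in (2, 1):  # prefer longer certified steps
            nxt = min(pos + step, goal)
            if certify(terrain, pos, nxt):
                pos = nxt
                path.append(pos)
                break
        else:
            return None  # no certified step: plan is infeasible, re-plan
    return path

terrain = [0.0, 0.1, 0.3, 0.35, 0.2, 0.2]  # cell heights
plan = plan_footholds(terrain, 0, 5)
```

Here the direct two-cell step from cell 0 is rejected by the certificate check (0.3 m rise exceeds the limit), so the planner routes through cell 1 instead; in the real system, such certificates let the abstract planner's output remain realizable by the lower layers.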

    An optimization approach to rough terrain locomotion


    Latent Space Reinforcement Learning

    We often have to handle high-dimensional spaces when learning motor skills for robots. In policy search tasks, we must find many parameters to learn a desired movement. This high dimensionality in the parameters can be challenging for reinforcement learning algorithms, since every additional dimension increases the number of samples needed to find an optimal solution. On the other hand, if the robot has a large number of actuators, an inherent correlation between them can be found for a specific motor task, which we can exploit for faster convergence. One possibility is to reduce the dimensionality of the space, which most applications treat as a pre-processing step or an independent process. In this thesis we present a novel algorithm that combines policy search with probabilistic dimensionality reduction to uncover the hidden structure of high-dimensional action spaces. Evaluations on an inverse kinematics task indicate that the presented algorithm outperforms the reference algorithms PoWER and CMA-ES, especially in high-dimensional spaces. Furthermore, we evaluate our algorithm on a real-world task in which a NAO robot learns to lift its leg while keeping its balance. The cost of collecting samples on a real robot, which is often very time-consuming and expensive, is addressed here by using a small number of samples in each iteration.
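The core idea of searching in a reduced space rather than the full parameter space can be sketched as follows: policy parameters are generated as a linear image of a low-dimensional latent vector, and only the latent vector is optimized. The linear map, the quadratic objective, and the simple random search below are illustrative assumptions; the thesis combines policy search with probabilistic dimensionality reduction, not this toy scheme.

```python
# Sketch: optimize a 2-D latent z instead of the 4-D parameters theta = A z.
import random

random.seed(0)

A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]  # 4-D theta from 2-D z

def expand(z):
    return [sum(a * zi for a, zi in zip(row, z)) for row in A]

def cost(theta, target):
    # Stand-in for the (negative) task reward: distance to a goal setting.
    return sum((t - g) ** 2 for t, g in zip(theta, target))

target = expand([0.3, -0.2])  # a target reachable within the latent subspace

best_z = [0.0, 0.0]
best_c = cost(expand(best_z), target)
for _ in range(2000):  # simple hill-climbing random search in 2-D only
    z = [best_z[0] + random.gauss(0, 0.1), best_z[1] + random.gauss(0, 0.1)]
    c = cost(expand(z), target)
    if c < best_c:
        best_z, best_c = z, c
```

Because the search perturbs only two latent coordinates rather than all four parameters, fewer samples are needed per improvement, which is the sample-efficiency argument the abstract makes for high-dimensional actuated robots.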

    Sample-Efficient Reinforcement Learning of Robot Control Policies in the Real World

    The goal of reinforcement learning is to enable systems to autonomously solve tasks in the real world, even in the absence of prior data. To succeed in such situations, reinforcement learning algorithms collect new experience through interactions with the environment to further the learning process. The behaviour is optimized by maximizing a reward function, which assigns high numerical values to desired behaviours. Especially in robotics, such interactions with the environment are expensive in terms of the required execution time, human involvement, and mechanical degradation of the system itself. Therefore, this thesis aims to introduce sample-efficient reinforcement learning methods that are applicable to real-world settings and control tasks such as bimanual manipulation and locomotion. Sample efficiency is achieved through directed exploration, either by using dimensionality reduction or trajectory optimization methods. Finally, it is demonstrated how data-efficient reinforcement learning methods can be used to optimize the behaviour and morphology of robots at the same time. Dissertation/Thesis: Doctoral Dissertation, Computer Science, 201

    Learning Omnidirectional Path Following Using Dimensionality Reduction

    We consider the task of omnidirectional path following for a quadruped robot: moving a four-legged robot along any arbitrary path while turning in any arbitrary manner. Learning a controller capable of such motion requires learning the parameters of a very high-dimensional policy, a difficult task on a real robot. Although learning such a policy can be much easier in a model (or "simulator") of the system, it can be extremely difficult to build a sufficiently accurate simulator. In this paper we propose a method that uses a (possibly inaccurate) simulator to identify a low-dimensional subspace of policies that spans the variations in model dynamics. This subspace is robust to variations in the model and can be learned on the real system using much less data than would be required to learn a policy in the original class. In our approach, we sample several models from a distribution over the kinematic and dynamic parameters of the simulator, then formulate an optimization problem that can be solved using the Reduced Rank Regression (RRR) algorithm to construct a low-dimensional class of policies that spans the major axes of variation in the space of controllers. We present a successful application of this technique to the task of omnidirectional path following and demonstrate improvement over a number of alternative methods, including a hand-tuned controller. We present, to the best of our knowledge, the first controller capable of omnidirectional path following with parameters optimized simultaneously for all directions of motion and turning rates.
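The subspace construction can be sketched with a simplified stand-in: optimize a policy under several sampled simulator models, stack the resulting parameter vectors, and extract the dominant direction of variation. A PCA-style power iteration is used here in place of the paper's Reduced Rank Regression, and all data below are illustrative assumptions.

```python
# Simplified stand-in for the subspace idea (power iteration, not RRR).

def mean_vec(rows):
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(len(rows[0]))]

def dominant_direction(rows, iters=100):
    """Dominant principal direction of the mean-centered rows."""
    mu = mean_vec(rows)
    centered = [[x - m for x, m in zip(r, mu)] for r in rows]
    v = [1.0] * len(mu)
    for _ in range(iters):
        # v <- (X^T X) v, computed without forming the covariance matrix
        proj = [sum(x * vi for x, vi in zip(r, v)) for r in centered]
        v = [sum(p * r[i] for p, r in zip(proj, centered))
             for i in range(len(mu))]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    return mu, v

# Policy parameters optimized under three sampled model variants
# (illustrative values: only the first two coordinates actually vary):
policies = [
    [1.0, 0.0, 0.5],
    [1.2, 0.1, 0.5],
    [0.8, -0.1, 0.5],
]
mu, direction = dominant_direction(policies)
```

On the real robot, only a coefficient along `direction` around the mean `mu` would need to be learned, instead of every policy parameter, which is the data-efficiency argument the abstract makes.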