31 research outputs found

    Data-Efficient Generalization of Robot Skills with Contextual Policy Search

    In robotics, controllers make the robot solve a task within a specific context. The context can describe the objectives of the robot or physical properties of the environment and is always specified before task execution. To generalize the controller to multiple contexts, we follow a hierarchical approach for policy learning: a lower-level policy controls the robot for a given context and an upper-level policy generalizes among contexts. Current approaches for learning such upper-level policies are based on model-free policy search, which requires an excessive number of interactions of the robot with its environment. More data-efficient policy search approaches are model-based but have so far lacked the capability of learning hierarchical policies. We propose a new model-based policy search approach that can also learn contextual upper-level policies. Our approach is based on learning probabilistic forward models for long-term predictions. Using these predictions, we apply information-theoretic insights to improve the upper-level policy. Our method achieves a substantial improvement in learning speed compared to existing methods on simulated and real robotic tasks. Copyright © 2013, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

    Learning to Race through Coordinate Descent Bayesian Optimisation

    In the automation of many kinds of processes, the observable outcome can often be described as the combined effect of an entire sequence of actions, or controls, applied throughout its execution. In these cases, strategies to optimise control policies for individual stages of the process might not be applicable, and instead the whole policy might have to be optimised at once. On the other hand, evaluating the policy's performance might also be costly, so it is desirable to find a solution with as few interactions with the real system as possible. We consider the problem of optimising control policies to allow a robot to complete a given race track within a minimum amount of time. We assume that the robot has no prior information about the track or its own dynamical model, just an initial valid driving example. Localisation is only applied to monitor the robot and to provide an indication of its position along the track's centre axis. We propose a method for finding a policy that minimises the time per lap while keeping the vehicle on the track using a Bayesian optimisation (BO) approach over a reproducing kernel Hilbert space. We apply an algorithm to search more efficiently over high-dimensional policy-parameter spaces with BO, by iterating over each dimension individually, in a sequential coordinate descent-like scheme. Experiments demonstrate the performance of the algorithm against other methods in a simulated car racing environment. Comment: Accepted as conference paper for the 2018 IEEE International Conference on Robotics and Automation (ICRA).

    Demonstration-free contextualized probabilistic movement primitives, further enhanced with obstacle avoidance

    © 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. Movement Primitives (MPs) have been widely used in recent years for learning robot motion tasks with direct Policy Search (PS) reinforcement learning. Among them, Probabilistic Movement Primitives (ProMPs) are a kind of MP based on a stochastic representation over sets of trajectories, which benefits from the properties of probability operations. However, generating such ProMPs requires a set of demonstrations to capture motion variability. Additionally, using context variables to modify trajectories encoded as MPs is a popular way to adapt motion to environmental variables. This paper proposes a contextual representation of ProMPs that allows for easy adaptation to changing situations by reparametrizing motion with context variables. Moreover, we propose a way of initializing contextual trajectories without the need for real robot demonstrations, by setting an initial position, a final position, and a number of trajectory interest points, where the contextual variables are evaluated. The resulting parametrizations prove accurate while relieving the user of the need to perform costly computations such as conditioning. Additionally, using this contextual representation, we propose a simple yet effective quadratic optimization-based obstacle avoidance method for ProMPs. Experiments in simulation and on a real robot show the promise of the approach. Peer Reviewed. Postprint (author's final draft).

    Dimensionality reduction in learning Gaussian mixture models of movement primitives for contextualized action selection and adaptation

    Robotic manipulation often requires adaptation to changing environments. Such changes can be represented by a certain number of contextual variables that may be observed or sensed in different manners. When learning and representing robot motion, usually with movement primitives, it is desirable to adapt the learned behaviors to the current context. Moreover, different actions or motions can be considered in the same framework, using contextualization to decide which action applies to which situation. Such frameworks, however, may easily become high-dimensional, making it necessary to reduce the dimensionality of the parameter space, as well as the amount of data needed to generate and improve the model over experience. In this paper, we propose an approach to obtain a generative model from a set of actions that share a common feature. This feature, namely a contextual variable, is plugged into the model to generate motion. We encode the data with a Gaussian Mixture Model in the parameter space of Probabilistic Movement Primitives (ProMPs), after performing Dimensionality Reduction (DR) on that parameter space. We append the contextual variable to the parameter space and obtain the number of Gaussian components, i.e., different actions in a dataset, through Persistent Homology. Then, using multimodal Gaussian Mixture Regression (GMR), we can retrieve the most likely actions given a contextual situation and execute them. After each execution, we use Reward-Weighted Responsibility GMM (RWR-GMM) to update the model. Experiments in three scenarios show that the method drastically reduces the dimensionality of the parameter space, thus implementing both action selection and adaptation to a changing situation in an efficient way. Peer Reviewed. Postprint (author's final draft).