    Synergy-based policy improvement with path integrals for anthropomorphic hands

    In this work, a synergy-based reinforcement learning algorithm has been developed to confer autonomous grasping capabilities on anthropomorphic hands. In the presence of many degrees of freedom, classical machine learning techniques require a number of iterations that grows with the size of the problem, so convergence of the solution is not ensured. The use of postural synergies reduces the dimensionality of the search space and makes recent learning techniques, such as Policy Improvement with Path Integrals, readily applicable. A key point is the adoption of a suitable reward function that represents the goal of the task and allows one-step performance evaluation. The force-closure quality of the grasp in the synergy subspace has been chosen as the cost function for performance evaluation. Experiments conducted on the SCHUNK 5-Finger Hand demonstrate the effectiveness of the algorithm, showing skills comparable to human capabilities in learning new grasps and in performing a wide variety of grasps, from power grasps to high-precision grasps of very small objects.
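
    For readers unfamiliar with the method, the following is a minimal sketch of a PI2-style update operating in a low-dimensional synergy subspace. Everything here is illustrative: the synergy matrix is random, and the quadratic cost is a hypothetical stand-in for the paper's force-closure grasp-quality measure; rollout count, noise level, and temperature are assumptions, not the paper's settings.

```python
import numpy as np

# Sketch: PI2-style probability-weighted parameter averaging in a
# synergy subspace. The hand posture is assumed to be q = S @ theta,
# where S maps a few synergy coefficients to many joint angles.

rng = np.random.default_rng(0)

n_joints, n_synergies = 20, 3          # dimensionality reduction: 20 -> 3
S = rng.standard_normal((n_joints, n_synergies))  # postural synergy matrix (assumed)
theta = np.zeros(n_synergies)          # policy parameters: synergy coefficients

def cost(theta):
    """Hypothetical stand-in for (negated) force-closure grasp quality."""
    q = S @ theta                      # full hand posture from synergy coefficients
    return np.sum((q - 1.0) ** 2)      # toy cost; the paper uses grasp quality

K, sigma, lam = 10, 0.3, 1.0           # rollouts per update, noise std, temperature
for update in range(100):
    eps = sigma * rng.standard_normal((K, n_synergies))   # exploration noise
    costs = np.array([cost(theta + e) for e in eps])
    # Path-integral weighting: exponentiate negative cost, normalize.
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    theta = theta + w @ eps            # probability-weighted parameter update

print("final cost:", cost(theta))
```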

    Proximodistal Exploration in Motor Learning as an Emergent Property of Optimization

    To harness the complexity of their high-dimensional bodies during sensorimotor development, infants are guided by patterns of freezing and freeing of degrees of freedom. For instance, when learning to reach, infants free the degrees of freedom in their arm proximodistally, i.e. from joints that are closer to the body to those that are more distant. Here, we formulate and computationally study the hypothesis that such patterns can emerge spontaneously as the result of a family of stochastic optimization processes (evolution strategies with covariance matrix adaptation), without an innate encoding of a maturational schedule. In particular, we present simulated experiments with an arm where a computational learner progressively acquires reaching skills through adaptive exploration, and we show that a proximodistal organization appears spontaneously, which we denote PDFF (ProximoDistal Freezing and Freeing of degrees of freedom). We also compare this emergent organization across different arm morphologies, from human-like to quite unnatural ones, to study the effect of different kinematic structures on the emergence of PDFF. Research highlights: we propose a general, domain-independent hypothesis for the developmental organization of freezing and freeing of degrees of freedom observed both in infant development and adult skill acquisition, such as proximodistal exploration in learning to reach.
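
    As a rough illustration of the kind of optimization process the paper studies, the sketch below runs a simple evolution strategy (a diagonal, cross-entropy-style simplification of CMA-ES, not the full algorithm used in the paper) on a planar reaching task. Tracking the per-joint exploration standard deviation over generations is how one would look for freezing/freeing patterns; link lengths, population size, and the target are illustrative assumptions.

```python
import numpy as np

# Sketch: evolution strategy with per-joint variance adaptation on a
# planar arm. No maturational schedule is encoded; any ordering in how
# per-joint sigmas shrink emerges from the optimization itself.

rng = np.random.default_rng(1)

lengths = np.array([1.0, 0.8, 0.6, 0.4])   # proximal -> distal link lengths (assumed)
target = np.array([1.5, 1.5])              # reaching target (assumed)

def hand_position(angles):
    """Forward kinematics of a planar arm; joint angles accumulate along the chain."""
    acc = np.cumsum(angles)
    return np.array([np.sum(lengths * np.cos(acc)),
                     np.sum(lengths * np.sin(acc))])

def cost(angles):
    return np.linalg.norm(hand_position(angles) - target)

mean = np.zeros(4)                 # joint angles being optimized
sigma = 0.5 * np.ones(4)           # per-joint exploration std (diagonal covariance)
pop, elite = 30, 10

for gen in range(50):
    samples = mean + sigma * rng.standard_normal((pop, 4))
    order = np.argsort([cost(s) for s in samples])
    best = samples[order[:elite]]
    mean = best.mean(axis=0)                 # recombination of elite samples
    sigma = best.std(axis=0) + 1e-3          # per-joint variance adaptation
    if gen % 10 == 0:
        print(f"gen {gen:2d}  cost {cost(mean):.3f}  sigma per joint {np.round(sigma, 3)}")
```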

    Reinforcement learning to improve 4-finger-gripper manipulation

    In the framework of robotics, Reinforcement Learning (RL) deals with the learning of a task by the robot itself. This work focuses on a recently developed method, Policy Improvement with Path Integrals (PI2), applied to a 4-finger-gripper manipulator performing the task of rotating a ball around a desired axis. The scope of the thesis is to design an experiment in which the algorithm receives feedback on the robot's performance. The algorithm has also been adapted to cope with periodic movements parametrized as motor primitives. Furthermore, due to the high dimensionality of the problem, certain assumptions have been made to restrict the state space to a reliable subset. The obtained results illustrate the good performance of the algorithm: the robot is able to perform the task while focusing on the aspects deemed important by the user, both in simulation and on the real robot. The main bottleneck of the thesis has been the speed of both software and hardware, as long-running experiments required much time, particularly on the real robot, where manual supervision was needed.
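
    To make the periodic-primitive adaptation concrete, here is a minimal sketch of a rhythmic motor primitive (a weighted sum of periodic basis functions over a phase variable) whose weights are improved with PI2-style weighted averaging. The basis type, the sinusoidal target, and the mean-squared-error cost are hypothetical placeholders for the ball-rotation feedback used in the thesis.

```python
import numpy as np

# Sketch: periodic movement parametrized as a rhythmic motor primitive,
# with basis-function weights improved by probability-weighted averaging.

rng = np.random.default_rng(2)

n_basis = 8
centers = np.linspace(0, 2 * np.pi, n_basis, endpoint=False)

def basis(phase):
    """Von Mises basis functions: smooth, periodic in the phase variable."""
    b = np.exp(2.0 * (np.cos(phase[:, None] - centers) - 1.0))
    return b / b.sum(axis=1, keepdims=True)

phase = np.linspace(0, 2 * np.pi, 100)
Phi = basis(phase)                     # (100, n_basis) design matrix
target = np.sin(phase)                 # toy periodic trajectory to reproduce

def cost(w):
    return np.mean((Phi @ w - target) ** 2)

w = np.zeros(n_basis)                  # primitive weights being learned
K, sigma, lam = 12, 0.2, 0.05
for update in range(200):
    eps = sigma * rng.standard_normal((K, n_basis))
    costs = np.array([cost(w + e) for e in eps])
    p = np.exp(-(costs - costs.min()) / lam)
    p /= p.sum()
    w = w + p @ eps                    # PI2-style weighted update of the weights

print("final trajectory error:", cost(w))
```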

    Adaptive exploration for continual reinforcement learning

    Most experiments on policy search for robotics focus on isolated tasks, where the experiment is split into two distinct phases: 1) the learning phase, where the robot learns the task through exploration; 2) the exploitation phase, where exploration is turned off and the robot demonstrates its performance on the task it has learned. In this paper, we present an algorithm that enables robots to continually and autonomously alternate between these phases. We do so by combining the 'Policy Improvement with Path Integrals' direct reinforcement learning algorithm with the covariance matrix adaptation rule from the 'Cross-Entropy Method' optimization algorithm. This integration is possible because both algorithms iteratively update parameters with probability-weighted averaging. A practical advantage of the novel algorithm, called PI2-CMA, is that it frees the user from having to manually tune the degree of exploration. We evaluate PI2-CMA's ability to continually and autonomously tune exploration on two tasks.
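
    The mechanism the abstract names, probability-weighted averaging applied to both the parameter mean and the exploration covariance, can be sketched as follows. This is a simplified illustration, not the paper's implementation: the quadratic cost, temperature, and rollout count are assumptions.

```python
import numpy as np

# Sketch of the PI2-CMA idea: the same probability weights that average
# the parameter perturbations (as in PI2) also update the exploration
# covariance (as in the Cross-Entropy Method / CMA), so the degree of
# exploration is tuned automatically rather than by hand.

rng = np.random.default_rng(3)

dim = 5
theta = 2.0 * np.ones(dim)            # policy parameters
Sigma = 0.5 * np.eye(dim)             # exploration covariance, adapted online

def cost(theta):
    return float(theta @ theta)       # toy cost with optimum at the origin

K, lam = 15, 1.0
for update in range(60):
    eps = rng.multivariate_normal(np.zeros(dim), Sigma, size=K)
    costs = np.array([cost(theta + e) for e in eps])
    p = np.exp(-(costs - costs.min()) / lam)
    p /= p.sum()
    theta = theta + p @ eps                            # PI2 mean update
    # Covariance update with the same weights (CEM/CMA-style rule):
    Sigma = (p[:, None, None] * np.einsum('ki,kj->kij', eps, eps)).sum(axis=0)
    Sigma += 1e-6 * np.eye(dim)                        # keep positive definite
    if update % 15 == 0:
        print(f"update {update:2d}  cost {cost(theta):.4f}  tr(Sigma) {np.trace(Sigma):.4f}")
```

    As the policy converges, the weighted outer products shrink, so exploration decays on its own; this is what lets the robot alternate between learning and exploitation without a manually scheduled exploration rate.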