Synergy-based policy improvement with path integrals for anthropomorphic hands
In this work, a synergy-based reinforcement learning algorithm has been developed to confer autonomous grasping capabilities to anthropomorphic hands. In the presence of high degrees of freedom, classical machine learning techniques require a number of iterations that increases with the size of the problem, so convergence of the solution is not ensured. The use of postural synergies reduces the dimensionality of the search space and allows recent learning techniques, such as Policy Improvement with Path Integrals, to become easily applicable. A key point is the adoption of a suitable reward function that represents the goal of the task and ensures one-step performance evaluation. The force-closure quality of the grasp in the synergy subspace has been chosen as the cost function for performance evaluation. Experiments conducted on the SCHUNK 5-Finger Hand demonstrate the effectiveness of the algorithm, showing skills comparable to human capabilities in learning new grasps and in performing a wide variety of grasps, from power grasps to high-precision grasps of very small objects.
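As a rough illustration of the dimensionality reduction this abstract describes, the sketch below extracts a postural-synergy subspace from recorded hand postures via PCA and maps a low-dimensional activation vector back to full joint angles. The data, dimensions, and function names are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of postural-synergy extraction, assuming a dataset of
# recorded hand postures (joint-angle vectors). All sizes are illustrative.
import numpy as np

def extract_synergies(postures, n_synergies):
    """PCA on a (n_samples, n_joints) matrix of hand postures.

    Returns the mean posture and the first n_synergies principal
    directions, which span the synergy subspace."""
    mean = postures.mean(axis=0)
    centered = postures - mean
    # Right singular vectors of the centered data give the principal axes.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_synergies]

def synthesize_posture(mean, synergies, activations):
    """Map a low-dimensional activation vector back to joint angles."""
    return mean + activations @ synergies

# Example: a 20-joint hand reduced to a 3-dimensional search space,
# so a learner like PI2 only has to explore 3 parameters per posture.
rng = np.random.default_rng(0)
postures = rng.normal(size=(500, 20))          # stand-in for grasp data
mean, synergies = extract_synergies(postures, n_synergies=3)
posture = synthesize_posture(mean, synergies, np.array([0.5, -0.2, 0.1]))
print(posture.shape)                           # (20,) full joint vector
```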
Proximodistal Exploration in Motor Learning as an Emergent Property of Optimization
To harness the complexity of their high-dimensional bodies during sensorimotor development, infants are guided by patterns of freezing and freeing of degrees of freedom. For instance, when learning to reach, infants free the degrees of freedom in their arm proximodistally, i.e. from joints that are closer to the body to those that are more distant. Here, we formulate and study computationally the hypothesis that such patterns can emerge spontaneously as the result of a family of stochastic optimization processes (evolution strategies with covariance-matrix adaptation), without an innate encoding of a maturational schedule. In particular, we present simulated experiments with an arm where a computational learner progressively acquires reaching skills through adaptive exploration, and we show that a proximodistal organization appears spontaneously, which we denote PDFF (ProximoDistal Freezing and Freeing of degrees of freedom). We also compare this emergent organization across different arm morphologies, from human-like to quite unnatural ones, to study the effect of different kinematic structures on the emergence of PDFF. Research highlights: • We propose a general, domain-independent hypothesis for the developmental organization of freezing and freeing of degrees of freedom observed both in infant development and adult skill acquisition, such as proximodistal exploration in learning to reach.
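A minimal sketch of the class of optimizers studied here: an evolution strategy with a simple covariance-matrix adaptation rule, applied to a toy planar reaching cost. The arm model, population sizes, and update constants are illustrative assumptions, not the paper's actual setup.

```python
# Toy evolution strategy with covariance adaptation on a planar reaching task.
import numpy as np

def reaching_cost(joint_angles, target=np.array([0.8, 0.4])):
    """Distance from a planar arm's end effector to a target point.
    Equal link lengths summing to 1 are an assumption for this example."""
    lengths = np.ones_like(joint_angles) / len(joint_angles)
    angles = np.cumsum(joint_angles)
    tip = np.array([np.sum(lengths * np.cos(angles)),
                    np.sum(lengths * np.sin(angles))])
    return np.linalg.norm(tip - target)

def simple_cma_es(cost, dim, iters=200, pop=20, elite=5, seed=0):
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)
    cov = np.eye(dim) * 0.1
    for _ in range(iters):
        samples = rng.multivariate_normal(mean, cov, size=pop)
        costs = np.array([cost(s) for s in samples])
        best = samples[np.argsort(costs)[:elite]]
        mean = best.mean(axis=0)
        # Rank-mu-style covariance update from the elite samples.
        centered = best - mean
        cov = 0.8 * cov + 0.2 * (centered.T @ centered) / elite
    return mean, cov

mean, cov = simple_cma_es(reaching_cost, dim=4)
print("final cost:", reaching_cost(mean))
print("per-joint exploration:", np.sqrt(np.diag(cov)))
```

The diagonal of the adapted covariance indicates how strongly each joint is being explored over time, which is the kind of signal one would inspect for a proximodistal freezing-and-freeing pattern.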
Bio-inspired control of a robotic hand for cyclic manipulation tasks (Controllo bioispirato di una mano robotica per compiti di manipolazione ciclica)
No abstract available.
Reinforcement learning to improve 4-finger-gripper manipulation
In the framework of robotics, Reinforcement Learning (RL) deals with the learning of a task by the robot itself. This work focuses on a recently developed method, Policy Improvement with Path Integrals (PI2), applied to a 4-finger-gripper manipulator performing the task of rotating a ball around a desired axis. The scope of the thesis is to design an experiment in which the algorithm receives feedback on the robot's performance. The algorithm has also been adapted to cope with periodic movements parametrized as motor primitives. Furthermore, due to the high dimensionality of the problem, certain assumptions have been made in order to limit the state space to a reliable subset of it. The obtained results illustrate the good performance of the algorithm, as the robot is able to perform the task while focusing on the important aspects previously set by the user, both in simulation and on the real robot. The main bottleneck of the thesis has been the speed of both software and hardware, as much time was required to perform long-run experiments, particularly in the implementation on the robot, where manual supervision was needed.
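To make "periodic movements parametrized as motor primitives" concrete, here is a minimal sketch of a rhythmic primitive: one joint's command expressed as a normalized, weighted sum of periodic basis functions over a cyclic phase variable. The basis shape, count, and weights are assumptions for illustration, not the thesis's actual parametrization.

```python
# Minimal sketch of a rhythmic motor primitive over a cyclic phase.
import numpy as np

def periodic_basis(phase, n_basis=10, width=10.0):
    """Evaluate n_basis von-Mises-like bumps spaced uniformly on the cycle."""
    centers = np.linspace(0, 2 * np.pi, n_basis, endpoint=False)
    return np.exp(width * (np.cos(phase - centers) - 1.0))

def primitive(phase, weights):
    """Policy output for one joint at a given phase; a learner such as
    PI2 would adjust `weights` from rollout costs."""
    psi = periodic_basis(phase, n_basis=len(weights))
    return psi @ weights / psi.sum()

weights = np.random.default_rng(1).normal(size=10)   # parameters to learn
trajectory = [primitive(p, weights) for p in np.linspace(0, 2 * np.pi, 100)]
```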
Adaptive exploration for continual reinforcement learning
Most experiments on policy search for robotics focus on isolated tasks, where the experiment is split into two distinct phases: 1) the learning phase, where the robot learns the task through exploration; 2) the exploitation phase, where exploration is turned off and the robot demonstrates its performance on the task it has learned. In this paper, we present an algorithm that enables robots to continually and autonomously alternate between these phases. We do so by combining the 'Policy Improvement with Path Integrals' direct reinforcement learning algorithm with the covariance matrix adaptation rule from the 'Cross-Entropy Method' optimization algorithm. This integration is possible because both algorithms iteratively update parameters with probability-weighted averaging. A practical advantage of the novel algorithm, called PI2-CMA, is that it alleviates the user from having to manually tune the degree of exploration. We evaluate PI2-CMA's ability to continually and autonomously tune exploration on two tasks.
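The shared "probability-weighted averaging" structure that makes this combination possible can be sketched as follows: sampled rollouts are weighted by (exponentiated, normalized) cost, and both the parameter mean and the exploration covariance are re-estimated from the same weighted samples. The temperature, rollout count, and update details below are illustrative assumptions, not the published PI2-CMA update.

```python
# Sketch of a cost-weighted-averaging update in the spirit of PI2-CMA.
import numpy as np

def pi2_cma_update(theta, cov, cost, n_rollouts=15, temperature=10.0, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    samples = rng.multivariate_normal(theta, cov, size=n_rollouts)
    costs = np.array([cost(s) for s in samples])
    # Softmax-style weights over normalized costs: low-cost rollouts dominate.
    z = (costs - costs.min()) / max(np.ptp(costs), 1e-12)
    weights = np.exp(-temperature * z)
    weights /= weights.sum()
    new_theta = weights @ samples              # PI2-style parameter update
    centered = samples - theta
    # CMA/CEM-style covariance adaptation: exploration magnitude is
    # re-estimated from the same weighted rollouts, so it needs no manual
    # tuning. A small ridge keeps the matrix positive definite.
    new_cov = (weights[:, None] * centered).T @ centered \
        + 1e-9 * np.eye(len(theta))
    return new_theta, new_cov

# Toy usage: minimize a quadratic in 3 parameters.
rng = np.random.default_rng(0)
theta, cov = np.zeros(3), np.eye(3)
for _ in range(50):
    theta, cov = pi2_cma_update(theta, cov, lambda t: float(t @ t), rng=rng)
print(theta)  # approaches the optimum at the origin
```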