Learning Nullspace Policies
Many everyday tasks performed by people, such
as reaching, pointing or drawing, resolve redundant degrees
of freedom in the arm in a similar way. In this paper we
present a novel method for learning the strategy used to
resolve redundancy by exploiting the variability in multiple
observations of different tasks. We demonstrate the effectiveness
of this method on three simulated plants: a toy example, a three-link
planar arm, and the KUKA lightweight arm.
Reconstructing null-space policies subject to dynamic task constraints in redundant manipulators
We consider the problem of direct policy learning in situations where the policies are only observable through their projections into the null-space of a set of dynamic, non-linear task constraints. We tackle the issue of deriving consistent data for the learning of such policies and make two contributions towards its solution. Firstly, we derive the conditions required to exactly reconstruct null-space policies and suggest a learning strategy based on this derivation. Secondly, we consider the case that the null-space policy is conservative and show that such a policy can be learnt more easily and robustly by learning the underlying potential function and using this as our representation of the policy.
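The observation model underlying both papers above can be sketched numerically: the learner only sees the component of a joint-space policy after projection into the null space of the task Jacobian. A minimal illustration, where the Jacobian `J` and action `u` are made-up values and the projector uses the standard pseudoinverse form (not necessarily the exact operator used in the papers):

```python
import numpy as np

def nullspace_projector(J):
    """Projector N = I - J^+ J onto the null space of task Jacobian J."""
    return np.eye(J.shape[1]) - np.linalg.pinv(J) @ J

def observed_component(J, u):
    """The part of policy action u visible through the task constraint."""
    return nullspace_projector(J) @ u

# Hypothetical example: one task constraint on a 3-DOF system.
J = np.array([[1.0, 0.5, -0.2]])   # illustrative task Jacobian
u = np.array([0.3, -0.1, 0.4])     # illustrative policy action
u_ns = observed_component(J, u)

# The projected action no longer moves the task variable: J @ u_ns = 0.
assert np.allclose(J @ u_ns, 0.0)
```

Because `N` is idempotent (`N @ N = N`), repeated projection changes nothing; the learning problem is to recover the full policy from such projected observations.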
MDP Homomorphic Networks: Group Symmetries in Reinforcement Learning
This paper introduces MDP homomorphic networks for deep reinforcement
learning. MDP homomorphic networks are neural networks that are equivariant
under symmetries in the joint state-action space of an MDP. Current approaches
to deep reinforcement learning do not usually exploit knowledge about such
structure. By building this prior knowledge into policy and value networks
using an equivariance constraint, we can reduce the size of the solution space.
We specifically focus on group-structured symmetries (invertible
transformations). Additionally, we introduce an easy method for constructing
equivariant network layers numerically, so the system designer need not solve
the constraints by hand, as is typically done. We construct MDP homomorphic
MLPs and CNNs that are equivariant under either a group of reflections or
rotations. We show that such networks converge faster than unstructured
baselines on CartPole, a grid world and Pong.
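One simple numerical route to an equivariant layer, shown here for intuition, is group averaging: project an arbitrary weight matrix onto the equivariant subspace by averaging over the group representations (the paper instead solves for a full basis of equivariant layers; the Z2 reflection representations below are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def symmetrize(W, reps_in, reps_out):
    """Project W onto the equivariant subspace:
    W_eq = 1/|G| * sum_g rho_out(g)^-1 @ W @ rho_in(g)."""
    pairs = list(zip(reps_in, reps_out))
    return sum(np.linalg.inv(ro) @ W @ ri for ri, ro in pairs) / len(pairs)

# Hypothetical Z2 symmetry: reflecting a 2-d state swaps two actions.
P = np.array([[0.0, 1.0], [1.0, 0.0]])
reps_in = [np.eye(2), P]    # group acting on layer input
reps_out = [np.eye(2), P]   # group acting on layer output

W = np.random.default_rng(0).standard_normal((2, 2))
W_eq = symmetrize(W, reps_in, reps_out)

# Equivariance: rho_out(g) @ W_eq == W_eq @ rho_in(g) for every group element.
for ri, ro in zip(reps_in, reps_out):
    assert np.allclose(ro @ W_eq, W_eq @ ri)
```

Averaging works for any finite group because composing with a fixed element only permutes the summands, so the result commutes with every representation exactly.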
Inverse-Dynamics MPC via Nullspace Resolution
Optimal control (OC) using inverse dynamics provides numerical benefits such
as coarse optimization, cheaper computation of derivatives, and a high
convergence rate. However, in order to take advantage of these benefits in
model predictive control (MPC) for legged robots, it is crucial to handle its
large number of equality constraints efficiently. To accomplish this, we first
(i) propose a novel approach to handle equality constraints based on nullspace
parametrization. Our approach balances optimality and both dynamics and
equality-constraint feasibility appropriately, which increases the basin of
attraction to good local minima. To do so, we then (ii) adapt our
feasibility-driven search by incorporating a merit function. Furthermore, we
introduce (iii) a condensed formulation of the inverse dynamics that considers
arbitrary actuator models. We also develop (iv) a novel MPC based on inverse
dynamics within a perception locomotion framework. Finally, we present (v) a
theoretical comparison of optimal control with the forward and inverse
dynamics, and evaluate both numerically. Our approach enables the first
application of inverse-dynamics MPC on hardware, resulting in state-of-the-art
dynamic climbing on the ANYmal robot. We benchmark it over a wide range of
robotics problems and generate agile and complex maneuvers. We show the
computational reduction of our nullspace resolution and condensed formulation
(up to 47.3%). We provide evidence of the benefits of our approach by solving
coarse optimization problems with a high convergence rate (up to 10 Hz of
discretization). Our algorithm is publicly available inside CROCODDYL.
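The nullspace parametrization of equality constraints can be illustrated on a toy equality-constrained quadratic program: write x = x_p + Z y with A x_p = b and A Z = 0, which turns the constrained problem into an unconstrained one in the reduced variable y. This is a generic sketch of the idea, not the paper's solver (the toy Q, c, A, b are made up):

```python
import numpy as np

def solve_eq_qp_nullspace(Q, c, A, b, tol=1e-10):
    """Minimize 0.5 x'Qx + c'x subject to Ax = b via nullspace
    parametrization x = x_p + Z y, reducing to an unconstrained QP in y."""
    x_p = np.linalg.lstsq(A, b, rcond=None)[0]   # particular solution of Ax = b
    _, s, Vt = np.linalg.svd(A)
    Z = Vt[int((s > tol).sum()):].T              # orthonormal basis of null(A)
    y = np.linalg.solve(Z.T @ Q @ Z, -Z.T @ (Q @ x_p + c))
    return x_p + Z @ y

# Toy problem: minimize ||x||^2 subject to x1 + x2 + x3 = 3.
Q = 2.0 * np.eye(3)
c = np.zeros(3)
A = np.ones((1, 3))
b = np.array([3.0])
x = solve_eq_qp_nullspace(Q, c, A, b)
assert np.allclose(x, [1.0, 1.0, 1.0])   # symmetric minimizer on the plane
```

Because every candidate x = x_p + Z y satisfies Ax = b by construction, the equality constraints hold exactly at every iterate, which is the feasibility property the abstract emphasizes.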
Learning Singularity Avoidance
With the increase in complexity of robotic systems and the rise in non-expert
users, it can be assumed that task constraints are not explicitly known. In
tasks where avoiding singularity is critical to success, this paper
provides an approach, especially suited to non-expert users, for the system to learn
the constraints contained in a set of demonstrations, such that they can be
used to optimise an autonomous controller to avoid singularity without
the task constraints being explicitly known. The proposed approach avoids
singularity, and thereby unpredictable behaviour when carrying out a task, by
maximising the learnt manipulability throughout the motion of the constrained
system, and is not limited to kinematic systems. Its benefits are demonstrated
through comparisons with other control policies, which show that the constrained
manipulability of a system learnt through demonstration can be used to avoid
singularities in cases where these other policies would fail. In the absence of
the system's manipulability subject to a task's constraints, the proposed
approach can be used instead to infer it, with results showing errors of less
than 10^-5 in 3-DOF simulated systems and 10^-2 on a 7-DOF real-world
robotic system.
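The quantity being maximised can be sketched with the standard Yoshikawa manipulability measure, w = sqrt(det(J J^T)), which collapses to zero at a singular configuration. The planar 2-link arm below is a stand-in model for illustration, not one of the paper's systems:

```python
import numpy as np

def manipulability(J):
    """Yoshikawa manipulability w = sqrt(det(J J^T)); w -> 0 at singularities."""
    return np.sqrt(max(np.linalg.det(J @ J.T), 0.0))

def planar_2link_jacobian(q1, q2, l1=1.0, l2=1.0):
    """Jacobian of an illustrative planar 2-link arm with link lengths l1, l2."""
    return np.array([
        [-l1 * np.sin(q1) - l2 * np.sin(q1 + q2), -l2 * np.sin(q1 + q2)],
        [ l1 * np.cos(q1) + l2 * np.cos(q1 + q2),  l2 * np.cos(q1 + q2)],
    ])

# Manipulability collapses at the fully stretched (singular) configuration.
w_bent = manipulability(planar_2link_jacobian(0.3, np.pi / 2))
w_stretched = manipulability(planar_2link_jacobian(0.3, 0.0))
assert w_stretched < 1e-6 < w_bent
```

A controller that ascends this measure along the motion, as the abstract describes, steers the arm away from configurations where w vanishes and task-space velocities become unattainable.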