Learning Nullspace Policies
Many everyday tasks performed by people, such
as reaching, pointing or drawing, resolve redundant degrees
of freedom in the arm in a similar way. In this paper we
present a novel method for learning the strategy used to
resolve redundancy by exploiting the variability in multiple
observations of different tasks. We demonstrate the effectiveness
of this method on three simulated plants: a toy example, a three-link
planar arm, and the KUKA lightweight arm.
Reconstructing null-space policies subject to dynamic task constraints in redundant manipulators
We consider the problem of direct policy learning in situations where the policies are only observable through their projections into the null-space of a set of dynamic, non-linear task constraints. We tackle the issue of deriving consistent data for the learning of such policies and make two contributions towards its solution. Firstly, we derive the conditions required to exactly reconstruct null-space policies and suggest a learning strategy based on this derivation. Secondly, we consider the case that the null-space policy is conservative and show that such a policy can be learnt more easily and robustly by learning the underlying potential function and using this as our representation of the policy.
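The observation model underlying both papers above can be sketched numerically: the learner only sees the component of a joint-space policy after projection into the null space of the task Jacobian. A minimal illustration, where the Jacobian `J` and action `u` are made-up values and the projector uses the standard pseudoinverse form (not necessarily the exact operator used in the papers):

```python
import numpy as np

def nullspace_projector(J):
    """Projector N = I - J^+ J onto the null space of task Jacobian J."""
    return np.eye(J.shape[1]) - np.linalg.pinv(J) @ J

def observed_component(J, u):
    """The part of policy action u visible through the task constraint."""
    return nullspace_projector(J) @ u

# Hypothetical example: one task constraint on a 3-DOF system.
J = np.array([[1.0, 0.5, -0.2]])   # illustrative task Jacobian
u = np.array([0.3, -0.1, 0.4])     # illustrative policy action
u_ns = observed_component(J, u)

# The projected action no longer moves the task variable: J @ u_ns = 0.
assert np.allclose(J @ u_ns, 0.0)
```

Because `N` is idempotent (`N @ N = N`), repeated projection changes nothing; the learning problem is to recover the full policy from such projected observations.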
MDP Homomorphic Networks: Group Symmetries in Reinforcement Learning
This paper introduces MDP homomorphic networks for deep reinforcement
learning. MDP homomorphic networks are neural networks that are equivariant
under symmetries in the joint state-action space of an MDP. Current approaches
to deep reinforcement learning do not usually exploit knowledge about such
structure. By building this prior knowledge into policy and value networks
using an equivariance constraint, we can reduce the size of the solution space.
We specifically focus on group-structured symmetries (invertible
transformations). Additionally, we introduce an easy method for constructing
equivariant network layers numerically, so the system designer need not solve
the constraints by hand, as is typically done. We construct MDP homomorphic
MLPs and CNNs that are equivariant under either a group of reflections or
rotations. We show that such networks converge faster than unstructured
baselines on CartPole, a grid world and Pong.
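One simple numerical route to an equivariant layer, shown here for intuition, is group averaging: project an arbitrary weight matrix onto the equivariant subspace by averaging over the group representations (the paper instead solves for a full basis of equivariant layers; the Z2 reflection representations below are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def symmetrize(W, reps_in, reps_out):
    """Project W onto the equivariant subspace:
    W_eq = 1/|G| * sum_g rho_out(g)^-1 @ W @ rho_in(g)."""
    pairs = list(zip(reps_in, reps_out))
    return sum(np.linalg.inv(ro) @ W @ ri for ri, ro in pairs) / len(pairs)

# Hypothetical Z2 symmetry: reflecting a 2-d state swaps two actions.
P = np.array([[0.0, 1.0], [1.0, 0.0]])
reps_in = [np.eye(2), P]    # group acting on layer input
reps_out = [np.eye(2), P]   # group acting on layer output

W = np.random.default_rng(0).standard_normal((2, 2))
W_eq = symmetrize(W, reps_in, reps_out)

# Equivariance: rho_out(g) @ W_eq == W_eq @ rho_in(g) for every group element.
for ri, ro in zip(reps_in, reps_out):
    assert np.allclose(ro @ W_eq, W_eq @ ri)
```

Averaging works for any finite group because composing with a fixed element only permutes the summands, so the result commutes with every representation exactly.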
Inverse-Dynamics MPC via Nullspace Resolution
Optimal control (OC) using inverse dynamics provides numerical benefits such
as coarse optimization, cheaper computation of derivatives, and a high
convergence rate. However, in order to take advantage of these benefits in
model predictive control (MPC) for legged robots, it is crucial to handle its
large number of equality constraints efficiently. To accomplish this, we first
(i) propose a novel approach to handle equality constraints based on nullspace
parametrization. Our approach balances optimality and both dynamics and
equality-constraint feasibility appropriately, which increases the basin of
attraction to good local minima. To do so, we then (ii) adapt our
feasibility-driven search by incorporating a merit function. Furthermore, we
introduce (iii) a condensed formulation of the inverse dynamics that considers
arbitrary actuator models. We also develop (iv) a novel MPC based on inverse
dynamics within a perception locomotion framework. Finally, we present (v) a
theoretical comparison of optimal control with the forward and inverse
dynamics, and evaluate both numerically. Our approach enables the first
application of inverse-dynamics MPC on hardware, resulting in state-of-the-art
dynamic climbing on the ANYmal robot. We benchmark it over a wide range of
robotics problems and generate agile and complex maneuvers. We show the
computational reduction of our nullspace resolution and condensed formulation
(up to 47.3%). We provide evidence of the benefits of our approach by solving
coarse optimization problems with a high convergence rate (up to 10 Hz of
discretization). Our algorithm is publicly available inside CROCODDYL.
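The nullspace parametrization of equality constraints can be illustrated on a toy equality-constrained quadratic program: write x = x_p + Z y with A x_p = b and A Z = 0, which turns the constrained problem into an unconstrained one in the reduced variable y. This is a generic sketch of the idea, not the paper's solver (the toy Q, c, A, b are made up):

```python
import numpy as np

def solve_eq_qp_nullspace(Q, c, A, b, tol=1e-10):
    """Minimize 0.5 x'Qx + c'x subject to Ax = b via nullspace
    parametrization x = x_p + Z y, reducing to an unconstrained QP in y."""
    x_p = np.linalg.lstsq(A, b, rcond=None)[0]   # particular solution of Ax = b
    _, s, Vt = np.linalg.svd(A)
    Z = Vt[int((s > tol).sum()):].T              # orthonormal basis of null(A)
    y = np.linalg.solve(Z.T @ Q @ Z, -Z.T @ (Q @ x_p + c))
    return x_p + Z @ y

# Toy problem: minimize ||x||^2 subject to x1 + x2 + x3 = 3.
Q = 2.0 * np.eye(3)
c = np.zeros(3)
A = np.ones((1, 3))
b = np.array([3.0])
x = solve_eq_qp_nullspace(Q, c, A, b)
assert np.allclose(x, [1.0, 1.0, 1.0])   # symmetric minimizer on the plane
```

Because every candidate x = x_p + Z y satisfies Ax = b by construction, the equality constraints hold exactly at every iterate, which is the feasibility property the abstract emphasizes.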
Learning Singularity Avoidance
With the increase in complexity of robotic systems and the rise in non-expert
users, it can be assumed that task constraints are not explicitly known. In
tasks where avoiding singularity is critical to success, this paper
provides an approach, especially suited to non-expert users, for the system to learn
the constraints contained in a set of demonstrations, such that they can be
used to optimise an autonomous controller to avoid singularity without
the task constraints being explicitly known. The proposed approach avoids
singularity, and thereby unpredictable behaviour when carrying out a task, by
maximising the learnt manipulability throughout the motion of the constrained
system, and is not limited to kinematic systems. Its benefits are demonstrated
through comparisons with other control policies, which show that the constrained
manipulability of a system learnt through demonstration can be used to avoid
singularities in cases where these other policies would fail. In the absence of
the system's manipulability subject to a task's constraints, the proposed
approach can be used instead to infer it, with results showing errors of less
than 10^-5 in 3-DOF simulated systems and 10^-2 on a 7-DOF real-world
robotic system.
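The quantity being maximised can be sketched with the standard Yoshikawa manipulability measure, w = sqrt(det(J J^T)), which collapses to zero at a singular configuration. The planar 2-link arm below is a stand-in model for illustration, not one of the paper's systems:

```python
import numpy as np

def manipulability(J):
    """Yoshikawa manipulability w = sqrt(det(J J^T)); w -> 0 at singularities."""
    return np.sqrt(max(np.linalg.det(J @ J.T), 0.0))

def planar_2link_jacobian(q1, q2, l1=1.0, l2=1.0):
    """Jacobian of an illustrative planar 2-link arm with link lengths l1, l2."""
    return np.array([
        [-l1 * np.sin(q1) - l2 * np.sin(q1 + q2), -l2 * np.sin(q1 + q2)],
        [ l1 * np.cos(q1) + l2 * np.cos(q1 + q2),  l2 * np.cos(q1 + q2)],
    ])

# Manipulability collapses at the fully stretched (singular) configuration.
w_bent = manipulability(planar_2link_jacobian(0.3, np.pi / 2))
w_stretched = manipulability(planar_2link_jacobian(0.3, 0.0))
assert w_stretched < 1e-6 < w_bent
```

A controller that ascends this measure along the motion, as the abstract describes, steers the arm away from configurations where w vanishes and task-space velocities become unattainable.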