We consider the problem of direct policy learning in situations where the policies are only observable through their projections into the null-space of a set of dynamic, non-linear task constraints. We tackle the issue of deriving consistent data for the learning of such policies and make two contributions towards its solution. Firstly, we derive the conditions required to exactly reconstruct null-space policies and suggest a learning strategy based on this derivation. Secondly, we consider the case that the null-space policy is conservative and show that such a policy can be learnt more easily and robustly by learning the underlying potential function and using this as our representation of the policy. 

Howard, M.

Vijayakumar, S.

English

King's Research Portal

Reconstructing Null-space Policies Subject to Dynamic Task Constraints in Redundant Manipulators

Edinburgh Research Explorer

Citation for published version: Howard, M & Vijayakumar, S 2007, 'Reconstructing null-space policies subject to dynamic task constraints in redundant manipulators', in Workshop on Robotics and Mathematics (ROBOMAT '07), Coimbra, Portugal.

Document Version: Author final version (often known as postprint)

Published In: Workshop on Robotics and Mathematics (ROBOMAT '07), Coimbra, Portugal

Reconstructing Null-space Policies Subject to Dynamic Task Constraints in Redundant Manipulators

Matthew Howard, Sethu Vijayakumar∗

August 28, 2007

Abstract

We consider the problem of direct policy learning in situations where the policies are only observable through their projections into the null-space of a set of dynamic, non-linear task constraints. We tackle the issue of deriving consistent data for the learning of such policies and make two contributions towards its solution. Firstly, we derive the conditions required to exactly reconstruct null-space policies and suggest a learning strategy based on this derivation.
Secondly, we consider the case that the null-space policy is conservative and show that such a policy can be learnt more easily and robustly by learning the underlying potential function and using this as our representation of the policy.

1 Introduction

Redundant manipulators are characterised as having degrees of freedom in excess of those needed to perform some task. In the control of such systems a popular paradigm is to utilise redundancy through secondary movement policies that complement the primary task goals in some way. Such policies prefer actions that, for example, avoid joint limits [1], kinematic singularities [11] or self-collisions [9]. Traditionally these policies were implemented as optimising some carefully selected instantaneous cost function or potential. The approach was first proposed by Liégeois [3] in the context of Resolved Motion Rate Control (RMRC) [10], but has since been extended to other control regimes (notably force based control) and a wider variety of secondary policies. For example, Nakamura [5] studied the problem extensively with particular emphasis on optimal RMRC and Resolved Acceleration Control (RAC) of arms with pre-defined tasks and time-integral cost functions.

However, the secondary policy need not be the result of optimisation and the formalism extends to a variety of constraint-based control scenarios. For example, in humanoid robots, a secondary goal might be to maintain some posture when performing some task [2], to perform multiple prioritised tasks at once [8] or perform control subject to contact constraints [6]. Furthermore, many tasks are best described as constrained with respect to certain variables. Consider running on a treadmill: the centre of mass and tilt of the torso are constrained and the policy controlling the gait is projected into the null-space of these constraints.

∗M. Howard and S. Vijayakumar are with the School of Informatics, University of Edinburgh, Edinburgh EH9 3JZ, United Kingdom. E-mail: matthew.howard@ed.ac.uk.

The focus in this paper is on modelling secondary control policies from observations of constrained motion using statistical learning methods. This is a form of direct policy learning, with the difference that the policy is only partially observable. For the learning, most supervised learning techniques require consistent, convex training data. In this paper, we look at the problem of deriving this data and offer two contributions for its solution.

The first concerns the general case where the policy can be any non-conservative vector function of the state. We take a geometric approach to reconstructing the policy based on Euclid's theorem and outline the necessary conditions for exact reconstruction at any given point. Furthermore, we show that this approach suggests an iterative training algorithm for our learner.

The second contribution is to show that, by restricting the class of permissible policies to those that optimise some potential, the policy can be inferred in a simpler and more robust way using a form of inverse optimal control.
In this approach, identifying the unconstrained policy from constrained observations is equivalent to seeking the potential being optimised. We show that by modelling the policy through its potential we can side-step several of the restrictions of the geometric approach.

Finally, we present experimental results for a simulated robot arm in which the reconstructed null-space policy is used to replace that of the original and the resultant behaviour is compared across a variety of consistent task goals.

2 Problem Formulation

We consider control policies of the form

$$\mathbf{u} = \mathbf{u}_{task}(\mathbf{x}, t) + \mathbf{u}_{null}(\mathbf{x}, t) = \mathbf{u}_{task}(\mathbf{x}, t) + \mathbf{N}(\mathbf{x}, t)\,\mathbf{a}(\mathbf{x}) \tag{1}$$

where $\mathbf{u}$ is the control signal, $\mathbf{u}_{task}$ is the component of $\mathbf{u}$ that satisfies a set of non-linear, time-varying task constraints, $\mathbf{a}(\mathbf{x})$ is a policy pursuing secondary movement goals, and $\mathbf{N}(\mathbf{x}, t)$ is a projection matrix. $\mathbf{N}(\mathbf{x}, t)$ prevents violation of the task constraints by projecting the policy into the null-space. Our goal is to model $\mathbf{a}(\mathbf{x})$ from observations of $\mathbf{u}_{null}$.

In general, $\mathbf{a}(\mathbf{x})$ can be any arbitrary vector field. According to the Helmholtz decomposition, any vector field may be decomposed into rotational and divergent components

$$\mathbf{a}(\mathbf{x}) = \nabla_{\mathbf{x}} \times \boldsymbol{\Phi}(\mathbf{x}) + \nabla_{\mathbf{x}} \phi(\mathbf{x}) \tag{2}$$

where $\boldsymbol{\Phi}$ and $\phi$ are vector and scalar potentials. Assuming that $\mathbf{a}(\mathbf{x})$ is conservative¹, an equivalent goal is to model $\phi(\mathbf{x})$. Policies of the form (1) occur in both velocity ($\mathbf{u} \equiv \dot{\mathbf{q}}$) and force ($\mathbf{u} \equiv \boldsymbol{\tau}$) based control [4].

¹A necessary and sufficient condition for this is that $\nabla_{\mathbf{x}} \times \mathbf{a}(\mathbf{x}) = 0, \ \forall \mathbf{x}$.

Example 2.1. Velocity-based Control

A standard velocity-based control scheme is RMRC [10, 3]

$$\dot{\mathbf{q}} = \mathbf{J}(\mathbf{q}, t)^{\dagger}\,\dot{\mathbf{r}} + \mathbf{N}(\mathbf{q}, t)\,\mathbf{a} \tag{3}$$

where $\mathbf{r}, \dot{\mathbf{r}} \in \mathbb{R}^k$ and $\mathbf{q}, \dot{\mathbf{q}} \in \mathbb{R}^n$ denote the task- and joint-space positions and velocities and $\mathbf{J}(\mathbf{q}, t)$ is the Jacobian with $\mathbf{W}$-weighted pseudoinverse $\mathbf{J}^{\dagger} = \mathbf{W}^{-1}\mathbf{J}^T(\mathbf{J}\mathbf{W}^{-1}\mathbf{J}^T)^{-1}$. $\mathbf{N}(\mathbf{q}, t) = (\mathbf{I} - \mathbf{J}^{\dagger}(\mathbf{q}, t)\,\mathbf{J}(\mathbf{q}, t))$ is the null-space projection matrix (where $\mathbf{I}$ is the identity matrix). Note that, in general, the Jacobian and the projection matrix are time-dependent, reflecting the fact that the task-space may change in time [1].

Example 2.2. Force-based Control

A general formulation for force-based control is [7]

$$\boldsymbol{\tau} = \mathbf{W}^{-1/2}(\mathbf{A}\mathbf{M}^{-1}\mathbf{W}^{-1/2})^{\dagger}(\mathbf{b} - \mathbf{A}\mathbf{M}^{-1}\mathbf{F}) + \mathbf{N}(\mathbf{q}, \dot{\mathbf{q}}, t)\,\mathbf{a} \tag{4}$$

where $\boldsymbol{\tau} \in \mathbb{R}^n$ is the applied torque/force, $\mathbf{q}, \dot{\mathbf{q}}, \ddot{\mathbf{q}} \in \mathbb{R}^n$ are joint-space positions, velocities and accelerations, $\mathbf{M}(\mathbf{q}) \in \mathbb{R}^{n \times n}$ is an inertia/mass matrix and $\mathbf{F} \in \mathbb{R}^n$ describes perturbing forces such as centrifugal, Coriolis and gravity forces. The weighting matrix $\mathbf{W} \in \mathbb{R}^{n \times n}$ determines the control paradigm used, such as RAC ($\mathbf{W} = \mathbf{M}^{-2}$) or the Operational Space Formulation ($\mathbf{W} = \mathbf{M}^{-1}$) [7]. The task is described through constraints of the form $\mathbf{A}(\mathbf{q}, \dot{\mathbf{q}}, t)\,\ddot{\mathbf{q}} = \mathbf{b}(\mathbf{q}, \dot{\mathbf{q}}, t) \in \mathbb{R}^k$ and the null-space projection matrix is given by $\mathbf{N}(\mathbf{q}, \dot{\mathbf{q}}, t) = \mathbf{W}^{-1/2}(\mathbf{I} - (\mathbf{A}\mathbf{M}^{-1}\mathbf{W}^{-1/2})^{\dagger}\mathbf{A}\mathbf{M}^{-1}\mathbf{W}^{-1/2})\mathbf{W}^{1/2}$.

The correspondence between (1), (3) and (4) can easily be shown by appropriate substitution of variables. In both cases, the second term arises when there is redundancy, i.e. the task dimensionality is lower than that of the action space ($k < n$), allowing secondary goals to be pursued. Fig. 1 shows examples of how different null-space policies affect behaviour.

Figure 1: Effect of different null-space policies $\mathbf{a}(\mathbf{x})$ on behaviour when tracking a linear task-space trajectory in RMRC (left) and applying a force $\mathbf{F}$ to a mass in force control (right).

3 Reconstructing Nullspace Policies

Theorem 3.1.
Reconstruction of Projected Policies

Given observations $\mathbf{a}^{(i)} = \mathbf{N}^{(i)}(\mathbf{x})\,\mathbf{a}(\mathbf{x})$, $i = 1, \dots, n$ of a policy $\mathbf{a}(\mathbf{x})$ projected into the null-space of a set of $n$ task constraints that span the action space, the unconstrained policy is given by

$$\mathbf{a}(\mathbf{x}) = \mathbf{x}^{\times} - \mathbf{x} \tag{5}$$

where $\mathbf{x}^{\times}$ is the solution to the linear system

$$\mathbf{A}\mathbf{x}^{\times} = \mathbf{d} \tag{6}$$

where $\mathbf{A} \equiv (\mathbf{a}', \dots, \mathbf{a}^{(m)})^T$ and the elements of $\mathbf{d}$ are given by $d_i = \mathbf{a}^{(i)T}(\mathbf{x} + \mathbf{a}^{(i)})$.

Figure 2: Under the task constraints (7), the null-space policy is projected onto a manifold $r = \boldsymbol{\alpha}(t)^T\mathbf{q}$ (left), orthogonal to the task space motion. Under multiple constraints the projected policy vectors lie inscribed in a hypersphere in state-space (centre). Euclid's Theorem can be used to reconstruct $\mathbf{a}(\mathbf{q})$ given observations under different constraints (right).

Proof. Consider the RMRC control of a manipulator with two-dimensional joint space, $\mathbf{q} \equiv (q_1, q_2)^T$, and one-dimensional task space $r^{(i)}$, $i = 1, \dots, n$. The Jacobian of this system

$$\mathbf{J}^{(i)}(\mathbf{q}) = (\alpha_1, \alpha_2)^{(i)} = \boldsymbol{\alpha}^{(i)} \tag{7}$$

is locally linear in the region of $\mathbf{q}$. Under task constraint $i$ the null-space policy is constrained to a line in joint-space with intersection $r^{(i)}$ (Fig. 2, left). When the active constraint changes, the rotation of this line changes so that the observed projections lie inscribed within a circle (a hypersphere in $n$-dimensional space) of diameter $\|\mathbf{a}(\mathbf{q})\|$ (Fig. 2, centre). Euclid's theorem states that any triangle inscribed in a semi-circle is a right-angled triangle. Hence $\mathbf{a}(\mathbf{q})$ is given by the intersection of the lines orthogonal to any two projections $\mathbf{a}', \mathbf{a}''$ (Fig. 2, right). By the same argument, in $n$-dimensional space, if the observations form a basis set of the space, we can construct planes normal to the projections and solve for the intersection point $\mathbf{q}^{\times}$. This yields the linear system (6) with the unprojected vector given by (5).
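The linear system (6) can be checked numerically. The following is a minimal sketch rather than the authors' implementation: it assumes unweighted pseudoinverses (W = I), so that every N^(i) is an orthogonal projector, and it draws the state, the policy value and the one-dimensional task Jacobians at random. The function names `null_space_projector` and `reconstruct_policy` are hypothetical.

```python
import numpy as np


def null_space_projector(J):
    """N = I - J^+ J, using the unweighted pseudoinverse (assumes W = I)."""
    return np.eye(J.shape[1]) - np.linalg.pinv(J) @ J


def reconstruct_policy(x, projections):
    """Solve A x_cross = d (eq. 6) and return a = x_cross - x (eq. 5)."""
    A = np.asarray(projections)                    # rows are a^(i)T
    d = np.array([a_i @ (x + a_i) for a_i in A])   # d_i = a^(i)T (x + a^(i))
    x_cross, *_ = np.linalg.lstsq(A, d, rcond=None)
    return x_cross - x


rng = np.random.default_rng(0)
n = 3                                   # dimension of the action space
x = rng.normal(size=n)                  # state at which the policy is observed
a_true = rng.normal(size=n)             # unconstrained policy value a(x)

# Three different one-dimensional task constraints (k = 1), one per observation.
jacobians = [rng.normal(size=(1, n)) for _ in range(n)]
projections = [null_space_projector(J) @ a_true for J in jacobians]

a_hat = reconstruct_policy(x, projections)
print("true policy:         ", a_true)
print("reconstructed policy:", a_hat)
```

If the observed projections do not span the action space, A is rank-deficient and `np.linalg.lstsq` returns only the minimum-norm solution, which in general differs from a(x).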
□

Theorem 3.1 also suggests the following lemma.

Lemma 3.1. Given observations $\mathbf{a}^{(i)} = \mathbf{N}^{(i)}(\mathbf{x})\,\mathbf{a}(\mathbf{x})$, $i = 1, \dots, n$ of a constrained policy $\mathbf{a}(\mathbf{x})$, the observation with the largest norm $\|\mathbf{a}^{(i)}\|$ lies closest to the unconstrained policy.

Proof. By inspection of Fig. 2, or by considering that $\mathbf{N}(\mathbf{x}, t)$ is a projection matrix with $k$ eigenvalues of value 0 and $n - k$ eigenvalues of value 1. Fewer constraints (smaller $k$) result in larger norms. □

Lemma 3.1 suggests an iterative approach to training whereby, if multiple observations are made around the same point, those with the largest norm should be used for learning. This is particularly true of highly redundant systems ($k \ll n$) where the policy is much less constrained. Furthermore, in the limit that observations are made under a single, constant constraint, a consistent policy $\mathbf{u}_{null}(\mathbf{x}) = \mathbf{N}(\mathbf{x})\,\mathbf{a}(\mathbf{x})$ will be learnt.

The condition in Theorem 3.1 that a spanning set of projections is required to exactly reconstruct the policy is somewhat restrictive, and unlikely to be met in real data sets. However, if the policy is conservative (i.e. the first term of (2) is zero) we can side-step these restrictions with the following proposition.

Proposition 3.1. Reconstruction of Conservative Policies

Under the same conditions as Theorem 3.1, a conservative policy $\mathbf{a}(\mathbf{x})$ can be represented by its underlying potential function, which can be learnt without the need for multiple observations or iterative training.

Consider again the case of RMRC of a redundant manipulator. The potential underlying a conservative $\mathbf{a}(\mathbf{q})$ can be reconstructed through inverse optimal control [4]. The simplest method requires trajectories sampled at some rate $\rho$, resulting in a set of via-points $(\mathbf{q}_1 \dots \mathbf{q}_{\rho\tau})^T$ where

$$\mathbf{q}_{t+1} = \mathbf{q}_t + \mathbf{N}(\mathbf{q}_t)\,\nabla_{\mathbf{q}}\phi(\mathbf{q}_t) \tag{8}$$

for a trajectory of duration $\tau$ and $\mathbf{u}_{task} = 0$. Training samples of $\phi(\mathbf{x})$ can be generated by integrating along trajectories using, for example, the Euler method

$$\phi(\mathbf{q}_{t+1}) = \phi(\mathbf{q}_t) + (\mathbf{q}_{t+1} - \mathbf{q}_t)^T\,\mathbf{N}(\mathbf{q}_t)\,\nabla_{\mathbf{q}}\phi(\mathbf{q}_t). \tag{9}$$

The key observation is that the integration in (9) occurs in the direction locally orthogonal to the constraints. We refer the reader to results reported in [4] for empirical evidence supporting Proposition 3.1.

In Fig. 3 the left-hand plot shows the true (blue) and reconstructed (cyan) potential along trajectories under a variety of constraints. Contours show the true (quadratic) potential function over two of the joints of the arm. The trajectories are reconstructed up to a translation in the $\phi$-dimension (trajectories have been translated in Fig. 3 for comparison). In the middle and right-hand plots a modified Euler method was used to learn two policies: one derived from a quadratic potential (top row) and one from a sinusoidal potential (bottom row). The middle plots show the true and reconstructed policy subject to constraints on the hand. The right-hand plots show a time-lapse of the arm tracking a linear trajectory using the true and learnt policy in the null-space.

4 Conclusion

We have presented the mathematical basis for direct policy learning of policies subject to dynamic, non-linear constraints. We have shown that in the general case of non-conservative policies exact reconstruction of the policy requires solution of a system of equations constructed from observations under task constraints that span the state-space.
We have noted that this suggests an iterative training scheme based on the norm of observed projections. Finally, we have suggested a more robust approach to learning conservative policies through numerical integration techniques and simulation results have been presented for the learning of such policies for a kinematically-controlled three link arm.

Figure 3: True (blue) and reconstructed (cyan) values of the quadratic potential (contours) along trajectories subject to different constraints (constraints on the hand, wrist, elbow, and unconstrained trajectories shown). True (black) and learnt (red) null-space policies subject to hand constraints for the quadratic (top) and sinusoidal (bottom) potentials. Time-lapse of the arm tracking a linear trajectory using the true (left) and learnt (right) null-space policies. Policy plot axes: q2 (radians), q3 (radians).

References

[1] M. Gienger, H. Janssen, and C. Goerick. Task-oriented whole body motion for humanoid robots. In IEEE-RAS Int. Conf. on Humanoid Robots, 2005.
[2] O. Khatib, J. Warren, V. De Sapio, and L. Sentis. Human-like motion from physiologically-based potential energies. In J. Lenarcic and C. Galletti, editors, On Advances in Robot Kinematics. Kluwer Academic Publishers, 2004.
[3] A. Liégeois. Automatic supervisory control of the configuration and behavior of multibody mechanisms. In IEEE Trans. Syst., Man, Cybern., volume 7, 1977.
[4] M. Howard, M. Gienger, C. Goerick, and S. Vijayakumar. Learning utility surfaces for movement selection. In IEEE Int. Conf. on Robotics and Biomimetics (ROBIO), 2006.
[5] Y. Nakamura. Advanced Robotics: Redundancy and Optimization. Addison Wesley, Reading, MA, 1991.
[6] J. Park and O. Khatib. Contact consistent control framework for humanoid robots. In IEEE Int. Conf. on Robotics and Automation (ICRA), 2006.
[7] J. Peters, M. Mistry, F. Udwadia, R. Cory, J. Nakanishi, and S. Schaal. A unifying methodology for the control of robotic systems. In IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), 2005.
[8] L. Sentis and O. Khatib. A whole-body control framework for humanoids operating in human environments. In IEEE Int. Conf. on Robotics and Automation (ICRA), 2006.
[9] H. Sugiura, M. Gienger, H. Janssen, and C. Goerick. Real-time self collision avoidance for humanoids by means of nullspace criteria and task intervals. In IEEE-RAS Int. Conf. on Humanoid Robots, 2006.
[10] D. E. Whitney. Resolved motion rate control of manipulators and human prostheses. 10(22), 1969.
[11] T. Yoshikawa. Manipulability of robotic mechanisms. Int. J. Robotics Research, 4(2), 1985.
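As a supplementary illustration of the integration scheme in equations (8) and (9) of Section 3, the sketch below generates a constrained trajectory and recovers the potential along it up to an additive constant. It is a sketch under stated assumptions, not the authors' code: an explicit step size `dt` is introduced (equation (8) as written absorbs it), the potential is taken to be phi(q) = -0.5 ||q||^2, the constraint is a single constant Jacobian, and W = I. All function names are hypothetical.

```python
import numpy as np


def null_space_projector(J):
    """N = I - J^+ J, using the unweighted pseudoinverse (assumes W = I)."""
    return np.eye(J.shape[1]) - np.linalg.pinv(J) @ J


def phi(q):
    """Illustrative quadratic potential (an assumption of this sketch)."""
    return -0.5 * q @ q


def grad_phi(q):
    """Gradient of the illustrative potential."""
    return -q


def simulate(q0, J, dt=0.01, steps=500):
    """Via-points from eq. (8) with u_task = 0 and an explicit step size dt."""
    N = null_space_projector(J)
    qs = [q0]
    for _ in range(steps):
        qs.append(qs[-1] + dt * N @ grad_phi(qs[-1]))
    return np.array(qs)


def integrate_potential(qs, dt=0.01):
    """Euler integration of eq. (9) from the observed via-points only.

    Since q_{t+1} - q_t = dt * N grad(phi), the projected gradient is
    (q_{t+1} - q_t) / dt, so each increment equals ||q_{t+1} - q_t||^2 / dt.
    """
    dq = np.diff(qs, axis=0)
    increments = np.sum(dq * dq, axis=1) / dt
    return np.concatenate([[0.0], np.cumsum(increments)])


J = np.array([[1.0, 0.5, -0.3]])        # constant one-dimensional constraint
q0 = np.array([1.0, -1.5, 0.8])         # start configuration
qs = simulate(q0, J)

phi_hat = integrate_potential(qs)                         # estimate, offset 0
phi_true = np.array([phi(q) for q in qs]) - phi(q0)       # same offset
max_err = np.max(np.abs(phi_hat - phi_true))
print("largest deviation along the trajectory:", max_err)
```

The estimate only covers directions the trajectory actually explores, so states reachable only by violating the constraint receive no training samples from this trajectory.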


CiteSeerX

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.99.9509
