Structured machine learning models for robustness against different factors of variability in robot control
An important feature of human sensorimotor skill is our ability to reuse it across different environmental contexts, in part because we understand the attributes of variability in those environments. This thesis explores how the structure of the models used in learning for robot control could similarly help autonomous robots cope with variability, and hence achieve skill generalisation. The overarching approach is to develop modular architectures that judiciously combine different forms of inductive bias for learning. In particular, we consider how models and policies should be structured in order to achieve robust behaviour in the face of different factors of variation (in the environment, in objects and in other internal parameters of a policy), with the end goal of more robust, accurate and data-efficient skill acquisition and adaptation.
At a high level, variability in skill is determined by variations in the constraints presented by the external environment and by task-specific perturbations that affect the specification of optimal action. A typical environmental perturbation is variation in lighting and illumination, which affects the noise characteristics of perception. Typical task perturbations are variations in object geometry, mass or friction, and in the costs associated with speed or smoothness of execution. We counteract these factors of variation by exploring three forms of structuring: curating separate data sets according to the relevant factor of variation, building neural network models that incorporate this factorisation into the very structure of the networks, and learning structured loss functions. The thesis comprises four projects exploring this theme within robotics planning and prediction tasks.
Firstly, in the setting of trajectory prediction in crowded scenes, we explore a modular architecture for learning static and dynamic environmental structure. We show that factorising the prediction problem from the individual representations allows for robust and label-efficient forward modelling, and relaxes the need for full model re-training in new environments. This modularity also allows trajectory prediction models to be adapted more flexibly and interpretably by using pre-trained state-of-the-art components. We show that this results in more efficient motion prediction, with performance comparable to state-of-the-art supervised 2D trajectory prediction.
Next, in the domain of contact-rich robotic manipulation, we consider a modular architecture that combines learning from demonstration, in particular dynamic movement primitives (DMPs), with modern model-free reinforcement learning (RL), using both on-policy and off-policy approaches. We show that factorising the skill learning problem into skill acquisition and error correction, through policy adaptation strategies such as residual learning, can improve the overall performance of policies in the context of contact-rich manipulation. Our empirical evaluation demonstrates how best to do this with DMPs, and we propose residual Learning from Demonstration (rLfD), a framework that combines DMPs with RL to learn a residual correction policy. Our evaluations, performed both in simulation and on a physical system, suggest that applying residual learning directly in task space, operating on the full pose of the robot, can significantly improve the overall performance of DMPs. We show that rLfD offers a solution that is gentle on the joints and improves the task success and generalisation of DMPs. Last but not least, our study shows that the extracted correction policies can be transferred to different geometries and friction conditions through few-shot task adaptation.
Third, we employ meta-learning to learn time-invariant reward functions, wherein both the objectives of a task (i.e., the reward functions) and the policy for performing that task optimally are learnt simultaneously. We propose a novel inverse reinforcement learning (IRL) formulation that allows us to 1) vary the length of execution by learning time-invariant costs, and 2) relax the temporal alignment requirements for learning from demonstration. We apply our method to two different types of cost formulation and evaluate their performance in the context of learning reward functions for simulated placement and peg-in-hole tasks executed on a 7-DoF KUKA IIWA arm. Our results show that our approach enables learning temporally invariant rewards from misaligned demonstrations, which can also generalise spatially to out-of-distribution tasks.
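The key property of a time-invariant cost is that it scores states rather than time indices, so executions of different lengths or speeds remain comparable without temporal alignment. A minimal numpy sketch of this idea (the feature weighting, goal and trajectories here are illustrative, not the thesis's actual formulation):

```python
import numpy as np

def time_invariant_cost(traj, goal, w):
    """Cost that depends only on states, never on time indices.

    Because each state is scored independently against the goal and the
    result is averaged over time, trajectories of different lengths (or
    with misaligned timing) can be compared without alignment.
    """
    errs = (traj - goal) ** 2          # per-state squared error, shape (T, D)
    return float(np.mean(errs @ w))    # mean over time -> length-invariant

# Hypothetical example: two executions of the same reach at different speeds.
goal = np.array([1.0, 1.0])
slow = np.linspace([0.0, 0.0], [1.0, 1.0], 50)   # 50-step execution
fast = np.linspace([0.0, 0.0], [1.0, 1.0], 10)   # 10-step execution
w = np.ones(2)

c_slow = time_invariant_cost(slow, goal, w)
c_fast = time_invariant_cost(fast, goal, w)
```

Using the mean rather than the sum over time is what keeps the two costs on the same scale despite the fivefold difference in length.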
Finally, we apply our observations to evaluate adversarial robustness in the context of transfer learning from a source network trained on CIFAR-100 to a target network trained on CIFAR-10. Specifically, we study the effects of using robust optimisation in the source and target networks. This allows us to identify transfer learning strategies under which adversarial defences are successfully retained, in addition to revealing potential vulnerabilities. We study the extent to which adversarially robust features can preserve their defence properties against black- and white-box attacks under three different transfer learning strategies. Our empirical evaluations give insight into how well adversarial robustness generalises under transfer learning.
Learning Structured Representations of Spatial and Interactive Dynamics for Trajectory Prediction in Crowded Scenes
Context plays a significant role in the generation of motion for dynamic
agents in interactive environments. This work proposes a modular method that
utilises a learned model of the environment for motion prediction. This
modularity explicitly allows for unsupervised adaptation of trajectory
prediction models to unseen environments and new tasks by relying on unlabelled
image data only. We model both the spatial and dynamic aspects of a given
environment alongside the per-agent motions. This results in more informed
motion prediction and allows for performance comparable to the
state-of-the-art. We highlight the model's prediction capability using a
benchmark pedestrian prediction problem and a robot manipulation task and show
that we can transfer the predictor across these tasks in a completely
unsupervised way. The proposed approach allows for robust and label-efficient
forward modelling, and relaxes the need for full model re-training in new
environments.
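The modularity described above amounts to an interface choice: the motion predictor only sees the environment through a fixed-size embedding, so the environment model can be swapped or retrained from unlabelled images without touching the predictor. A toy sketch of that interface (the random-projection encoder and constant-velocity predictor are stand-ins, not the paper's learned models):

```python
import numpy as np

rng = np.random.default_rng(0)

class EnvEncoder:
    """Stand-in for a model trained on unlabelled images of one scene.

    In the paper's setting this would be learned per environment; here it
    is a fixed random projection, purely to show the modular interface.
    """
    def __init__(self, dim=4):
        self.W = rng.normal(size=(dim, 8))

    def encode(self, image_features):
        return np.tanh(self.W @ image_features)

class MotionPredictor:
    """Predicts the next position from agent history plus env context.

    Because the environment enters only as an embedding, the encoder can
    be replaced for a new scene without retraining this module.
    """
    def predict(self, history, env_embedding):
        velocity = history[-1] - history[-2]      # constant-velocity prior
        drift = 0.01 * env_embedding[:2]          # env-dependent correction
        return history[-1] + velocity + drift

# Hypothetical usage: swap encoders per environment, reuse the predictor.
predictor = MotionPredictor()
history = np.array([[0.0, 0.0], [0.1, 0.0]])
for scene in (EnvEncoder(), EnvEncoder()):        # two "environments"
    z = scene.encode(rng.normal(size=8))
    next_pos = predictor.predict(history, z)
```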
An empirical evaluation of adversarial robustness under transfer learning
In this work, we evaluate adversarial robustness in the context of transfer
learning from a source network trained on CIFAR-100 to a target network trained
on CIFAR-10. Specifically, we study the effects of using robust optimisation in
the source and target networks. This allows us to identify transfer learning
strategies under which adversarial defences are successfully retained, in
addition to revealing potential vulnerabilities. We study the extent to which
features learnt using the fast gradient sign method (FGSM) and its iterative
alternative, projected gradient descent (PGD), can preserve their defence
properties against black- and white-box attacks under three different transfer
learning strategies. We find
that using PGD examples during training on the source task leads to more
general robust features that are easier to transfer. Furthermore, under
successful transfer, this approach achieves 5.2% higher accuracy against
white-box PGD attacks than suitable baselines. Overall, our empirical
evaluations give insight into how well adversarial robustness generalises
under transfer learning.
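The PGD attack at the heart of the comparison above is a short loop: repeated gradient ascent on the loss, projected back into an epsilon-ball around the input; FGSM is the single-step special case. A minimal sketch on a logistic-regression model with an analytic gradient (the paper attacks deep networks, but the loop is the same idea; all numbers here are illustrative):

```python
import numpy as np

def pgd_attack(x, y, w, b, eps=0.3, alpha=0.05, steps=10):
    """Projected gradient descent attack on a logistic-regression model.

    Each step ascends the binary cross-entropy loss in the sign of its
    input gradient, then projects back into the L-infinity eps-ball.
    FGSM is the special case steps=1, alpha=eps.
    """
    x_adv = x.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x_adv @ w + b)))   # sigmoid probability
        grad = (p - y) * w                            # d(BCE)/dx for this model
        x_adv = x_adv + alpha * np.sign(grad)         # ascend the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)      # project into the ball
    return x_adv

# Toy example: a point confidently classified as class 1.
w, b = np.array([2.0, -1.0]), 0.0
x, y = np.array([1.0, 0.0]), 1.0
x_adv = pgd_attack(x, y, w, b)
```

The projection step is what distinguishes PGD from plain gradient ascent: the perturbation stays bounded, which is the threat model under which robustness is measured.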
Model-Based Inverse Reinforcement Learning from Visual Demonstrations
Scaling model-based inverse reinforcement learning (IRL) to real robotic
manipulation tasks with unknown dynamics remains an open problem. The key
challenges lie in learning good dynamics models, developing algorithms that
scale to high-dimensional state-spaces and being able to learn from both visual
and proprioceptive demonstrations. In this work, we present a gradient-based
inverse reinforcement learning framework that utilizes a pre-trained visual
dynamics model to learn cost functions when given only visual human
demonstrations. The learned cost functions are then used to reproduce the
demonstrated behavior via visual model predictive control. We evaluate our
framework on hardware on two basic object manipulation tasks.
Accepted at the 4th Conference on Robot Learning (CoRL 2020), Cambridge, MA, USA.
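Gradient-based IRL of this flavour typically adjusts cost parameters so that planner rollouts under the current cost look like the demonstrations. A minimal feature-matching sketch (the hand-designed features, trajectories and learning rate are illustrative; the paper instead learns costs on top of a pre-trained visual dynamics model):

```python
import numpy as np

def features(traj, goal):
    """Hand-designed trajectory features: goal proximity and smoothness.

    The paper works from visual latents instead; only the update rule
    below is analogous.
    """
    dist = np.linalg.norm(traj - goal, axis=1).mean()
    smooth = np.linalg.norm(np.diff(traj, axis=0), axis=1).mean()
    return np.array([dist, smooth])

def irl_update(w, demo, plan, goal, lr=0.1):
    """Feature-matching gradient step: raise the cost of what the planner
    currently does, lower the cost of what the expert demonstrated."""
    return w + lr * (features(plan, goal) - features(demo, goal))

goal = np.array([1.0, 1.0])
demo = np.linspace([0, 0], [1, 1], 20)                 # smooth expert reach
plan = demo + np.random.default_rng(0).normal(0, 0.1, demo.shape)  # noisy rollout
w = np.zeros(2)
for _ in range(5):
    w = irl_update(w, demo, plan, goal)
```

After a few updates the smoothness weight grows positive, penalising the jerky planner rollout relative to the demonstration; in a full loop the planner would then be re-run under the updated cost.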
Vid2Param: Modelling of Dynamics Parameters from Video
Videos provide a rich source of information, but it is generally hard to
extract dynamical parameters of interest. Inferring those parameters from a
video stream would be beneficial for physical reasoning. Robots performing
tasks in dynamic environments would benefit greatly from understanding the
underlying environment motion, in order to make future predictions and to
synthesize effective control policies that use this inductive bias. Online
physical reasoning is therefore a fundamental requirement for robust autonomous
agents. When the dynamics involves multiple modes (due to contacts or
interactions between objects) and sensing must proceed directly from a rich
sensory stream such as video, then traditional methods for system
identification may not be well suited. We propose an approach wherein fast
parameter estimation can be achieved directly from video. We integrate a
physically based dynamics model with a recurrent variational autoencoder, by
introducing an additional loss to enforce desired constraints. The model, which
we call Vid2Param, can be trained entirely in simulation, in an end-to-end
manner with domain randomization, to perform online system identification, and
make probabilistic forward predictions of parameters of interest. This enables
the resulting model to encode parameters such as position, velocity,
restitution, air drag and other physical properties of the system. We
illustrate the utility of this in physical experiments wherein a PR2 robot with
a velocity constrained arm must intercept an unknown bouncing ball with partly
occluded vision, by estimating the physical parameters of this ball directly
from the video trace after the ball is released.
Accepted as a journal paper in IEEE Robotics and Automation Letters (RA-L).
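The "additional loss to enforce desired constraints" mentioned above can be pictured as a third term alongside the usual VAE objective: part of the latent code is supervised against the known simulation parameters, which are available because training happens in simulation with domain randomisation. A numpy sketch of such a composite loss (the term names and weights are assumptions, not the paper's exact objective):

```python
import numpy as np

def vid2param_style_loss(x, x_recon, mu, logvar, z_params, true_params,
                         beta=1.0, lam=10.0):
    """Illustrative composite loss: VAE ELBO terms plus a supervision
    term tying part of the latent code to known simulation parameters.
    """
    recon = np.mean((x - x_recon) ** 2)                        # reconstruction
    kl = -0.5 * np.mean(1 + logvar - mu**2 - np.exp(logvar))   # KL to N(0, I)
    param = np.mean((z_params - true_params) ** 2)             # parameter supervision
    return recon + beta * kl + lam * param

rng = np.random.default_rng(0)
x = rng.normal(size=(16,))
loss = vid2param_style_loss(
    x=x, x_recon=x + 0.01,                  # near-perfect reconstruction
    mu=np.zeros(4), logvar=np.zeros(4),     # latent matches the prior -> KL = 0
    z_params=np.array([0.8, 0.1]),          # e.g. restitution, air drag (assumed)
    true_params=np.array([0.8, 0.1]),
)
```

Weighting the parameter term highly (here `lam=10.0`, an assumed value) is what pushes the encoder to act as an online system identifier rather than a generic video autoencoder.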
Residual Learning from Demonstration: Adapting DMPs for Contact-rich Manipulation
Contacts and friction are inherent to nearly all robotic manipulation tasks.
Through the motor skill of insertion, we study how robots can learn to cope
when these attributes play a salient role. In this work we propose residual
learning from demonstration (rLfD), a framework that combines dynamic movement
primitives (DMP) that rely on behavioural cloning with a reinforcement learning
(RL) based residual correction policy. The proposed solution is applied
directly in task space and operates on the full pose of the robot. We show that
rLfD outperforms alternatives and improves the generalisation abilities of
DMPs. We evaluate this approach by training an agent to successfully perform
both simulated and real-world insertions of pegs, gears and plugs into their
respective sockets.
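Structurally, rLfD's action is a sum: the DMP supplies the nominal demonstrated motion and the learned residual adds a correction on top. A minimal 1-D sketch of that composition (the hand-written residual and all constants are stand-ins for the RL-learned policy, not the paper's implementation):

```python
def dmp_step(y, dy, goal, dt=0.01, alpha=25.0, beta=6.25, forcing=0.0):
    """One semi-implicit Euler step of a 1-D dynamic movement primitive:
    a spring-damper toward the goal plus a forcing term (forcing=0 gives
    a plain point attractor)."""
    ddy = alpha * (beta * (goal - y) - dy) + forcing
    dy = dy + ddy * dt
    y = y + dy * dt
    return y, dy

def residual_policy(y, goal):
    """Stand-in for the RL-learned correction; a real rLfD agent learns
    this from task reward. Here it simply nudges toward the goal."""
    return 5.0 * (goal - y)

def rollout(goal=1.0, steps=200, use_residual=False):
    """Roll the DMP out with or without the residual correction added."""
    y, dy = 0.0, 0.0
    for _ in range(steps):
        corr = residual_policy(y, goal) if use_residual else 0.0
        y, dy = dmp_step(y, dy, goal, forcing=corr)
    return y

final_plain = rollout(use_residual=False)
final_resid = rollout(use_residual=True)
```

Because the residual only perturbs an already-reasonable DMP trajectory, the corrections stay small, which is what makes the combined controller gentle on the joints compared with learning the whole policy from scratch.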
RoboCat: A Self-Improving Foundation Agent for Robotic Manipulation
The ability to leverage heterogeneous robotic experience from different
robots and tasks to quickly master novel skills and embodiments has the
potential to transform robot learning. Inspired by recent advances in
foundation models for vision and language, we propose a foundation agent for
robotic manipulation. This agent, named RoboCat, is a visual goal-conditioned
decision transformer capable of consuming multi-embodiment action-labelled
visual experience. This data spans a large repertoire of motor control skills
from simulated and real robotic arms with varying sets of observations and
actions. With RoboCat, we demonstrate the ability to generalise to new tasks
and robots, both zero-shot and through adaptation using only 100-1000
examples for the target task. We also show how a trained model itself can be
used to generate data for subsequent training iterations, thus providing a
basic building block for an autonomous improvement loop. We investigate the
agent's capabilities, with large-scale evaluations both in simulation and on
three different real robot embodiments. We find that as we grow and diversify
its training data, RoboCat not only shows signs of cross-task transfer, but
also becomes more efficient at adapting to new tasks.