3 research outputs found
Goal-Conditioned End-to-End Visuomotor Control for Versatile Skill Primitives
Visuomotor control (VMC) is an effective means of achieving basic manipulation tasks such as pushing or pick-and-place from raw images. Conditioning VMC on desired goal states is a promising way of achieving versatile skill primitives. However, common conditioning schemes either rely on task-specific fine-tuning, e.g. using one-shot imitation learning (IL), or on sampling approaches using a forward model of scene dynamics, i.e. model-predictive control (MPC), leaving deployability and planning horizon severely limited. In this paper we propose a conditioning scheme which avoids these pitfalls by learning the controller and its conditioning in an end-to-end manner. Our model predicts complex action sequences based directly on a dynamic image representation of the robot motion and the distance to a given target observation. In contrast to related works, this enables our approach to efficiently perform complex manipulation tasks from raw image observations without predefined control primitives or test-time demonstrations. We report significant improvements in task success over representative MPC and IL baselines. We also demonstrate our model's generalisation capabilities in challenging, unseen tasks featuring visual noise, cluttered scenes and unseen object geometries.
Comment: revised manuscript with additional baselines and generalisation experiments; 11 pages, 8 figures, 7 tables
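The abstract does not spell out how the dynamic image representation of the robot motion is constructed; one common way to summarise motion in a single frame is approximate rank pooling over a short window of observations. The sketch below is a minimal illustration under that assumption; the window length, weighting and normalisation are illustrative choices, not details taken from the paper.

```python
# Hypothetical sketch: collapse a short window of camera frames into one
# "dynamic image" via approximate rank-pooling weights, so that static
# background cancels and motion is emphasised. All sizes are assumptions.
import numpy as np

def dynamic_image(frames: np.ndarray) -> np.ndarray:
    """Collapse a (T, H, W, C) frame window into a single (H, W, C) image."""
    T = frames.shape[0]
    t = np.arange(1, T + 1, dtype=np.float64)
    # Approximate rank-pooling coefficients: they sum to zero, so a constant
    # background cancels out while later frames are weighted more strongly.
    alpha = 2.0 * t - T - 1.0
    d = np.tensordot(alpha, frames.astype(np.float64), axes=1)
    # Rescale to [0, 255] for visualisation / downstream network input.
    d = (d - d.min()) / (d.max() - d.min() + 1e-8) * 255.0
    return d.astype(np.uint8)

# Example: 8 consecutive 64x64 RGB observations.
window = np.random.randint(0, 256, size=(8, 64, 64, 3), dtype=np.uint8)
motion_summary = dynamic_image(window)
print(motion_summary.shape)  # (64, 64, 3)
```

In the end-to-end controller described in the abstract, such a motion summary would be one input alongside the distance to the target observation; here it is shown in isolation.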
From visuomotor control to latent space planning for robot manipulation
Deep visuomotor control is emerging as an active research area for robot manipulation. Recent advances in learning sensory and motor systems in an end-to-end manner have achieved remarkable performance across a range of complex tasks. Nevertheless, a few limitations restrict visuomotor control from being more widely adopted as the de facto choice when facing a manipulation task on a real robotic platform. First, imitation learning-based visuomotor control approaches tend to suffer from the inability to recover from an out-of-distribution state caused by compounding errors. Second, the lack of versatility in task definition limits skill generalisability. Finally, the training data acquisition process and domain transfer are often impractical. In this thesis, individual solutions are proposed to address each of these issues.
In the first part, we find policy uncertainty to be an effective indicator of potential failure cases, in which the robot is stuck in out-of-distribution states. On this basis, we introduce a novel uncertainty-based approach to detect potential failure cases and a recovery strategy based on action-conditioned uncertainty predictions. Then, we propose to employ visual dynamics approximation in our model architecture to capture the motion of the robot arm rather than the static scene background, making it possible to learn versatile skill primitives.
In the second part, taking inspiration from recent progress in latent space planning, we propose a gradient-based optimisation method operating within the latent space of a deep generative model for motion planning. Our approach bypasses the traditional computational challenges encountered by established planning algorithms, and can easily specify novel constraints and handle multiple constraints simultaneously. Moreover, the training data comes from simple random motor-babbling of kinematically feasible robot states. Our real-world experiments further illustrate that our latent space planning approach can handle both open- and closed-loop planning in challenging environments such as heavily cluttered or dynamic scenes. This leads to the first, to our knowledge, closed-loop motion planning algorithm that can incorporate novel custom constraints, and lays the foundation for more complex manipulation tasks.
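The uncertainty-based failure detection in the first part is stated only at a high level; one plausible realisation is an ensemble of policy heads whose disagreement serves as the action-conditioned uncertainty signal. The estimator, threshold and tensor sizes below are assumptions used purely for illustration.

```python
# Hypothetical sketch of action-conditioned uncertainty as a failure signal:
# an ensemble of policy heads, with high prediction variance triggering a
# recovery behaviour. Sizes and threshold are illustrative assumptions.
import torch
import torch.nn as nn

obs_dim, act_dim, n_heads = 32, 7, 5
heads = nn.ModuleList(nn.Linear(obs_dim, act_dim) for _ in range(n_heads))

def act_with_recovery(obs: torch.Tensor, threshold: float = 0.1):
    """Return an action, or None to signal that a recovery strategy should run."""
    with torch.no_grad():
        preds = torch.stack([h(obs) for h in heads])   # (n_heads, act_dim)
    if preds.var(dim=0).mean() > threshold:            # high disagreement:
        return None                                    # likely out-of-distribution
    return preds.mean(dim=0)

action = act_with_recovery(torch.zeros(obs_dim))
```

Similarly, the gradient-based latent-space planner of the second part can be sketched as direct optimisation of a latent trajectory through a trained decoder, with constraints entering as differentiable penalty terms. The decoder architecture, cost terms and hyperparameters below are assumptions; only the overall idea comes from the abstract.

```python
# Hypothetical sketch of gradient-based planning in the latent space of a
# generative model: optimise a sequence of latent codes so the decoded states
# form a feasible path from start to goal. All details are illustrative.
import torch
import torch.nn as nn

latent_dim, state_dim, horizon = 16, 7, 20  # assumed sizes (e.g. a 7-DoF arm)

# Stand-in for a decoder trained on random motor-babbling data: it maps a
# latent code to a kinematically feasible joint configuration.
decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                        nn.Linear(128, state_dim))

def plan(start, goal, constraint_cost, steps=200, lr=0.05):
    """Optimise a latent trajectory by gradient descent on a differentiable cost."""
    z = torch.zeros(horizon, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        states = decoder(z)                                   # (horizon, state_dim)
        cost = ((states[0] - start) ** 2).sum()               # anchor the start
        cost = cost + ((states[-1] - goal) ** 2).sum()        # reach the goal
        cost = cost + ((states[1:] - states[:-1]) ** 2).sum() # smoothness
        cost = cost + constraint_cost(states)                 # custom constraints plug in here
        opt.zero_grad()
        cost.backward()
        opt.step()
    return decoder(z).detach()

# Example usage with a trivially satisfied constraint term.
start, goal = torch.zeros(state_dim), torch.ones(state_dim)
path = plan(start, goal, constraint_cost=lambda s: torch.tensor(0.0))
print(path.shape)  # torch.Size([20, 7])
```

In this formulation, additional requirements such as obstacle avoidance would enter as further differentiable terms in the cost, which is what makes adding novel custom constraints and handling several of them simultaneously straightforward.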
Goal-conditioned end-to-end visuomotor control for versatile skill primitives
Visuomotor control (VMC) is an effective means of achieving basic manipulation tasks such as pushing or pick-and-place from raw images. Conditioning VMC on desired goal states is a promising way of achieving versatile skill primitives. However, common conditioning schemes either rely on task-specific fine-tuning, e.g. using one-shot imitation learning (IL), or on sampling approaches using a forward model of scene dynamics, i.e. model-predictive control (MPC), leaving deployability and planning horizon severely limited. In this paper we propose a conditioning scheme which avoids these pitfalls by learning the controller and its conditioning in an end-to-end manner. Our model predicts complex action sequences based directly on a dynamic image representation of the robot motion and the distance to a given target observation. In contrast to related works, this enables our approach to efficiently perform complex manipulation tasks from raw image observations without predefined control primitives or test-time demonstrations. We report significant improvements in task success over representative MPC and IL baselines. We also demonstrate our model's generalisation capabilities in challenging, unseen tasks featuring visual noise, cluttered scenes and unseen object geometries.