From visuomotor control to latent space planning for robot manipulation
Deep visuomotor control is emerging as an active research area for robot manipulation. Recent advances in learning sensory and motor systems in an end-to-end manner have achieved remarkable performance across a range of complex tasks. Nevertheless, a few limitations restrict visuomotor control from being more widely adopted as the de facto choice when facing a manipulation task on a real robotic platform. First, imitation learning-based visuomotor control approaches tend to suffer from the inability to recover from an out-of-distribution state caused by compounding errors. Second, the lack of versatility in task definition limits skill generalisability. Finally, the training data acquisition process and domain transfer are often impractical. In this thesis, individual solutions are proposed to address each of these issues.
In the first part, we find policy uncertainty to be an effective indicator of potential failure cases, in which the robot is stuck in out-of-distribution states. On this basis, we introduce a novel uncertainty-based approach to detect potential failure cases and a recovery strategy based on action-conditioned uncertainty predictions. Then, we incorporate visual dynamics approximation into our model architecture to capture the motion of the robot arm rather than the static scene background, making it possible to learn versatile skill primitives. In the second part, taking inspiration from recent progress in latent space planning, we propose a gradient-based optimisation method operating within the latent space of a deep generative model for motion planning. Our approach bypasses the computational challenges encountered by established planning algorithms, and makes it easy to specify novel constraints and to handle multiple constraints simultaneously. Moreover, the training data comes from simple random motor-babbling of kinematically feasible robot states. Our real-world experiments further illustrate that our latent space planning approach can handle both open- and closed-loop planning in challenging environments such as heavily cluttered or dynamic scenes. This leads to, to our knowledge, the first closed-loop motion planning algorithm that can incorporate novel custom constraints, and lays the foundation for more complex manipulation tasks.
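As a loose illustration of the latent-space planning idea described above, the sketch below runs plain gradient descent on a latent code through a stand-in linear "decoder" (in the thesis this would be a deep generative model trained on motor-babbling data). All dimensions, costs, and the decoder itself are hypothetical; extra constraint terms would simply be added to the cost:

```python
import numpy as np

# Stand-in linear "decoder": latent code (dim 4) -> workspace point (dim 2).
# A trained generative model would replace this fixed map.
W = np.array([[1.0, 0.0, 0.5, 0.0],
              [0.0, 1.0, 0.0, 0.5]])

def decode(z):
    return W @ z

goal = np.array([1.0, -0.5])  # illustrative target in workspace coordinates

def cost_grad(z):
    # Quadratic goal-reaching cost; obstacle or joint-limit penalties
    # would be added here as further differentiable terms.
    diff = decode(z) - goal
    cost = diff @ diff
    grad = 2.0 * W.T @ diff   # chain rule through the linear decoder
    return cost, grad

z = np.zeros(4)               # start from the latent origin
for _ in range(100):          # gradient descent in latent space
    c, g = cost_grad(z)
    z -= 0.1 * g

print("final cost:", c)
```

The key property being sketched: because the cost is differentiable through the decoder, new constraints can be composed by summing cost terms, without redesigning the planner.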
Grounding Language with Visual Affordances over Unstructured Data
Recent works have shown that Large Language Models (LLMs) can be applied to
ground natural language to a wide variety of robot skills. However, in
practice, learning multi-task, language-conditioned robotic skills typically
requires large-scale data collection and frequent human intervention to reset
the environment or help correct the current policies. In this work, we
propose a novel approach to efficiently learn general-purpose
language-conditioned robot skills from unstructured, offline and reset-free
data in the real world by exploiting a self-supervised visuo-lingual affordance
model, which requires annotating as little as 1% of the total data with
language. We evaluate our method in extensive experiments both in simulated and
real-world robotic tasks, achieving state-of-the-art performance on the
challenging CALVIN benchmark and learning over 25 distinct visuomotor
manipulation tasks with a single policy in the real world. We find that when
paired with LLMs to break down abstract natural language instructions into
subgoals via few-shot prompting, our method is capable of completing
long-horizon, multi-tier tasks in the real world, while requiring an order of
magnitude less data than previous approaches. Code and videos are available at
http://hulc2.cs.uni-freiburg.de
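As a rough illustration of the few-shot decomposition step mentioned above, the sketch below assembles a prompt that pairs example tasks with subgoal lists, which would then be sent to an LLM. The example tasks, subgoals, and prompt layout are all hypothetical, not taken from the paper:

```python
# Hypothetical few-shot prompt for decomposing an abstract instruction
# into subgoals. In practice the completed prompt is passed to an LLM,
# whose continuation yields the subgoal list for the new task.
FEW_SHOT_EXAMPLES = [
    ("tidy up the workspace",
     ["pick up the red block", "place it in the box", "push the box aside"]),
]

def build_prompt(instruction):
    lines = []
    for task, subgoals in FEW_SHOT_EXAMPLES:
        lines.append(f"Task: {task}")
        lines.extend(f"- {s}" for s in subgoals)
    lines.append(f"Task: {instruction}")  # the model continues from here
    return "\n".join(lines)

prompt = build_prompt("set the table")
print(prompt)
```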
Compositional Servoing by Recombining Demonstrations
Learning-based manipulation policies from image inputs often show weak task
transfer capabilities. In contrast, visual servoing methods allow efficient
task transfer in high-precision scenarios while requiring only a few
demonstrations. In this work, we present a framework that formulates the visual
servoing task as graph traversal. Our method not only extends the robustness of
visual servoing, but also enables multitask capability based on a few
task-specific demonstrations. We construct demonstration graphs by splitting
existing demonstrations and recombining them. In order to traverse the
demonstration graph in the inference case, we utilize a similarity function
that helps select the best demonstration for a specific task. This enables us
to compute the shortest path through the graph. Ultimately, we show that
recombining demonstrations leads to higher task-specific success rates. We
present
extensive simulation and real-world experimental results that demonstrate the
efficacy of our approach. Project website: http://compservo.cs.uni-freiburg.d
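The graph-traversal formulation above can be sketched with a toy example: demonstration segments become nodes, edges (including recombination edges across demonstrations) carry a dissimilarity weight such as (1 - similarity), and a shortest-path search selects the segment sequence to servo along. All node names and weights below are made up for illustration:

```python
import heapq

# Toy demonstration graph: two demos ("A", "B") split into segments.
# Edge weights stand in for (1 - similarity); values are illustrative.
edges = {
    "start":   [("demoA_1", 0.2), ("demoB_1", 0.5)],
    "demoA_1": [("demoA_2", 0.1), ("demoB_2", 0.4)],  # cross-demo recombination
    "demoB_1": [("demoB_2", 0.1)],
    "demoA_2": [("goal", 0.3)],
    "demoB_2": [("goal", 0.1)],
}

def shortest_path(src, dst):
    # Standard Dijkstra over the demonstration graph.
    heap = [(0.0, src, [src])]
    seen = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in edges.get(node, []):
            if nxt not in seen:
                heapq.heappush(heap, (cost + w, nxt, path + [nxt]))
    return float("inf"), []

cost, path = shortest_path("start", "goal")
print(path, round(cost, 2))
```

With these weights the search prefers staying on demo A's segments over recombining, illustrating how the similarity function arbitrates between reusing one demonstration and splicing several.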
An active inference model of hierarchical action understanding, learning and imitation
We advance a novel active inference model of the cognitive processing that underlies the acquisition of a hierarchical action repertoire and its use for observation, understanding and imitation. We illustrate the model in four simulations of a tennis learner who observes a teacher performing tennis shots, forms hierarchical representations of the observed actions, and imitates them. Our simulations show that the agent's oculomotor activity implements an active information sampling strategy that permits inferring the kinematic aspects of the observed movement, which lie at the lowest level of the action hierarchy. In turn, this low-level kinematic inference supports higher-level inferences about deeper aspects of the observed actions: proximal goals and intentions. Finally, the inferred action representations can steer imitative responses, but interfere with the execution of different actions. Our simulations show that hierarchical active inference provides a unified account of action observation, understanding, learning and imitation and helps explain the neurobiological underpinnings of visuomotor cognition, including the multiple routes for action understanding in the dorsal and ventral streams and mirror mechanisms.
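The bottom-up flow described above, where low-level kinematic evidence supports higher-level inference about intentions, can be caricatured as two stacked Bayesian updates. The sketch below is a toy stand-in, not the paper's generative model; all labels and probabilities are invented:

```python
import numpy as np

# Toy two-level inference: intentions (top) generate kinematic primitives
# (middle), which generate an observed movement cue (bottom).
intentions = ["serve", "forehand"]

p_intention = np.array([0.5, 0.5])         # flat prior over intentions
p_kin_given_int = np.array([[0.9, 0.1],    # serve    -> mostly overhead swing
                            [0.2, 0.8]])   # forehand -> mostly side swing
p_obs_given_kin = np.array([0.8, 0.3])     # P(observed cue | each kinematics)

# Joint over (intention, kinematics) given the observed cue, then
# normalise: kinematic evidence propagates up to the intention level.
joint = p_intention[:, None] * p_kin_given_int * p_obs_given_kin[None, :]
posterior_int = joint.sum(axis=1) / joint.sum()

print(dict(zip(intentions, posterior_int.round(3))))
```

Because the observed cue favours the overhead swing, the posterior shifts toward "serve", mirroring how kinematic inference feeds inference about goals and intentions in the hierarchical account.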