Robotic Manipulation Datasets for Offline Compositional Reinforcement Learning
Offline reinforcement learning (RL) is a promising direction that allows RL
agents to pre-train on large datasets, avoiding the need for repeated, expensive
data collection. To advance the field, it is crucial to generate large-scale
datasets. Compositional RL is particularly appealing for generating such large
datasets, since 1) it permits creating many tasks from few components, 2) the
task structure may enable trained agents to solve new tasks by combining
relevant learned components, and 3) the compositional dimensions provide a
notion of task relatedness. This paper provides four offline RL datasets for
simulated robotic manipulation created using the 256 tasks from CompoSuite
[Mendez et al., 2022a]. Each dataset is collected from an agent with a
different degree of performance, and consists of 256 million transitions. We
provide training and evaluation settings for assessing an agent's ability to
learn compositional task policies. Our benchmarking experiments on each setting
show that current offline RL methods can learn the training tasks to some
extent and that compositional methods significantly outperform
non-compositional methods. However, current methods are still unable to extract
the tasks' compositional structure to generalize to unseen tasks, showing a
need for further research in offline compositional RL.
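The compositional construction described above can be sketched as a Cartesian product over task axes. This is an illustrative sketch only: the axis names and element labels below are assumptions for the example, not the exact CompoSuite identifiers, and the held-out split is one simple way to test compositional generalization.

```python
# Illustrative sketch: enumerating CompoSuite-style compositional tasks.
# Four axes with four elements each give 4^4 = 256 tasks; the labels
# below are assumed for the example, not the benchmark's exact names.
from itertools import product

ROBOTS = ["IIWA", "Jaco", "Panda", "UR5e"]          # assumed labels
OBJECTS = ["box", "dumbbell", "plate", "hollow_box"]
OBSTACLES = ["none", "wall", "door", "shelf"]
OBJECTIVES = ["pick_place", "push", "shelf_place", "trashcan"]

def all_tasks():
    """Cartesian product of the compositional axes -> 256 task tuples."""
    return list(product(ROBOTS, OBJECTS, OBSTACLES, OBJECTIVES))

def holdout_split(tasks, held_out_element="Panda"):
    """Zero-shot split: every task containing the held-out element is
    reserved for evaluation, probing compositional generalization."""
    train = [t for t in tasks if held_out_element not in t]
    test = [t for t in tasks if held_out_element in t]
    return train, test

tasks = all_tasks()
train_tasks, test_tasks = holdout_split(tasks)
print(len(tasks), len(train_tasks), len(test_tasks))  # 256 192 64
```

Holding out one element of one axis reserves a quarter of that axis's tasks (64 of 256) for evaluation, which is the kind of train/evaluation setting the datasets are meant to support.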
Compositional Servoing by Recombining Demonstrations
Learning-based manipulation policies from image inputs often show weak task
transfer capabilities. In contrast, visual servoing methods allow efficient
task transfer in high-precision scenarios while requiring only a few
demonstrations. In this work, we present a framework that formulates the visual
servoing task as graph traversal. Our method not only extends the robustness of
visual servoing, but also enables multitask capability based on a few
task-specific demonstrations. We construct demonstration graphs by splitting
existing demonstrations and recombining them. To traverse the demonstration
graph at inference time, we use a similarity function that selects the best
demonstration for a given task. This enables us
to compute the shortest path through the graph. Ultimately, we show that
recombining demonstrations leads to higher task-specific success rates. We present
extensive simulation and real-world experimental results that demonstrate the
efficacy of our approach.
Comment: http://compservo.cs.uni-freiburg.d
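The graph-traversal idea in this abstract can be sketched concretely. This is a minimal stand-in, not the authors' code: `similarity` is an assumed placeholder for their learned image-similarity function, segments are pre-split demonstration pieces, and edges connect segments whose endpoints look alike so that Dijkstra can find the cheapest recombined path.

```python
# Minimal sketch (not the authors' implementation): recombining split
# demonstrations as a graph and finding the shortest path through it.
import heapq
import numpy as np

def similarity(a, b):
    # Placeholder: cosine similarity between feature vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def build_graph(segments, threshold=0.9):
    """Connect segment i -> j when the end of i resembles the start of j.
    Edge cost is low when the similarity is high."""
    graph = {i: [] for i in range(len(segments))}
    for i, seg_i in enumerate(segments):
        for j, seg_j in enumerate(segments):
            if i == j:
                continue
            s = similarity(seg_i[-1], seg_j[0])
            if s >= threshold:
                graph[i].append((j, 1.0 - s))
    return graph

def shortest_path(graph, start, goal):
    """Dijkstra over segment indices; assumes goal is reachable."""
    dist, prev = {start: 0.0}, {}
    pq = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1]
```

Splitting demonstrations before building edges is what makes recombination possible: the planner can stitch the first half of one demonstration to the second half of another whenever their junction states are similar enough.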
RSG: Fast Learning Adaptive Skills for Quadruped Robots by Skill Graph
Developing robotic intelligent systems that can adapt quickly to unseen wild
situations is one of the critical challenges in pursuing autonomous robotics.
Although impressive progress has been made in walking stability and skill
learning for legged robots, their ability to adapt quickly still falls short
of that of animals in nature. Animals are born with many of the skills needed
to survive and can quickly acquire new ones by composing fundamental skills,
even with limited experience. Inspired by this, we propose a novel framework,
named Robot Skill Graph (RSG), for organizing the massive fundamental skills
of robots and dexterously reusing them for fast adaptation. Bearing a
structure similar to the Knowledge Graph (KG), RSG is composed of dynamic
behavioral skills rather than the static knowledge of a KG, and it enables
discovering implicit relations between a robot's learning context and its
acquired skills, serving as a starting point for understanding the subtle
patterns in robots' skill learning. Extensive experimental results
demonstrate that RSG can provide rational skill inference for new tasks and
environments, enabling quadruped robots to adapt to new scenarios and learn
new skills rapidly.
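Since RSG borrows the knowledge-graph structure, its queries can be pictured as lookups over (head, relation, tail) triples. The entities and relations below are invented for illustration; the paper's actual graph and its relations are learned from the robot's experience.

```python
# Illustrative sketch only: a skill graph as (head, relation, tail) triples,
# queried to suggest skills for a new context. All names are invented
# examples, not taken from the RSG paper.
TRIPLES = [
    ("trot", "suited_for", "flat_ground"),
    ("crawl", "suited_for", "low_ceiling"),
    ("climb", "suited_for", "stairs"),
    ("climb", "composed_of", "trot"),
]

def skills_for_context(context, triples=TRIPLES):
    """Return skills whose 'suited_for' relation matches the context."""
    return [h for h, r, t in triples if r == "suited_for" and t == context]

def sub_skills(skill, triples=TRIPLES):
    """Follow 'composed_of' edges to find reusable component skills."""
    return [t for h, r, t in triples if h == skill and r == "composed_of"]
```

A new context is matched against the graph to retrieve a candidate skill, and its `composed_of` edges expose the fundamental skills that can be reused instead of learned from scratch.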
Self-Play and Self-Describe: Policy Adaptation with Vision-Language Foundation Models
Recent progress on vision-language foundation models has brought significant
advances to building general-purpose robots. By using the pre-trained models
to encode the scene and instructions as inputs for decision making, the
instruction-conditioned policy can generalize across different objects and
tasks. While this is encouraging, the policy still fails in most cases given an
unseen task or environment. To adapt the policy to unseen tasks and
environments, we explore a new paradigm on leveraging the pre-trained
foundation models with Self-PLAY and Self-Describe (SPLAYD). When deploying the
trained policy to a new task or a new environment, we first let the policy
self-play with randomly generated instructions to record the demonstrations.
While the execution could be wrong, we can use the pre-trained foundation
models to accurately self-describe (i.e., re-label or classify) the
demonstrations. This automatically provides new pairs of
demonstration-instruction data for policy fine-tuning. We evaluate our method
on a broad range of experiments with the focus on generalization on unseen
objects, unseen tasks, unseen environments, and sim-to-real transfer. We show
SPLAYD improves baselines by a large margin in all cases. Our project page is
available at https://geyuying.github.io/SPLAYD/
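The self-play/self-describe loop in this abstract can be sketched as a data-collection routine. This is a hedged sketch of the idea as described, not the authors' API: `policy`, `env`, and `vlm_describe` are assumed interfaces standing in for the trained policy, the new environment, and the pre-trained foundation model.

```python
# Hedged sketch of the SPLAYD loop as described in the abstract; the
# policy/env/vlm_describe interfaces are assumptions for illustration.
import random

def rollout(policy, env, instruction, max_steps=50):
    """Execute the instruction-conditioned policy and record the trajectory."""
    obs, traj = env.reset(), []
    for _ in range(max_steps):
        action = policy(obs, instruction)
        obs, done = env.step(action)
        traj.append((obs, action))
        if done:
            break
    return traj

def collect_relabelled_data(policy, env, vlm_describe, instructions, n_episodes=100):
    """Self-play with random instructions, then relabel each trajectory with
    the instruction the foundation model says was actually executed."""
    dataset = []
    for _ in range(n_episodes):
        instruction = random.choice(instructions)       # self-play prompt
        trajectory = rollout(policy, env, instruction)  # execution may fail
        achieved = vlm_describe(trajectory)             # self-describe
        dataset.append((trajectory, achieved))          # relabelled pair
    return dataset
```

The key point the abstract makes is that the rollout may fail its sampled instruction, but the relabelled (trajectory, achieved-instruction) pairs are still correct supervision for fine-tuning.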
Modular Deep Learning
Transfer learning has recently become the dominant paradigm of machine
learning. Pre-trained models fine-tuned for downstream tasks achieve better
performance with fewer labelled examples. Nonetheless, it remains unclear how
to develop models that specialise towards multiple tasks without incurring
negative interference and that generalise systematically to non-identically
distributed tasks. Modular deep learning has emerged as a promising solution to
these challenges. In this framework, units of computation are often implemented
as autonomous parameter-efficient modules. Information is conditionally routed
to a subset of modules and subsequently aggregated. These properties enable
positive transfer and systematic generalisation by separating computation from
routing and updating modules locally. We offer a survey of modular
architectures, providing a unified view over several threads of research that
evolved independently in the scientific literature. Moreover, we explore
various additional purposes of modularity, including scaling language models,
causal inference, programme induction, and planning in reinforcement learning.
Finally, we report various concrete applications where modularity has been
successfully deployed such as cross-lingual and cross-modal knowledge transfer.
Talks and projects related to this survey are available at
https://www.modulardeeplearning.com/
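The route-then-aggregate pattern the survey describes can be made concrete with a small sketch. The softmax top-k router below is one common choice among the routing schemes the survey covers, not the only one; shapes and initialization are illustrative assumptions.

```python
# Minimal sketch of conditional routing over parameter-efficient modules:
# a router scores modules per input, the top-k are applied, and their
# outputs are aggregated by the normalised routing weights.
import numpy as np

rng = np.random.default_rng(0)

class ModularLayer:
    def __init__(self, dim, n_modules, k=2):
        # Each module is an autonomous parameter block (here: one matrix).
        self.modules = [rng.normal(0, 0.1, (dim, dim)) for _ in range(n_modules)]
        self.router = rng.normal(0, 0.1, (dim, n_modules))
        self.k = k

    def __call__(self, x):
        scores = x @ self.router                 # route: score each module
        top = np.argsort(scores)[-self.k:]       # conditionally select top-k
        w = np.exp(scores[top])
        w /= w.sum()                             # normalise routing weights
        # aggregate: weighted sum of the selected modules' outputs
        return sum(wi * (x @ self.modules[i]) for wi, i in zip(w, top))

layer = ModularLayer(dim=8, n_modules=4)
y = layer(rng.normal(size=8))
print(y.shape)  # (8,)
```

Because only the selected modules receive gradient for a given input, updates stay local to a subset of parameters, which is the mechanism behind the positive transfer and systematic generalisation discussed above.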
A formal methods approach to interpretability, safety and composability for reinforcement learning
Robotic systems that are capable of learning from experience have recently become more commonplace. These systems have demonstrated success in learning difficult control tasks. However, as tasks become more complex and the number of options to reason about grows, there is an increasing need to specify the desired behavior in a structured and interpretable fashion, guarantee system safety, conveniently integrate task-specific knowledge with more general knowledge about the world, and generate new skills from learned ones without additional exploration. This thesis addresses these problems specifically in the case of reinforcement learning (RL) by using techniques from formal methods.
Experience and prior knowledge shape the way humans make decisions when asked to perform complex tasks. Conversely, robots have had difficulty incorporating a rich set of prior knowledge when solving complex planning and control problems. In RL, the reward offers an avenue for incorporating prior knowledge. However, incorporating such knowledge is not always straightforward using standard reward engineering techniques. This thesis presents a formal specification language that can combine a base of general knowledge with task specifications to generate richer task descriptions. For example, to make a hotdog at the task level, one needs to grab a sausage, grill it, place the cooked sausage in a bun, apply ketchup, and serve. Prior knowledge about the context of the task, e.g., sausages can be damaged if squeezed too hard, should also be taken into account.
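The hotdog example above can be pictured as a task sequence conjoined with background constraints. The thesis uses a formal specification language; the checker below is a toy stand-in over symbolic traces, with invented proposition names, meant only to show how prior knowledge composes with the task specification.

```python
# Illustrative only: a task sequence plus background invariants, checked
# over a symbolic trace. Proposition names are invented for the example.
def satisfies(trace, sequence, invariants):
    """trace: list of sets of true propositions, one set per step.
    sequence: propositions that must become true in the given order.
    invariants: predicates (prior knowledge) that must hold at every step."""
    if not all(inv(step) for step in trace for inv in invariants):
        return False
    i = 0
    for step in trace:
        if i < len(sequence) and sequence[i] in step:
            i += 1
    return i == len(sequence)

hotdog = ["grabbed_sausage", "grilled", "in_bun", "ketchup_applied", "served"]
gentle = lambda step: "sausage_crushed" not in step  # prior knowledge

trace = [{"grabbed_sausage"}, {"grilled"}, {"in_bun"},
         {"ketchup_applied"}, {"served"}]
print(satisfies(trace, hotdog, [gentle]))  # True
```

Writing "sausages can be damaged if squeezed too hard" as an invariant rather than baking it into the reward is exactly the separation of general knowledge from task specification that the paragraph describes.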
Interpretability in RL rewards - easily understanding what the reward function represents and knowing how to improve it - is a key component in understanding the behavior of an RL agent. This property is often missing in reward engineering techniques, which makes it difficult to understand the implications of the reward function when tasks become complex. Interpretability of the reward allows for better value alignment between human intent and system objectives, leading to a lower likelihood of reward hacking by the system. The formal specification language presented in this work has the added benefit of being easily interpretable, owing to its similarity to natural language.
Safe RL - guaranteeing that undesirable behaviors do not occur (e.g., collisions with obstacles) - is a critical concern when learning and deployment of robotic systems happen in the real world. A lack of safety guarantees not only presents legal challenges to wide adoption but also poses risks to hardware and users. By using techniques from formal methods and control theory, we provide two main components to ensure safety of the RL agent's behaviors. First, the formal specification language allows for the explicit definition of undesirable behaviors (e.g., always avoid collisions). Second, control barrier functions (CBFs) are used to enforce these safety constraints.
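The CBF mechanism can be shown on the simplest possible system. This is a hedged sketch, not the thesis's implementation: for a 1-D integrator x' = u with safe set h(x) = x_max - x >= 0, the usual CBF quadratic program collapses to a closed-form clip, so the filter minimally modifies the RL action while keeping h from decreasing too fast.

```python
# Sketch of a CBF safety filter for x' = u with barrier h(x) = x_max - x.
# The general case solves a QP; this 1-D example has a closed form.
def cbf_filter(x, u_rl, x_max=1.0, alpha=4.0):
    h = x_max - x              # barrier value: positive inside the safe set
    # CBF condition h' >= -alpha*h with h' = -u reduces to u <= alpha*h.
    u_max = alpha * h
    return min(u_rl, u_max)    # minimally modify the proposed RL action

print(cbf_filter(x=0.5, u_rl=3.0))  # 2.0: action clipped near the boundary
print(cbf_filter(x=0.0, u_rl=2.0))  # 2.0: action unchanged far from it
```

The RL policy remains free to explore anywhere inside the safe set; the filter only intervenes as the state approaches the boundary, which is how safety is enforced without rewriting the learned policy.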
Composability of learned skills - the ability to compose new skills from a library of learned ones - can significantly enhance a robot's capabilities by making efficient use of past experience. Modern RL systems focus mainly on mastery (maximizing the given reward) and less on generalization (transfer from one task domain to another). In this thesis, we will also exploit the logical and graphical representations of the task specification and develop techniques for skill composition.