5,232 research outputs found

    Robotic Manipulation Datasets for Offline Compositional Reinforcement Learning

    Offline reinforcement learning (RL) is a promising direction that allows RL agents to pre-train on large datasets, avoiding repeated, expensive data collection. To advance the field, it is crucial to generate large-scale datasets. Compositional RL is particularly appealing for generating such large datasets, since 1) it permits creating many tasks from few components, 2) the task structure may enable trained agents to solve new tasks by combining relevant learned components, and 3) the compositional dimensions provide a notion of task relatedness. This paper provides four offline RL datasets for simulated robotic manipulation created using the 256 tasks from CompoSuite [Mendez et al., 2022a]. Each dataset is collected from an agent with a different degree of performance, and consists of 256 million transitions. We provide training and evaluation settings for assessing an agent's ability to learn compositional task policies. Our benchmarking experiments on each setting show that current offline RL methods can learn the training tasks to some extent and that compositional methods significantly outperform non-compositional methods. However, current methods are still unable to extract the tasks' compositional structure to generalize to unseen tasks, showing a need for further research in offline compositional RL.
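The point that compositional RL creates many tasks from few components, and that the compositional dimensions give a notion of task relatedness, can be illustrated with a small sketch. The abstract states only the totals (256 tasks, 256 million transitions per dataset); the split into four axes of four elements each, the placeholder element names, and the even division of transitions across tasks are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical compositional axes. The element names are placeholders and the
# 4 x 4 x 4 x 4 split is an assumption chosen because it yields 256 tasks.
AXES = {
    "robot": ["robot_a", "robot_b", "robot_c", "robot_d"],
    "object": ["object_a", "object_b", "object_c", "object_d"],
    "obstacle": ["obstacle_a", "obstacle_b", "obstacle_c", "obstacle_d"],
    "objective": ["objective_a", "objective_b", "objective_c", "objective_d"],
}


def enumerate_tasks(axes):
    """Build every task as one choice of element per axis."""
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*(axes[n] for n in names))]


def shared_components(task_a, task_b):
    """Count the axes on which two tasks use the same element (a simple relatedness score)."""
    return sum(task_a[key] == task_b[key] for key in task_a)


tasks = enumerate_tasks(AXES)
num_components = sum(len(elements) for elements in AXES.values())

# Assumes the 256 million transitions of one dataset are spread evenly over the tasks.
transitions_per_task = 256_000_000 // len(tasks)

print(len(tasks), num_components, transitions_per_task)  # 256 16 1000000
```

Under these assumptions, 16 components yield 256 tasks, and two tasks that differ in a single axis share three of their four components.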

    Compositional Servoing by Recombining Demonstrations

    Learning-based manipulation policies from image inputs often show weak task transfer capabilities. In contrast, visual servoing methods allow efficient task transfer in high-precision scenarios while requiring only a few demonstrations. In this work, we present a framework that formulates the visual servoing task as graph traversal. Our method not only extends the robustness of visual servoing, but also enables multitask capability based on a few task-specific demonstrations. We construct demonstration graphs by splitting existing demonstrations and recombining them. To traverse the demonstration graph at inference time, we utilize a similarity function that helps select the best demonstration for a specific task. This enables us to compute the shortest path through the graph. Ultimately, we show that recombining demonstrations leads to a higher success rate on the respective tasks. We present extensive simulation and real-world experimental results that demonstrate the efficacy of our approach. Comment: http://compservo.cs.uni-freiburg.d
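The idea of treating recombined demonstrations as a graph and computing a shortest path through it can be sketched generically. This is not the authors' implementation: the segment names, the similarity values, and the choice of `1 - similarity` as edge cost are hypothetical, and plain Dijkstra stands in for whatever search the paper uses.

```python
import heapq


def shortest_path(graph, start, goal):
    """Dijkstra over a dict-of-dicts graph where graph[u][v] is a non-negative cost."""
    best = {start: 0.0}
    parent = {}
    queue = [(0.0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            # Walk the parent links back to the start to recover the path.
            path = [node]
            while node in parent:
                node = parent[node]
                path.append(node)
            return cost, path[::-1]
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                parent[nxt] = node
                heapq.heappush(queue, (new_cost, nxt))
    return float("inf"), []


# Hypothetical similarities between segments of two demonstrations, A and B.
# A0 -> B1 is a recombination edge that switches from demonstration A to B.
similarity = {
    ("start", "A0"): 0.8, ("start", "B0"): 0.4,
    ("A0", "A1"): 0.5, ("A0", "B1"): 0.9,
    ("B0", "B1"): 0.9,
    ("A1", "goal"): 0.9, ("B1", "goal"): 0.9,
}

# Assumed cost model: more similar segments are cheaper to chain.
graph = {}
for (src, dst), sim in similarity.items():
    graph.setdefault(src, {})[dst] = 1.0 - sim

cost, path = shortest_path(graph, "start", "goal")
print(path)
```

In this toy graph the cheapest route begins in demonstration A and finishes in demonstration B, which neither original demonstration provides on its own.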

    RSG: Fast Learning Adaptive Skills for Quadruped Robots by Skill Graph

    Developing intelligent robotic systems that can adapt quickly to unseen situations in the wild is one of the critical challenges in pursuing autonomous robotics. Although impressive progress has been made in walking stability and skill learning for legged robots, their ability to adapt quickly is still inferior to that of animals in nature. Animals are born with a large repertoire of skills needed to survive, and can quickly acquire new ones by composing fundamental skills with limited experience. Inspired by this, we propose a novel framework, named Robot Skill Graph (RSG), for organizing a large number of fundamental robot skills and dexterously reusing them for fast adaptation. Bearing a structure similar to the Knowledge Graph (KG), RSG is composed of dynamic behavioral skills instead of the static knowledge in a KG, and enables discovering implicit relations between the learning context and the acquired skills of robots, serving as a starting point for understanding subtle patterns in robots' skill learning. Extensive experimental results demonstrate that RSG can provide rational skill inference for new tasks and environments and enable quadruped robots to adapt to new scenarios and learn new skills rapidly.

    Self-Play and Self-Describe: Policy Adaptation with Vision-Language Foundation Models

    Recent progress on vision-language foundation models has brought significant advancement to building general-purpose robots. By using the pre-trained models to encode the scene and instructions as inputs for decision making, the instruction-conditioned policy can generalize across different objects and tasks. While this is encouraging, the policy still fails in most cases given an unseen task or environment. To adapt the policy to unseen tasks and environments, we explore a new paradigm for leveraging the pre-trained foundation models with Self-PLAY and Self-Describe (SPLAYD). When deploying the trained policy to a new task or a new environment, we first let the policy self-play with randomly generated instructions to record the demonstrations. While the execution could be wrong, we can use the pre-trained foundation models to accurately self-describe (i.e., re-label or classify) the demonstrations. This automatically provides new pairs of demonstration-instruction data for policy fine-tuning. We evaluate our method on a broad range of experiments with a focus on generalization to unseen objects, unseen tasks, unseen environments, and sim-to-real transfer. We show SPLAYD improves baselines by a large margin in all cases. Our project page is available at https://geyuying.github.io/SPLAYD/

    Modular Deep Learning

    Transfer learning has recently become the dominant paradigm of machine learning. Pre-trained models fine-tuned for downstream tasks achieve better performance with fewer labelled examples. Nonetheless, it remains unclear how to develop models that specialise towards multiple tasks without incurring negative interference and that generalise systematically to non-identically distributed tasks. Modular deep learning has emerged as a promising solution to these challenges. In this framework, units of computation are often implemented as autonomous parameter-efficient modules. Information is conditionally routed to a subset of modules and subsequently aggregated. These properties enable positive transfer and systematic generalisation by separating computation from routing and updating modules locally. We offer a survey of modular architectures, providing a unified view over several threads of research that evolved independently in the scientific literature. Moreover, we explore various additional purposes of modularity, including scaling language models, causal inference, programme induction, and planning in reinforcement learning. Finally, we report various concrete applications where modularity has been successfully deployed, such as cross-lingual and cross-modal knowledge transfer. Talks and projects related to this survey are available at https://www.modulardeeplearning.com/
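The statement that information is conditionally routed to a subset of modules and subsequently aggregated can be made concrete with a toy example. This is a minimal sketch, not an architecture from the survey: the modules are scalar functions, the router scores are fixed by hand rather than learned, and top-k selection with softmax-weighted averaging is only one of several possible routing and aggregation choices.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_and_aggregate(x, modules, router_scores, k=2):
    """Send x to the k highest-scoring modules and return the weighted sum of their outputs."""
    ranked = sorted(range(len(modules)), key=lambda i: router_scores[i], reverse=True)
    selected = ranked[:k]
    weights = softmax([router_scores[i] for i in selected])
    output = sum(w * modules[i](x) for w, i in zip(weights, selected))
    return output, selected


# Hypothetical modules; in practice these would be small parameterised networks.
modules = [
    lambda x: 2.0 * x,   # module 0
    lambda x: x + 1.0,   # module 1
    lambda x: -x,        # module 2 (never selected with the scores below)
]

# Hand-picked router scores; a real router would compute these from the input.
router_scores = [2.0, 2.0, -5.0]

output, selected = route_and_aggregate(3.0, modules, router_scores, k=2)
print(output, sorted(selected))
```

Because module 2 receives no input, it would receive no gradient either, which is the sense in which routing lets modules be updated locally.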

    A formal methods approach to interpretability, safety and composability for reinforcement learning

    Robotic systems that are capable of learning from experience have recently become more commonplace. These systems have demonstrated success in learning difficult control tasks. However, as tasks become more complex and the number of options to reason about becomes greater, there is an increasing need to be able to specify the desired behavior in a structured and interpretable fashion, guarantee system safety, conveniently integrate task-specific knowledge with more general knowledge about the world, and generate new skills from learned ones without additional exploration. This thesis addresses these problems specifically in the case of reinforcement learning (RL) by using techniques from formal methods. Experience and prior knowledge shape the way humans make decisions when asked to perform complex tasks. Conversely, robots have had difficulty incorporating a rich set of prior knowledge when solving complex planning and control problems. In RL, the reward offers an avenue for incorporating prior knowledge. However, incorporating such knowledge is not always straightforward using standard reward engineering techniques. This thesis presents a formal specification language that can combine a base of general knowledge with task specifications to generate richer task descriptions. For example, to make a hotdog at the task level, one needs to grab a sausage, grill it, place the cooked sausage in a bun, apply ketchup, and serve. Prior knowledge about the context of the task, e.g., sausages can be damaged if squeezed too hard, should also be taken into account. Interpretability in RL rewards - easily understanding what the reward function represents and knowing how to improve it - is a key component in understanding the behavior of an RL agent. This property is often missing in reward engineering techniques, which makes it difficult to understand exactly what the implications of the reward function are when tasks become complex. Interpretability of the reward allows for better value alignment between human intent and system objectives, leading to a lower likelihood of reward hacking by the system. The formal specification language presented in this work has the added benefit of being easily interpretable because of its similarity to natural language. Safe RL - guaranteeing that undesirable behaviors (e.g., collisions with obstacles) do not occur - is a critical concern when learning and deployment of robotic systems happen in the real world. Safety for these systems not only presents legal challenges to their wide adoption, but also raises risks to hardware and users. By using techniques from formal methods and control theory, we provide two main components to ensure safety in the RL agent behaviors. First, the formal specification language allows for explicit definition of undesirable behaviors (e.g., always avoid collisions). Second, control barrier functions (CBF) are used to enforce these safety constraints. Composability of learned skills - the ability to compose new skills from a library of learned ones - can significantly enhance a robot's capabilities by making efficient use of past experience. Modern RL systems focus mainly on mastery (maximizing the given reward) and less on generalization (transfer from one task domain to another). In this thesis, we will also exploit the logical and graphical representations of the task specification and develop techniques for skill composition.
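How a control barrier function can enforce a safety constraint is easiest to see in one dimension. The sketch below is a textbook-style illustration, not the thesis's formulation: it assumes a single-integrator system with dynamics x_dot = u, a safe set h(x) = x_max - x >= 0, and a linear class-K term alpha * h. The function names and parameter values are hypothetical.

```python
def cbf_filter(x, u_nominal, x_max, alpha=1.0):
    """Minimally modify u_nominal so that the CBF condition holds.

    With x_dot = u and h(x) = x_max - x, the condition h_dot >= -alpha * h
    becomes -u >= -alpha * (x_max - x), i.e. u <= alpha * (x_max - x).
    """
    return min(u_nominal, alpha * (x_max - x))


def simulate(x0, u_nominal, x_max, steps=200, dt=0.05, alpha=1.0):
    """Forward-Euler rollout of the filtered system (assumes alpha * dt <= 1)."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        u = cbf_filter(x, u_nominal, x_max, alpha)
        x += dt * u
        trajectory.append(x)
    return trajectory


# The nominal controller (standing in for an RL policy) always pushes
# towards the boundary at full speed.
trajectory = simulate(x0=0.0, u_nominal=1.0, x_max=1.0)
print(max(trajectory) <= 1.0)  # True
```

Without the filter, the same nominal input would carry the state to x = 10 over these 200 steps; with it, the state approaches the boundary x_max = 1 and never crosses it.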