
    LEGO-Net: Learning Regular Rearrangements of Objects in Rooms

    Humans universally dislike the task of cleaning up a messy room. If machines are to help us with this task, they must understand human criteria for regular arrangements, such as several types of symmetry, co-linearity or co-circularity, spacing uniformity in linear or circular patterns, and further inter-object relationships that relate to style and functionality. Previous approaches to this task either relied on human input to explicitly specify the goal state, or synthesized scenes from scratch; neither addresses the rearrangement of an existing messy scene when no goal state is provided. In this paper, we present LEGO-Net, a data-driven, transformer-based iterative method for learning regular rearrangement of objects in messy rooms. LEGO-Net is partly inspired by diffusion models: it starts from an initial messy state and iteratively "de-noises" the positions and orientations of objects toward a regular state while reducing the distance traveled. Given randomly perturbed object positions and orientations from an existing dataset of professionally arranged scenes, our method is trained to recover a regular arrangement. Results demonstrate that our method reliably rearranges room scenes and outperforms other methods. We additionally propose a metric for evaluating regularity in room arrangements using number-theoretic machinery.
    Comment: Project page: https://ivl.cs.brown.edu/projects/lego-ne
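    The iterative "de-noising" idea can be sketched in a few lines. This is a toy stand-in, not the paper's method: LEGO-Net uses a learned transformer denoiser, whereas here the denoiser is replaced by a hypothetical rule that nudges each object a small step toward the nearest point of a uniform grid, so objects reach a regular state while limiting total distance traveled.

    ```python
    def denoise_positions(positions, grid=1.0, step=0.5, iters=20):
        """Iteratively nudge 2D object positions toward a regular arrangement.

        Toy stand-in for a learned denoiser: each iteration moves every object
        a fraction `step` of the way toward the nearest grid point, so the
        final state is regular while per-step motion stays small.
        """
        pts = [list(p) for p in positions]
        for _ in range(iters):
            for p in pts:
                # nearest "regular" position on a uniform grid (illustrative target)
                target = [round(c / grid) * grid for c in p]
                for i in range(2):
                    p[i] += step * (target[i] - p[i])
        return [tuple(p) for p in pts]
    ```

    For example, `denoise_positions([(0.9, 2.1), (3.2, -0.1)])` converges to positions near (1, 2) and (3, 0), each object traveling only as far as needed.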

    Rearrangement on Lattices with Pick-n-Swaps: Optimality Structures and Efficient Algorithms

    We propose and study a class of rearrangement problems under a novel pick-n-swap prehensile manipulation model, in which a robotic manipulator, capable of carrying one item at a time and making item swaps, is tasked with sorting items stored in lattices of variable dimension in a time-optimal manner. We systematically analyze the intrinsic optimality structure, which is fairly rich and intriguing, under different levels of item distinguishability (fully labeled, where each item has a unique label, or partially labeled, where multiple items may be of the same type) and different lattice dimensions. Focusing on the most practical settings of one and two dimensions, we develop low-polynomial-time cycle-following algorithms that optimally perform rearrangements on 1D lattices under both fully- and partially-labeled settings. In contrast, we show that optimal rearrangement on 2D and higher-dimensional lattices is computationally intractable. Despite this NP-hardness, we prove that efficient cycle-following algorithms remain asymptotically optimal in expectation for the 2D fully- and partially-labeled settings, using the interesting fact that random permutations induce only a small number of cycles. We further improve these algorithms to provide 1.x-optimality when the number of items is small. Simulation studies corroborate the effectiveness of our algorithms.
    Comment: To appear in R:SS 202
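    The cycle-following strategy for the fully-labeled 1D case can be illustrated concretely. The sketch below assumes items labeled 0..n-1 with item i belonging at cell i (an assumption for illustration; the paper's algorithms and cost accounting are more general): the gripper picks up the item that opens an unsorted cycle, repeatedly swaps it into its goal cell (picking up whatever was displaced), and places the last item when the cycle closes.

    ```python
    def pick_n_swap_sort(lattice):
        """Sort a fully-labeled 1D lattice (item i belongs at cell i) with a
        single-item gripper by following permutation cycles.

        Returns the sorted lattice and the action sequence; each cycle of
        length k costs one pick, k-1 swaps, and one place.
        """
        cells = list(lattice)
        actions = []
        for start in range(len(cells)):
            if cells[start] == start:
                continue  # item already in place; no action needed
            held = cells[start]
            actions.append(("pick", start))
            while held != start:
                dest = held  # goal cell of the held item
                # swap: place held item at its goal, pick up the displaced item
                held, cells[dest] = cells[dest], held
                actions.append(("swap", dest))
            cells[start] = held  # last item closes the cycle
            actions.append(("place", start))
        return cells, actions
    ```

    On the permutation `[2, 0, 1]` (one 3-cycle) this issues one pick, two swaps, and one place, matching the cycle structure.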

    Differentiable world programs

    Modern artificial intelligence (AI) has created exciting new opportunities for building intelligent robots. In particular, gradient-based learning architectures (deep neural networks) have tremendously improved 3D scene understanding in terms of perception, reasoning, and action. However, these advancements have undermined the appeal of many ``classical'' techniques developed over the last few decades. We postulate that a blend of ``classical'' and ``learned'' methods is the most promising path to developing flexible, interpretable, and actionable models of the world: a necessity for intelligent embodied agents. The central question of this dissertation is: ``What is the ideal way to combine classical techniques with gradient-based learning architectures for a rich understanding of the 3D world?'' This understanding enables a multitude of applications that fundamentally impact how embodied agents perceive and interact with their environment. This dissertation, dubbed ``differentiable world programs'', unifies efforts from multiple closely related but currently disjoint fields, including robotics, computer vision, computer graphics, and AI.
    Our first contribution---gradSLAM---is a fully differentiable dense simultaneous localization and mapping (SLAM) system. By enabling gradient computation through otherwise non-differentiable components such as nonlinear least-squares optimization, ray casting, visual odometry, and dense mapping, gradSLAM opens up new avenues for integrating classical 3D reconstruction and deep learning. Our second contribution---taskography---proposes a task-conditioned sparsification of large 3D scenes encoded as 3D scene graphs. This enables classical planners to match (and surpass) state-of-the-art learning-based planners by focusing computation on task-relevant scene attributes. Our third and final contribution---gradSim---is a fully differentiable simulator that composes differentiable physics and graphics engines to enable physical parameter estimation and visuomotor control, solely from videos or a still image.
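    The key property exploited by gradSLAM, differentiability through an optimization solve, can be shown on a toy problem. This sketch is not gradSLAM itself: instead of nonlinear least squares over poses and maps, it fits a simple line by gradient descent, where every update is a smooth function of the input data, so gradients could flow end-to-end through the entire solve.

    ```python
    def solve_least_squares(xs, ys, steps=500, lr=0.05):
        """Fit y = a*x + b by gradient descent on the mean squared residual.

        Because every update is a smooth function of (xs, ys), the whole
        solve is differentiable end-to-end -- the property a differentiable
        SLAM system needs from its least-squares components (linear toy here).
        """
        a, b = 0.0, 0.0
        n = len(xs)
        for _ in range(steps):
            # gradients of the mean squared residual w.r.t. a and b
            grad_a = sum(2 * (a * x + b - y) * x for x, y in zip(xs, ys)) / n
            grad_b = sum(2 * (a * x + b - y) for x, y in zip(xs, ys)) / n
            a -= lr * grad_a
            b -= lr * grad_b
        return a, b
    ```

    Fitting the points (0, 1), (1, 3), (2, 5), (3, 7) recovers slope 2 and intercept 1; swapping the unrolled loop for a closed-form solver would discard exactly the differentiability the dissertation argues for.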

    DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics

    We introduce the first work to explore web-scale diffusion models for robotics. DALL-E-Bot enables a robot to rearrange objects in a scene by first inferring a text description of those objects, then generating an image representing a natural, human-like arrangement of those objects, and finally physically arranging the objects according to that image. The significance is that we achieve this zero-shot using DALL-E, without needing any further data collection or training. Encouraging real-world results with human studies show that this is a promising direction for the future of web-scale robot learning. We also propose a list of recommendations to the text-to-image community, to align further development of these models with applications in robotics.
    Comment: Webpage and videos: ( https://www.robot-learning.uk/dall-e-bot ) V1: initial submission. V2: new baseline
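    The final step of such a pipeline, arranging real objects according to a generated image, contains an assignment subproblem: each detected object must be matched to a counterpart in the generated arrangement. The sketch below is only illustrative and abstracts visual similarity as a precomputed cost matrix (how the costs are obtained is not specified here); it brute-forces the minimum-cost assignment, which is adequate for the handful of objects on a tabletop.

    ```python
    from itertools import permutations

    def match_objects(cost):
        """Assign each detected object (row) to a generated-image slot
        (column) by exhaustively minimizing total matching cost.

        `cost[i][j]` abstracts the dissimilarity between real object i and
        generated object j. Brute force is O(n!) but fine for small n.
        """
        n = len(cost)
        best_perm, best_cost = None, float("inf")
        for perm in permutations(range(n)):
            c = sum(cost[i][perm[i]] for i in range(n))
            if c < best_cost:
                best_perm, best_cost = list(perm), c
        return best_perm, best_cost
    ```

    For larger object counts one would swap in the Hungarian algorithm, but the structure of the subproblem is the same.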

    Manipulating Objects using Compliant, Unactuated Tails: Modeling and Planning

    Ropes and rope-like objects (e.g., chains, cords, lines, whips, or lassos) are comparatively cheap, simple, and useful in daily life. For a long time, humans have used such structures for manipulation tasks in qualitatively different ways, such as pulling, fastening, attaching, tying, knotting, and whipping. Nevertheless, these structures have received little attention in robotics. Because they are unactuated, such structures are regarded as difficult to model, plan for, and control. In this dissertation, we are interested in a mobile robot system that uses a flexible rope-like structure attached to its end, akin to a 'tail'. Our goal is to investigate how mobile robots can use compliant, unactuated structures for various manipulation tasks. Robots that use a tail to manipulate objects face challenges in the modeling and planning of behaviors, dynamics, and combinatorial optimization. In this dissertation, we propose several methods to deal with these difficulties. In addition, we solve variants of object manipulation problems wherein multiple classes of objects are to be transported by multiple cooperative robots using ropes.
    Firstly, we examine motion primitives designed to simplify modeling and planning. We explore several sets of motion primitives, where each primitive contributes some aspect lacking in the others. These primitives are forward models of the system's behavior that predict the position and orientation of the object being manipulated within the workspace. Then, to solve manipulation problems, we design a planner that seeks a sequence of motion primitives using a sampling-based motion planning approach coupled with a particle-based representation to treat error propagation of the motions. Our proposed planner optimizes motion sequences based on a specified preference over a set of objectives, such as execution time, navigation cost, or collision likelihood. The resulting solutions handle different preferences effectively, and we analyze the complementary nature of dynamic and quasi-static motions, showing that there exist regimes where transitions among them are indeed desirable, as reflected in the plans produced.
    Secondly, we explore a variety of primitives that lead to new approaches for object manipulation. We examine ways two robots can join the ends of their tails so that the conjoined pair can encircle objects, gaining greater towing capacity through cooperation; individual robots, by contrast, allow greater concurrency when objects are distant from one another. We solve a new manipulation problem in which a collection of objects must be moved to goal locations by multiple robots that may form conjoined pairs. To maximize efficiency, the robots balance working as a tightly-knit sub-team with operating individually. We develop heuristics that give satisfactory solutions in reasonable time. The results we report include data from physical robots executing plans produced by our planner, collecting objects both by individual action and by coupled-pair operation.
    We expect that our research results will help explain how a flexible, compliant appendage added to a robot can be useful for more than just agility. The proposed techniques, which use simple motion models to characterize complicated system dynamics, can be applied across robotics research: motion planning, minimalist manipulators, behavior-based control, and multi-robot coordination. In addition, we expect the proposed methods to enhance the performance of various manipulation tasks, such as efficient search, adaptive sampling, or coverage in unknown, unstructured environments.
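    The particle-based treatment of error propagation described above can be sketched as follows. This is an illustrative stand-in, not the dissertation's model: a motion primitive is abstracted as a nominal (dx, dy, dtheta) displacement, and the Gaussian noise magnitude is a made-up parameter, so that uncertainty visibly accumulates as particles are pushed through a primitive sequence.

    ```python
    import math
    import random

    def propagate_particles(particles, primitive, noise=0.05):
        """Push a set of (x, y, theta) pose particles through a motion
        primitive's forward model, adding per-particle noise so that
        uncertainty accumulates across a primitive sequence.

        `primitive` is a nominal body-frame displacement (dx, dy, dtheta);
        the noise model is illustrative only.
        """
        dx, dy, dtheta = primitive
        out = []
        for (x, y, th) in particles:
            # rotate the nominal displacement into the particle's heading
            nx = x + dx * math.cos(th) - dy * math.sin(th) + random.gauss(0, noise)
            ny = y + dx * math.sin(th) + dy * math.cos(th) + random.gauss(0, noise)
            nth = th + dtheta + random.gauss(0, noise)
            out.append((nx, ny, nth))
        return out
    ```

    A planner can then score a candidate primitive sequence by statistics of the propagated particle cloud, e.g. the fraction of particles in collision, which is how a particle representation feeds preference-based objectives like collision likelihood.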

    Dynamic Handover: Throw and Catch with Bimanual Hands

    Humans throw and catch objects all the time. However, this seemingly common skill poses many challenges for robots: they must execute such dynamic actions at high speed, collaborate precisely, and interact with diverse objects. In this paper, we design a system with two multi-finger hands attached to robot arms to solve this problem. We train our system using Multi-Agent Reinforcement Learning in simulation and perform Sim2Real transfer to deploy on the real robots. To overcome the Sim2Real gap, we provide multiple novel algorithm designs, including learning a trajectory prediction model for the object. Such a model helps the robot catcher maintain a real-time estimate of where the object is heading and react accordingly. We conduct our experiments with multiple objects in the real-world system and show significant improvements over multiple baselines. Our project page is available at \url{https://binghao-huang.github.io/dynamic_handover/}.
    Comment: Accepted at CoRL 2023. https://binghao-huang.github.io/dynamic_handover