146 research outputs found

    Developmental Bootstrapping of AIs

    Full text link
    Although some current AIs surpass human abilities in closed artificial worlds such as board games, their abilities in the real world are limited. They make strange mistakes and do not notice them. They cannot be instructed easily, fail to use common sense, and lack curiosity. They do not make good collaborators. Mainstream approaches for creating AIs are the traditional manually-constructed symbolic AI approach and generative and deep learning AI approaches including large language models (LLMs). These systems are not well suited for creating robust and trustworthy AIs. Although it is outside of the mainstream, the developmental bootstrapping approach has more potential. In developmental bootstrapping, AIs develop competences like human children do. They start with innate competences. They interact with the environment and learn from their interactions. They incrementally extend their innate competences with self-developed competences. They interact and learn from people and establish perceptual, cognitive, and common grounding. They acquire the competences they need through bootstrapping. However, developmental robotics has not yet produced AIs with robust adult-level competences. Projects have typically stopped at the Toddler Barrier corresponding to human infant development at about two years of age, before their speech is fluent. They also do not bridge the Reading Barrier, to skillfully and skeptically draw on the socially developed information resources that power current LLMs. The next competences in human cognitive development involve intrinsic motivation, imitation learning, imagination, coordination, and communication. This position paper lays out the logic, prospects, gaps, and challenges for extending the practice of developmental bootstrapping to acquire further competences and create robust, resilient, and human-compatible AIs.Comment: 102 pages, 29 figure

    RoboAgent: Generalization and Efficiency in Robot Manipulation via Semantic Augmentations and Action Chunking

    Full text link
    The grand aim of having a single robot that can manipulate arbitrary objects in diverse settings is at odds with the paucity of robotics datasets. Acquiring and growing such datasets is strenuous due to manual efforts, operational costs, and safety challenges. A path toward such an universal agent would require a structured framework capable of wide generalization but trained within a reasonable data budget. In this paper, we develop an efficient system (RoboAgent) for training universal agents capable of multi-task manipulation skills using (a) semantic augmentations that can rapidly multiply existing datasets and (b) action representations that can extract performant policies with small yet diverse multi-modal datasets without overfitting. In addition, reliable task conditioning and an expressive policy architecture enable our agent to exhibit a diverse repertoire of skills in novel situations specified using language commands. Using merely 7500 demonstrations, we are able to train a single agent capable of 12 unique skills, and demonstrate its generalization over 38 tasks spread across common daily activities in diverse kitchen scenes. On average, RoboAgent outperforms prior methods by over 40% in unseen situations while being more sample efficient and being amenable to capability improvements and extensions through fine-tuning. Videos at https://robopen.github.io

    From visuomotor control to latent space planning for robot manipulation

    Get PDF
    Deep visuomotor control is emerging as an active research area for robot manipulation. Recent advances in learning sensory and motor systems in an end-to-end manner have achieved remarkable performance across a range of complex tasks. Nevertheless, a few limitations restrict visuomotor control from being more widely adopted as the de facto choice when facing a manipulation task on a real robotic platform. First, imitation learning-based visuomotor control approaches tend to suffer from the inability to recover from an out-of-distribution state caused by compounding errors. Second, the lack of versatility in task definition limits skill generalisability. Finally, the training data acquisition process and domain transfer are often impractical. In this thesis, individual solutions are proposed to address each of these issues. In the first part, we find policy uncertainty to be an effective indicator of potential failure cases, in which the robot is stuck in out-of-distribution states. On this basis, we introduce a novel uncertainty-based approach to detect potential failure cases and a recovery strategy based on action-conditioned uncertainty predictions. Then, we propose to employ visual dynamics approximation to our model architecture to capture the motion of the robot arm instead of the static scene background, making it possible to learn versatile skill primitives. In the second part, taking inspiration from the recent progress in latent space planning, we propose a gradient-based optimisation method operating within the latent space of a deep generative model for motion planning. Our approach bypasses the traditional computational challenges encountered by established planning algorithms, and has the capability to specify novel constraints easily and handle multiple constraints simultaneously. Moreover, the training data comes from simple random motor-babbling of kinematically feasible robot states. Our real-world experiments further illustrate that our latent space planning approach can handle both open and closed-loop planning in challenging environments such as heavily cluttered or dynamic scenes. This leads to the first, to our knowledge, closed-loop motion planning algorithm that can incorporate novel custom constraints, and lays the foundation for more complex manipulation tasks

    Novel Bidirectional Body - Machine Interface to Control Upper Limb Prosthesis

    Get PDF
    Objective. The journey of a bionic prosthetic user is characterized by the opportunities and limitations involved in adopting a device (the prosthesis) that should enable activities of daily living (ADL). Within this context, experiencing a bionic hand as a functional (and, possibly, embodied) limb constitutes the premise for mitigating the risk of its abandonment through the continuous use of the device. To achieve such a result, different aspects must be considered for making the artificial limb an effective support for carrying out ADLs. Among them, intuitive and robust control is fundamental to improving amputees’ quality of life using upper limb prostheses. Still, as artificial proprioception is essential to perceive the prosthesis movement without constant visual attention, a good control framework may not be enough to restore practical functionality to the limb. To overcome this, bidirectional communication between the user and the prosthesis has been recently introduced and is a requirement of utmost importance in developing prosthetic hands. Indeed, closing the control loop between the user and a prosthesis by providing artificial sensory feedback is a fundamental step towards the complete restoration of the lost sensory-motor functions. Within my PhD work, I proposed the development of a more controllable and sensitive human-like hand prosthesis, i.e., the Hannes prosthetic hand, to improve its usability and effectiveness. Approach. To achieve the objectives of this thesis work, I developed a modular and scalable software and firmware architecture to control the Hannes prosthetic multi-Degree of Freedom (DoF) system and to fit all users’ needs (hand aperture, wrist rotation, and wrist flexion in different combinations). On top of this, I developed several Pattern Recognition (PR) algorithms to translate electromyographic (EMG) activity into complex movements. However, stability and repeatability were still unmet requirements in multi-DoF upper limb systems; hence, I started by investigating different strategies to produce a more robust control. To do this, EMG signals were collected from trans-radial amputees using an array of up to six sensors placed over the skin. Secondly, I developed a vibrotactile system to implement haptic feedback to restore proprioception and create a bidirectional connection between the user and the prosthesis. Similarly, I implemented an object stiffness detection to restore tactile sensation able to connect the user with the external word. This closed-loop control between EMG and vibration feedback is essential to implementing a Bidirectional Body - Machine Interface to impact amputees’ daily life strongly. For each of these three activities: (i) implementation of robust pattern recognition control algorithms, (ii) restoration of proprioception, and (iii) restoration of the feeling of the grasped object's stiffness, I performed a study where data from healthy subjects and amputees was collected, in order to demonstrate the efficacy and usability of my implementations. In each study, I evaluated both the algorithms and the subjects’ ability to use the prosthesis by means of the F1Score parameter (offline) and the Target Achievement Control test-TAC (online). With this test, I analyzed the error rate, path efficiency, and time efficiency in completing different tasks. Main results. Among the several tested methods for Pattern Recognition, the Non-Linear Logistic Regression (NLR) resulted to be the best algorithm in terms of F1Score (99%, robustness), whereas the minimum number of electrodes needed for its functioning was determined to be 4 in the conducted offline analyses. Further, I demonstrated that its low computational burden allowed its implementation and integration on a microcontroller running at a sampling frequency of 300Hz (efficiency). Finally, the online implementation allowed the subject to simultaneously control the Hannes prosthesis DoFs, in a bioinspired and human-like way. In addition, I performed further tests with the same NLR-based control by endowing it with closed-loop proprioceptive feedback. In this scenario, the results achieved during the TAC test obtained an error rate of 15% and a path efficiency of 60% in experiments where no sources of information were available (no visual and no audio feedback). Such results demonstrated an improvement in the controllability of the system with an impact on user experience. Significance. The obtained results confirmed the hypothesis of improving robustness and efficiency of a prosthetic control thanks to of the implemented closed-loop approach. The bidirectional communication between the user and the prosthesis is capable to restore the loss of sensory functionality, with promising implications on direct translation in the clinical practice

    Deep Imitation Learning for Humanoid Loco-manipulation through Human Teleoperation

    Full text link
    We tackle the problem of developing humanoid loco-manipulation skills with deep imitation learning. The difficulty of collecting task demonstrations and training policies for humanoids with a high degree of freedom presents substantial challenges. We introduce TRILL, a data-efficient framework for training humanoid loco-manipulation policies from human demonstrations. In this framework, we collect human demonstration data through an intuitive Virtual Reality (VR) interface. We employ the whole-body control formulation to transform task-space commands by human operators into the robot's joint-torque actuation while stabilizing its dynamics. By employing high-level action abstractions tailored for humanoid loco-manipulation, our method can efficiently learn complex sensorimotor skills. We demonstrate the effectiveness of TRILL in simulation and on a real-world robot for performing various loco-manipulation tasks. Videos and additional materials can be found on the project page: https://ut-austin-rpl.github.io/TRILL.Comment: Submitted to Humanoids 202

    Learning Sequential Acquisition Policies for Robot-Assisted Feeding

    Full text link
    A robot providing mealtime assistance must perform specialized maneuvers with various utensils in order to pick up and feed a range of food items. Beyond these dexterous low-level skills, an assistive robot must also plan these strategies in sequence over a long horizon to clear a plate and complete a meal. Previous methods in robot-assisted feeding introduce highly specialized primitives for food handling without a means to compose them together. Meanwhile, existing approaches to long-horizon manipulation lack the flexibility to embed highly specialized primitives into their frameworks. We propose Visual Action Planning OveR Sequences (VAPORS), a framework for long-horizon food acquisition. VAPORS learns a policy for high-level action selection by leveraging learned latent plate dynamics in simulation. To carry out sequential plans in the real world, VAPORS delegates action execution to visually parameterized primitives. We validate our approach on complex real-world acquisition trials involving noodle acquisition and bimanual scooping of jelly beans. Across 38 plates, VAPORS acquires much more efficiently than baselines, generalizes across realistic plate variations such as toppings and sauces, and qualitatively appeals to user feeding preferences in a survey conducted across 49 individuals. Code, datasets, videos, and supplementary materials can be found on our website: https://sites.google.com/view/vaporsbot

    Event-Driven Technologies for Reactive Motion Planning: Neuromorphic Stereo Vision and Robot Path Planning and Their Application on Parallel Hardware

    Get PDF
    Die Robotik wird immer mehr zu einem Schlüsselfaktor des technischen Aufschwungs. Trotz beeindruckender Fortschritte in den letzten Jahrzehnten, übertreffen Gehirne von Säugetieren in den Bereichen Sehen und Bewegungsplanung noch immer selbst die leistungsfähigsten Maschinen. Industrieroboter sind sehr schnell und präzise, aber ihre Planungsalgorithmen sind in hochdynamischen Umgebungen, wie sie für die Mensch-Roboter-Kollaboration (MRK) erforderlich sind, nicht leistungsfähig genug. Ohne schnelle und adaptive Bewegungsplanung kann sichere MRK nicht garantiert werden. Neuromorphe Technologien, einschließlich visueller Sensoren und Hardware-Chips, arbeiten asynchron und verarbeiten so raum-zeitliche Informationen sehr effizient. Insbesondere ereignisbasierte visuelle Sensoren sind konventionellen, synchronen Kameras bei vielen Anwendungen bereits überlegen. Daher haben ereignisbasierte Methoden ein großes Potenzial, schnellere und energieeffizientere Algorithmen zur Bewegungssteuerung in der MRK zu ermöglichen. In dieser Arbeit wird ein Ansatz zur flexiblen reaktiven Bewegungssteuerung eines Roboterarms vorgestellt. Dabei wird die Exterozeption durch ereignisbasiertes Stereosehen erreicht und die Pfadplanung ist in einer neuronalen Repräsentation des Konfigurationsraums implementiert. Die Multiview-3D-Rekonstruktion wird durch eine qualitative Analyse in Simulation evaluiert und auf ein Stereo-System ereignisbasierter Kameras übertragen. Zur Evaluierung der reaktiven kollisionsfreien Online-Planung wird ein Demonstrator mit einem industriellen Roboter genutzt. Dieser wird auch für eine vergleichende Studie zu sample-basierten Planern verwendet. Ergänzt wird dies durch einen Benchmark von parallelen Hardwarelösungen wozu als Testszenario Bahnplanung in der Robotik gewählt wurde. Die Ergebnisse zeigen, dass die vorgeschlagenen neuronalen Lösungen einen effektiven Weg zur Realisierung einer Robotersteuerung für dynamische Szenarien darstellen. Diese Arbeit schafft eine Grundlage für neuronale Lösungen bei adaptiven Fertigungsprozesse, auch in Zusammenarbeit mit dem Menschen, ohne Einbußen bei Geschwindigkeit und Sicherheit. Damit ebnet sie den Weg für die Integration von dem Gehirn nachempfundener Hardware und Algorithmen in die Industrierobotik und MRK

    A method for understanding and digitizing manipulation activities using programming by demonstration in robotic applications

    Get PDF
    Robots are flexible machines, where the flexibility is achieved, mainly, by the re-programming of the robotic system. To fully exploit the potential of robotic systems, an easy, fast, and intuitive programming methodology is desired. By applying such methodology, robots will be open to a wider audience of potential users (i.e. SMEs, etc.) since the need for a robotic expert in charge of programming the robot will not be needed anymore. This paper presents a Programming by Demonstration approach dealing with high-level tasks taking advantage of the ROS standard. The system identifies the different processes associated to a single-arm human manipulation activity and generates an action plan for future interpretation by the robot. The system is composed of five modules, all of them containerized and interconnected by ROS. Three of these modules are in charge of processing the manipulation data gathered by the sensors system, and converting it from the lowest level to the highest manipulation processes. In order to do this transformation, a module is used to train the system. This module generates, for each operation, an Optimized Multiorder Multivariate Markov Model, that later will be used for the operations recognition and process segmentation. Finally, the fifth module is used to interface and calibrate the system. The system was implemented and tested using a dataglove and a hand position tracker to capture the operator’s data during the manipulation. Four users and five different object types were used to train and test the system both for operations recognition and process segmentation and classification, including also the detection of the locations where the operations are performed.Peer reviewe

    Reasoning and understanding grasp affordances for robot manipulation

    Get PDF
    This doctoral research focuses on developing new methods that enable an artificial agent to grasp and manipulate objects autonomously. More specifically, we are using the concept of affordances to learn and generalise robot grasping and manipulation techniques. [75] defined affordances as the ability of an agent to perform a certain action with an object in a given environment. In robotics, affordances defines the possibility of an agent to perform actions with an object. Therefore, by understanding the relation between actions, objects and the effect of these actions, the agent understands the task at hand, providing the robot with the potential to bridge perception to action. The significance of affordances in robotics has been studied from varied perspectives, such as psychology and cognitive sciences. Many efforts have been made to pragmatically employ the concept of affordances as it provides the potential for an artificial agent to perform tasks autonomously. We start by reviewing and finding common ground amongst different strategies that use affordances for robotic tasks. We build on the identified grounds to provide guidance on including the concept of affordances as a medium to boost autonomy for an artificial agent. To this end, we outline common design choices to build an affordance relation; and their implications on the generalisation capabilities of the agent when facing previously unseen scenarios. Based on our exhaustive review, we conclude that prior research on object affordance detection is effective, however, among others, it has the following technical gaps: (i) the methods are limited to a single object ↔ affordance hypothesis, and (ii) they cannot guarantee task completion or any level of performance for the manipulation task alone nor (iii) in collaboration with other agents. In this research thesis, we propose solutions to these technical challenges. In an incremental fashion, we start by addressing the limited generalisation capabilities of, at the time state-of-the-art methods, by strengthening the perception to action connection through the construction of an Knowledge Base (KB). We then leverage the information encapsulated in the KB to design and implement a reasoning and understanding method based on statistical relational leaner (SRL) that allows us to cope with uncertainty in testing environments, and thus, improve generalisation capabilities in affordance-aware manipulation tasks. The KB in conjunctions with our SRL are the base for our designed solutions that guarantee task completion when the robot is performing a task alone as well as when in collaboration with other agents. We finally expose and discuss a range of interesting avenues that have the potential to thrive the capabilities of a robotic agent through the use of the concept of affordances for manipulation tasks. A summary of the contributions of this thesis can be found at: https://bit.ly/grasp_affordance_reasonin
    corecore