944 research outputs found

    Learning garment manipulation policies toward robot-assisted dressing.

    Get PDF
    Assistive robots have the potential to support people with disabilities in a variety of activities of daily living, such as dressing. People who have completely lost their upper limb movement functionality may benefit from robot-assisted dressing, which involves complex deformable garment manipulation. Here, we report a dressing pipeline intended for these people and experimentally validate it on a medical training manikin. The pipeline is composed of the robot grasping a hospital gown hung on a rail, fully unfolding the gown, navigating around a bed, and lifting up the user's arms in sequence to finally dress the user. To automate this pipeline, we address two fundamental challenges: first, learning manipulation policies to bring the garment from an uncertain state into a configuration that facilitates robust dressing; second, transferring the deformable object manipulation policies learned in simulation to real world to leverage cost-effective data generation. We tackle the first challenge by proposing an active pre-grasp manipulation approach that learns to isolate the garment grasping area before grasping. The approach combines prehensile and nonprehensile actions and thus alleviates grasping-only behavioral uncertainties. For the second challenge, we bridge the sim-to-real gap of deformable object policy transfer by approximating the simulator to real-world garment physics. A contrastive neural network is introduced to compare pairs of real and simulated garment observations, measure their physical similarity, and account for simulator parameters inaccuracies. The proposed method enables a dual-arm robot to put back-opening hospital gowns onto a medical manikin with a success rate of more than 90%

    Active recognition and pose estimation of rigid and deformable objects in 3D space

    Get PDF
    Object recognition and pose estimation is a fundamental problem in computer vision and of utmost importance in robotic applications. Object recognition refers to the problem of recognizing certain object instances, or categorizing objects into specific classes. Pose estimation deals with estimating the exact position of the object in 3D space, usually expressed in Euler angles. There are generally two types of objects that require special care when designing solutions to the aforementioned problems: rigid and deformable. Dealing with deformable objects has been a much harder problem, and usually solutions that apply to rigid objects, fail when used for deformable objects due to the inherent assumptions made during the design. In this thesis we deal with object categorization, instance recognition and pose estimation of both rigid and deformable objects. In particular, we are interested in a special type of deformable objects, clothes. We tackle the problem of autonomously recognizing and unfolding articles of clothing using a dual manipulator. This problem consists of grasping an article from a random point, recognizing it and then bringing it into an unfolded state by a dual arm robot. We propose a data-driven method for clothes recognition from depth images using Random Decision Forests. We also propose a method for unfolding an article of clothing after estimating and grasping two key-points, using Hough Forests. Both methods are implemented into a POMDP framework allowing the robot to interact optimally with the garments, taking into account uncertainty in the recognition and point estimation process. This active recognition and unfolding makes our system very robust to noisy observations. Our methods were tested on regular-sized clothes using a dual-arm manipulator. Our systems perform better in both accuracy and speed compared to state-of-the-art approaches. In order to take advantage of the robotic manipulator and increase the accuracy of our system, we developed a novel approach to address generic active vision problems, called Active Random Forests. While state of the art focuses on best viewing parameters selection based on single view classifiers, we propose a multi-view classifier where the decision mechanism of optimally changing viewing parameters is inherent to the classification process. This has many advantages: a) the classifier exploits the entire set of captured images and does not simply aggregate probabilistically per view hypotheses; b) actions are based on learnt disambiguating features from all views and are optimally selected using the powerful voting scheme of Random Forests and c) the classifier can take into account the costs of actions. The proposed framework was applied to the same task of autonomously unfolding clothes by a robot, addressing the problem of best viewpoint selection in classification, grasp point and pose estimation of garments. We show great performance improvement compared to state of the art methods and our previous POMDP formulation. Moving from deformable to rigid objects while keeping our interest to domestic robotic applications, we focus on object instance recognition and 3D pose estimation of household objects. We are particularly interested in realistic scenes that are very crowded and objects can be perceived under severe occlusions. Single shot-based 6D pose estimators with manually designed features are still unable to tackle such difficult scenarios for a variety of objects, motivating the research towards unsupervised feature learning and next-best-view estimation. We present a complete framework for both single shot-based 6D object pose estimation and next-best-view prediction based on Hough Forests, the state of the art object pose estimator that performs classification and regression jointly. Rather than using manually designed features we propose an unsupervised feature learnt from depth-invariant patches using a Sparse Autoencoder. Furthermore, taking advantage of the clustering performed in the leaf nodes of Hough Forests, we learn to estimate the reduction of uncertainty in other views, formulating the problem of selecting the next-best-view. To further improve 6D object pose estimation, we propose an improved joint registration and hypotheses verification module as a final refinement step to reject false detections. We provide two additional challenging datasets inspired from realistic scenarios to extensively evaluate the state of the art and our framework. One is related to domestic environments and the other depicts a bin-picking scenario mostly found in industrial settings. We show that our framework significantly outperforms state of the art both on public and on our datasets. Unsupervised feature learning, although efficient, might produce sub-optimal features for our particular tast. Therefore in our last work, we leverage the power of Convolutional Neural Networks to tackled the problem of estimating the pose of rigid objects by an end-to-end deep regression network. To improve the moderate performance of the standard regression objective function, we introduce the Siamese Regression Network. For a given image pair, we enforce a similarity measure between the representation of the sample images in the feature and pose space respectively, that is shown to boost regression performance. Furthermore, we argue that our pose-guided feature learning using our Siamese Regression Network generates more discriminative features that outperform the state of the art. Last, our feature learning formulation provides the ability of learning features that can perform under severe occlusions, demonstrating high performance on our novel hand-object dataset. Concluding, this work is a research on the area of object detection and pose estimation in 3D space, on a variety of object types. Furthermore we investigate how accuracy can be further improved by applying active vision techniques to optimally move the camera view to minimize the detection error.Open Acces

    Clothes Grasping and Unfolding Based on RGB-D Semantic Segmentation

    Get PDF

    Autonomous clothes manipulation using a hierarchical vision architecture

    Get PDF
    This paper presents a novel robot vision architecture for perceiving generic 3-D clothes configurations. Our architecture is hierarchically structured, starting from low-level curvature features to mid-level geometric shapes and topology descriptions, and finally, high-level semantic surface descriptions. We demonstrate our robot vision architecture in a customized dual-arm industrial robot with our inhouse developed stereo vision system, carrying out autonomous grasping and dual-arm flattening. The experimental results show the effectiveness of the proposed dual-arm flattening using the stereo vision system compared with the single-arm flattening using the widely cited Kinect-like sensor as the baseline. In addition, the proposed grasping approach achieves satisfactory performance when grasping various kind of garments, verifying the capability of the proposed visual perception architecture to be adapted to more than one clothing manipulation tasks

    Development of a learning from demonstration environment using ZED 2i and HTC Vive Pro

    Get PDF
    Being able to teach complex capabilities, such as folding garments, to a bi-manual robot is a very challenging task, which is often tackled using learning from demonstration datasets. The few garment folding datasets available nowadays to the robotics research community are either gathered from human demonstrations or generated through simulation. The former have the huge problem of perceiving human action and transferring it to the dynamic control of the robot, while the latter requires coding human motion into the simulator in open loop, resulting in far-from-realistic movements. In this thesis, a novel virtual reality (VR) framework is proposed, based on Unity’s 3D platform and the use of HTC Vive Pro system, ZED mini, and ZED 2i cameras, and Leap motion’s hand-tracking module. The framework is capable of detecting and tracking objects, animals, and human bodies in a 3D environment. Moreover, the framework is also capable of simulating very realistic garments while allowing users to interact with them, in real-time, either through handheld controllers or the user’s real hands. By doing so, and thanks to the immersive experience, the framework gets rid of the gap between the human and robot perception-action loop, while simplifying data capture and resulting in more realistic samples. Finally, using the developed framework, a novel garment manipulation dataset will be recorded, containing samples with data and videos of nineteen different types of manipulation which aim to help tasks related to robot learning by demonstrationObjectius de Desenvolupament Sostenible::9 - Indústria, Innovació i Infraestructur

    Visual-tactile learning of garment unfolding for robot-assisted dressing

    Get PDF
    Assistive robots have the potential to support disabled and elderly people in daily dressing activities. An intermediate stage of dressing is to manipulate the garment from a crumpled initial state to an unfolded configuration that facilitates robust dressing. Applying quasi-static grasping actions with vision feedback on garment unfolding usually suffers from occluded grasping points. In this work, we propose a dynamic manipulation strategy: tracing the garment edge until the hidden corner is revealed. We introduce a model-based approach, where a deep visual-tactile predictive model iteratively learns to perform servoing from raw sensor data. The predictive model is formalized as Conditional Variational Autoencoder with contrastive optimization, which jointly learns underlying visual-tactile latent representations, a latent garment dynamics model, and future predictions of garment states. Two cost functions are explored: the visual cost, defined by garment corner positions, guarantees the gripper to move towards the corner, while the tactile cost, defined by garment edge poses, prevents the garment from falling from the gripper. The experimental results demonstrate the improvement of our contrastive visual-tactile model predictive control over single sensing modality and baseline model learning techniques. The proposed method enables a robot to unfold back-opening hospital gowns and perform upper-body dressing

    Robotic system for garment perception and manipulation

    Get PDF
    Mención Internacional en el título de doctorGarments are a key element of people’s daily lives, as many domestic tasks -such as laundry-, revolve around them. Performing such tasks, generally dull and repetitive, implies devoting many hours of unpaid labor to them, that could be freed through automation. But automation of such tasks has been traditionally hard due to the deformable nature of garments, that creates additional challenges to the already existing when performing object perception and manipulation. This thesis presents a Robotic System for Garment Perception and Manipulation that intends to address these challenges. The laundry pipeline as defined in this work is composed by four independent -but sequential- tasks: hanging, unfolding, ironing and folding. The aim of this work is the automation of this pipeline through a robotic system able to work on domestic environments as a robot household companion. Laundry starts by washing the garments, that then need to be dried, frequently by hanging them. As hanging is a complex task requiring bimanipulation skills and dexterity, a simplified approach is followed in this work as a starting point, by using a deep convolutional neural network and a custom synthetic dataset to study if a robot can predict whether a garment will hang or not when dropped over a hanger, as a first step towards a more complex controller. After the garment is dry, it has to be unfolded to ease recognition of its garment category for the next steps. The presented model-less unfolding method uses only color and depth information from the garment to determine the grasp and release points of an unfolding action, that is repeated iteratively until the garment is fully spread. Before storage, wrinkles have to be removed from the garment. For that purpose, a novel ironing method is proposed, that uses a custom wrinkle descriptor to locate the most prominent wrinkles and generate a suitable ironing plan. The method does not require a precise control of the light conditions of the scene, and is able to iron using unmodified ironing tools through a force-feedback-based controller. Finally, the last step is to fold the garment to store it. One key aspect when folding is to perform the folding operation in a precise manner, as errors will accumulate when several folds are required. A neural folding controller is proposed that uses visual feedback of the current garment shape, extracted through a deep neural network trained with synthetic data, to accurately perform a fold. All the methods presented to solve each of the laundry pipeline tasks have been validated experimentally on different robotic platforms, including a full-body humanoid robot.La ropa es un elemento clave en la vida diaria de las personas, no sólo a la hora de vestir, sino debido también a que muchas de las tareas domésticas que una persona debe realizar diariamente, como hacer la colada, requieren interactuar con ellas. Estas tareas, a menudo tediosas y repetitivas, obligan a invertir una gran cantidad de horas de trabajo no remunerado en su realización, las cuales podrían reducirse a través de su automatización. Sin embargo, automatizar dichas tareas ha sido tradicionalmente un reto, debido a la naturaleza deformable de las prendas, que supone una dificultad añadida a las ya existentes al llevar a cabo percepción y manipulación de objetos a través de robots. Esta tesis presenta un sistema robótico orientado a la percepción y manipulación de prendas, que pretende resolver dichos retos. La colada es una tarea doméstica compuesta de varias subtareas que se llevan a cabo de manera secuencial. En este trabajo, se definen dichas subtareas como: tender, desdoblar, planchar y doblar. El objetivo de este trabajo es automatizar estas tareas a través de un sistema robótico capaz de trabajar en entornos domésticos, convirtiéndose en un asistente robótico doméstico. La colada comienza lavando las prendas, las cuales han de ser posteriormente secadas, generalmente tendiéndolas al aire libre, para poder realizar el resto de subtareas con ellas. Tender la ropa es una tarea compleja, que requiere de bimanipulación y una gran destreza al manipular la prenda. Por ello, en este trabajo se ha optado por abordar una versión simplicada de la tarea de tendido, como punto de partida para llevar a cabo investigaciones más avanzadas en el futuro. A través de una red neuronal convolucional profunda y un conjunto de datos de entrenamiento sintéticos, se ha llevado a cabo un estudio sobre la capacidad de predecir el resultado de dejar caer una prenda sobre un tendedero por parte de un robot. Este estudio, que sirve como primer paso hacia un controlador más avanzado, ha resultado en un modelo capaz de predecir si la prenda se quedará tendida o no a partir de una imagen de profundidad de la misma en la posición en la que se dejará caer. Una vez las prendas están secas, y para facilitar su reconocimiento por parte del robot de cara a realizar las siguientes tareas, la prenda debe ser desdoblada. El método propuesto en este trabajo para realizar el desdoble no requiere de un modelo previo de la prenda, y utiliza únicamente información de profundidad y color, obtenida mediante un sensor RGB-D, para calcular los puntos de agarre y soltado de una acción de desdoble. Este proceso es iterativo, y se repite hasta que la prenda se encuentra totalmente desdoblada. Antes de almacenar la prenda, se deben eliminar las posibles arrugas que hayan surgido en el proceso de lavado y secado. Para ello, se propone un nuevo algoritmo de planchado, que utiliza un descriptor de arrugas desarrollado en este trabajo para localizar las arrugas más prominentes y generar un plan de planchado acorde a las condiciones de la prenda. A diferencia de otros métodos existentes, este método puede aplicarse en un entorno doméstico, ya que no requiere de un contol preciso de las condiciones de iluminación. Además, es capaz de usar las mismas herramientas de planchado que usaría una persona sin necesidad de realizar modificaciones a las mismas, a través de un controlador que usa realimentación de fuerza para aplicar una presión constante durante el planchado. El último paso al hacer la colada es doblar la prenda para almacenarla. Un aspecto importante al doblar prendas es ejecutar cada uno de los dobleces necesarios con precisión, ya que cada error o desfase cometido en un doblez se acumula cuando la secuencia de doblado está formada por varios dobleces consecutivos. Para llevar a cabo estos dobleces con la precisión requerida, se propone un controlador basado en una red neuronal, que utiliza realimentación visual de la forma de la prenda durante cada operación de doblado. Esta realimentación es obtenida a través de una red neuronal profunda entrenada con un conjunto de entrenamiento sintético, que permite estimar la forma en 3D de la parte a doblar a través de una imagen monocular de la misma. Todos los métodos descritos en esta tesis han sido validados experimentalmente con éxito en diversas plataformas robóticas, incluyendo un robot humanoide.Programa de Doctorado en Ingeniería Eléctrica, Electrónica y Automática por la Universidad Carlos III de MadridPresidente: Abderrahmane Kheddar.- Secretario: Ramón Ignacio Barber Castaño.- Vocal: Karinne Ramírez-Amar

    Perception and manipulation for robot-assisted dressing

    Get PDF
    Assistive robots have the potential to provide tremendous support for disabled and elderly people in their daily dressing activities. This thesis presents a series of perception and manipulation algorithms for robot-assisted dressing, including: garment perception and grasping prior to robot-assisted dressing, real-time user posture tracking during robot-assisted dressing for (simulated) impaired users with limited upper-body movement capability, and finally a pipeline for robot-assisted dressing for (simulated) paralyzed users who have lost the ability to move their limbs. First, the thesis explores learning suitable grasping points on a garment prior to robot-assisted dressing. Robots should be endowed with the ability to autonomously recognize the garment state, grasp and hand the garment to the user and subsequently complete the dressing process. This is addressed by introducing a supervised deep neural network to locate grasping points. To reduce the amount of real data required, which is costly to collect, the power of simulation is leveraged to produce large amounts of labeled data. Unexpected user movements should be taken into account during dressing when planning robot dressing trajectories. Tracking such user movements with vision sensors is challenging due to severe visual occlusions created by the robot and clothes. A probabilistic real-time tracking method is proposed using Bayesian networks in latent spaces, which fuses multi-modal sensor information. The latent spaces are created before dressing by modeling the user movements, taking the user's movement limitations and preferences into account. The tracking method is then combined with hierarchical multi-task control to minimize the force between the user and the robot. The proposed method enables the Baxter robot to provide personalized dressing assistance for users with (simulated) upper-body impairments. Finally, a pipeline for dressing (simulated) paralyzed patients using a mobile dual-armed robot is presented. The robot grasps a hospital gown naturally hung on a rail, and moves around the bed to finish the upper-body dressing of a hospital training manikin. To further improve simulations for garment grasping, this thesis proposes to update more realistic physical properties values for the simulated garment. This is achieved by measuring physical similarity in the latent space using contrastive loss, which maps physically similar examples to nearby points.Open Acces

    A virtual reality framework for fast dataset creation applied to cloth manipulation with automatic semantic labelling

    Get PDF
    © 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Teaching complex manipulation skills, such as folding garments, to a bi-manual robot is a very challenging task, which is often tackled through learning from demonstration. The few datasets of garment-folding demonstrations available nowadays to the robotics research community have been either gathered from human demonstrations or generated through simulation. The former have the great difficulty of perceiving both cloth state and human action as well as transferring them to the dynamic control of the robot, while the latter require coding human motion into the simulator in open loop, i.e., without incorporating the visual feedback naturally used by people, resulting in far-from-realistic movements. In this article, we present an accurate dataset of human cloth folding demonstrations. The dataset is collected through our novel virtual reality (VR) framework, based on Unity’s 3D platform and the use of an HTC Vive Pro system. The framework is capable of simulating realistic garments while allowing users to interact with them in real time through handheld controllers. By doing so, and thanks to the immersive experience, our framework permits exploiting human visual feedback in the demonstrations while at the same time getting rid of the difficulties of capturing the state of cloth, thus simplifying data acquisition and resulting in more realistic demonstrations. We create and make public a dataset of cloth manipulation sequences, whose cloth states are semantically labeled in an automatic way by using a novel low-dimensional cloth representation that yields a very good separation between different cloth configurations.The research leading to these results receives funding from the European Research Council (ERC) from the European Union Horizon 2020 Programme under grant agreement no. 741930 (CLOTHILDE: CLOTH manIpulation Learning from DEmonstrations) and project SoftEnable (HORIZONCL4-2021-DIGITAL-EMERGING-01-101070600). Authors also received funding from project CHLOE-GRAPH (PID2020-118649RB-I00) funded by MCIN/ AEI /10.13039/501100011033 and COHERENT (PCI2020-120718-2) funded by MCIN/ AEI /10.13039/501100011033 and cofunded by the ”European Union NextGenerationEU/PRTR”.Peer ReviewedPostprint (author's final draft
    corecore