Learning garment manipulation policies toward robot-assisted dressing
Assistive robots have the potential to support people with disabilities in a variety of activities of daily living, such as dressing. People who have completely lost the movement of their upper limbs may benefit from robot-assisted dressing, which involves complex deformable garment manipulation. Here, we report a dressing pipeline intended for these people and experimentally validate it on a medical training manikin. In the pipeline, the robot grasps a hospital gown hung on a rail, fully unfolds the gown, navigates around a bed, and lifts the user's arms in sequence to finally dress the user. To automate this pipeline, we address two fundamental challenges: first, learning manipulation policies that bring the garment from an uncertain state into a configuration that facilitates robust dressing; second, transferring deformable object manipulation policies learned in simulation to the real world, so that cost-effective simulated data can be exploited. We tackle the first challenge with an active pre-grasp manipulation approach that learns to isolate the garment grasping area before grasping. The approach combines prehensile and nonprehensile actions and thus reduces the behavioral uncertainty of grasping-only strategies. For the second challenge, we bridge the sim-to-real gap in deformable object policy transfer by aligning the simulator with real-world garment physics. A contrastive neural network is introduced to compare pairs of real and simulated garment observations, measure their physical similarity, and account for inaccuracies in the simulator parameters. The proposed method enables a dual-arm robot to put back-opening hospital gowns onto a medical manikin with a success rate of more than 90%.
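The abstract does not state which loss trains the contrastive network. A common way to train a network that compares pairs of observations is the margin-based contrastive loss of Hadsell, Chopra and LeCun (2006); the minimal sketch below shows that loss for one (real, simulated) pair, assuming the embeddings have already been produced by some encoder. The function names, toy embeddings and default margin are illustrative assumptions, not the paper's implementation.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(real_emb: Sequence[float], sim_emb: Sequence[float],
                     similar: bool, margin: float = 1.0) -> float:
    """Margin-based contrastive loss for one (real, simulated) observation pair.

    similar=True:  the simulated garment is labeled as physically similar to the
                   real one, so the embeddings are pulled together (loss = d^2).
    similar=False: the pair is pushed apart until it is at least `margin` away.
    """
    d = euclidean(real_emb, sim_emb)
    if similar:
        return d ** 2
    return max(0.0, margin - d) ** 2


if __name__ == "__main__":
    real, sim = [0.0, 0.0], [3.0, 4.0]  # toy embeddings at distance 5
    print(contrastive_loss(real, sim, similar=True))                # 25.0
    print(contrastive_loss(real, sim, similar=False, margin=10.0))  # 25.0
    print(contrastive_loss(real, sim, similar=False, margin=1.0))   # 0.0
```

Dissimilar pairs that are already farther apart than the margin contribute nothing, so training effort concentrates on pairs the network still confuses.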
Active recognition and pose estimation of rigid and deformable objects in 3D space
Object recognition and pose estimation are fundamental problems in computer vision and of utmost importance in robotic applications. Object recognition refers to recognizing specific object instances or categorizing objects into classes. Pose estimation deals with estimating the exact position and orientation of the object in 3D space, the orientation usually being expressed in Euler angles. Two types of objects generally require special care when designing solutions to these problems: rigid and deformable. Deformable objects are much harder to deal with, and solutions designed for rigid objects usually fail on deformable ones because of the assumptions built into their design.
In this thesis we deal with object categorization, instance recognition and pose estimation of both rigid and deformable objects. In particular, we are interested in a special type of deformable object: clothes. We tackle the problem of autonomously recognizing and unfolding articles of clothing with a dual-arm robot, which must grasp an article from a random point, recognize it and then bring it into an unfolded state. We propose a data-driven method for clothes recognition from depth images using Random Decision Forests. We also propose a method for unfolding an article of clothing after estimating and grasping two key points, using Hough Forests. Both methods are implemented within a POMDP framework that allows the robot to interact optimally with the garments while taking into account the uncertainty in the recognition and point-estimation process. This active recognition and unfolding makes our system very robust to noisy observations. Our methods were tested on regular-sized clothes using a dual-arm manipulator, and our systems outperform state-of-the-art approaches in both accuracy and speed.
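In a POMDP the robot keeps a belief over the hidden state (here, the garment category) and updates it with Bayes' rule after every action and observation. The minimal sketch below shows that discrete belief update, assuming a hypothetical set of three categories and a hypothetical classifier likelihood; the action-selection policy that a full POMDP solver would add on top is not shown, and none of the numbers come from the thesis.

```python
from typing import Dict

Belief = Dict[str, float]


def update_belief(belief: Belief,
                  transition: Dict[str, Dict[str, float]],
                  obs_likelihood: Dict[str, float]) -> Belief:
    """Discrete POMDP belief update: b'(s') = eta * O(o | s') * sum_s T(s' | s) * b(s)."""
    # Prediction step: propagate the belief through the transition model.
    predicted = {s2: sum(transition[s][s2] * p for s, p in belief.items())
                 for s2 in belief}
    # Correction step: weight by the likelihood of the received observation.
    unnormalized = {s: obs_likelihood[s] * p for s, p in predicted.items()}
    z = sum(unnormalized.values())
    if z == 0:
        raise ValueError("observation has zero probability under the current belief")
    return {s: v / z for s, v in unnormalized.items()}


if __name__ == "__main__":
    categories = ["shirt", "trousers", "towel"]  # hypothetical classes
    belief = {c: 1.0 / len(categories) for c in categories}
    # The category does not change when the robot regrasps, so T is the identity.
    identity = {s: {s2: 1.0 if s == s2 else 0.0 for s2 in categories} for s in categories}
    # Hypothetical classifier likelihoods for one noisy depth image.
    likelihood = {"shirt": 0.6, "trousers": 0.3, "towel": 0.1}
    for _ in range(2):  # two observations after two grasps
        belief = update_belief(belief, identity, likelihood)
    print(max(belief, key=belief.get))  # shirt
```

Repeated consistent observations sharpen the belief, which is what lets an active system decide when it is confident enough to stop regrasping.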
To take advantage of the robotic manipulator and increase the accuracy of our system, we developed a novel approach to generic active vision problems, called Active Random Forests. While the state of the art focuses on selecting the best viewing parameters based on single-view classifiers, we propose a multi-view classifier in which the decision mechanism for optimally changing viewing parameters is inherent to the classification process. This has several advantages: a) the classifier exploits the entire set of captured images and does not simply aggregate per-view hypotheses probabilistically; b) actions are based on learned disambiguating features from all views and are optimally selected using the powerful voting scheme of Random Forests; and c) the classifier can take the costs of actions into account. The proposed framework was applied to the same task of autonomously unfolding clothes with a robot, addressing best-viewpoint selection in classification, grasp-point estimation and pose estimation of garments. We show a large performance improvement over state-of-the-art methods and over our previous POMDP formulation.
Moving from deformable to rigid objects while keeping our focus on domestic robotic applications, we address object instance recognition and 3D pose estimation of household objects. We are particularly interested in realistic, highly cluttered scenes in which objects may be severely occluded. Single-shot 6D pose estimators with manually designed features are still unable to handle such difficult scenarios for a variety of objects, which motivates research into unsupervised feature learning and next-best-view estimation. We present a complete framework for both single-shot 6D object pose estimation and next-best-view prediction based on Hough Forests, a state-of-the-art object pose estimator that performs classification and regression jointly. Rather than using manually designed features, we propose unsupervised features learned from depth-invariant patches using a Sparse Autoencoder. Furthermore, taking advantage of the clustering performed in the leaf nodes of Hough Forests, we learn to estimate the reduction of uncertainty in other views, which formulates the problem of selecting the next best view. To further improve 6D object pose estimation, we propose an improved joint registration and hypothesis verification module as a final refinement step to reject false detections. We provide two additional challenging datasets inspired by realistic scenarios to extensively evaluate the state of the art and our framework: one depicts domestic environments and the other a bin-picking scenario typical of industrial settings. We show that our framework significantly outperforms the state of the art on both public datasets and ours.
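The thesis estimates the reduction of uncertainty in other views from the clustering in the Hough Forest leaf nodes; that estimator is not reproduced here. As a generic illustration of next-best-view selection, the sketch below picks the candidate view that minimizes the expected entropy of the posterior over pose hypotheses, assuming a discrete observation model P(o | h, view) is available. The hypotheses, views and observation names are hypothetical.

```python
import math
from typing import Dict

Dist = Dict[str, float]


def entropy(p: Dist) -> float:
    """Shannon entropy (nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p.values() if x > 0)


def expected_posterior_entropy(belief: Dist, obs_model: Dict[str, Dist]) -> float:
    """E_o[H(b | o)] for one candidate view, with obs_model[h][o] = P(o | h, view)."""
    observations = {o for h in obs_model for o in obs_model[h]}
    expected = 0.0
    for o in observations:
        p_o = sum(belief[h] * obs_model[h].get(o, 0.0) for h in belief)
        if p_o == 0.0:
            continue
        posterior = {h: belief[h] * obs_model[h].get(o, 0.0) / p_o for h in belief}
        expected += p_o * entropy(posterior)
    return expected


def next_best_view(belief: Dist, view_models: Dict[str, Dict[str, Dist]]) -> str:
    """Pick the view whose observation is expected to leave the least uncertainty."""
    return min(view_models, key=lambda v: expected_posterior_entropy(belief, view_models[v]))


if __name__ == "__main__":
    belief = {"pose_a": 0.5, "pose_b": 0.5}  # two ambiguous pose hypotheses
    views = {
        # From the front both poses look identical, so the view is uninformative.
        "front": {"pose_a": {"blob": 1.0}, "pose_b": {"blob": 1.0}},
        # From the side each pose produces a distinct observation.
        "side": {"pose_a": {"edge": 1.0}, "pose_b": {"corner": 1.0}},
    }
    print(next_best_view(belief, views))  # side
```

Minimizing expected posterior entropy is equivalent to maximizing expected information gain, since the current entropy is the same for every candidate view.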
Unsupervised feature learning, although efficient, might produce sub-optimal features for our particular task. Therefore, in our last work we leverage the power of Convolutional Neural Networks to tackle the problem of estimating the pose of rigid objects with an end-to-end deep regression network. To improve the moderate performance of the standard regression objective function, we introduce the Siamese Regression Network. For a given image pair, we enforce a similarity measure between the representations of the two images in feature space and in pose space, which is shown to boost regression performance. Furthermore, we argue that the pose-guided feature learning of our Siamese Regression Network generates more discriminative features that outperform the state of the art. Finally, our feature learning formulation yields features that perform well under severe occlusions, demonstrating high performance on our novel hand-object dataset.
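The abstract does not give the exact objective. One plausible way to write it, assuming that "similarity in feature and pose space" means matching Euclidean distances, is the regression loss of both Siamese branches plus a weighted pairwise term; the weight and the distance choice below are assumptions, not the thesis's formulation.

```python
import math
from typing import Sequence


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def squared_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Standard regression term for one branch."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def siamese_regression_loss(feat_i: Sequence[float], feat_j: Sequence[float],
                            pred_i: Sequence[float], pred_j: Sequence[float],
                            pose_i: Sequence[float], pose_j: Sequence[float],
                            weight: float = 0.5) -> float:
    """Regression loss on both branches plus a pose-guided pairwise term.

    The pairwise term (assumed form) asks the distance between the two feature
    vectors to match the distance between the two ground-truth poses.
    """
    regression = squared_error(pred_i, pose_i) + squared_error(pred_j, pose_j)
    pairwise = (l2(feat_i, feat_j) - l2(pose_i, pose_j)) ** 2
    return regression + weight * pairwise


if __name__ == "__main__":
    # Features whose distance (5) matches the pose distance (5): no pairwise penalty.
    print(siamese_regression_loss([0, 0], [3, 4], [0, 0], [0, 5], [0, 0], [0, 5]))  # 0.0
    # Collapsed features for different poses are penalized.
    print(siamese_regression_loss([0, 0], [0, 0], [0, 0], [0, 5], [0, 0], [0, 5]))  # 12.5
```

The pairwise term discourages the network from mapping images with very different poses to nearby features, which is the intuition behind "pose-guided" feature learning.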
In conclusion, this work is research on object detection and pose estimation in 3D space for a variety of object types. Furthermore, we investigate how accuracy can be improved by applying active vision techniques that optimally move the camera to minimize the detection error.
Open Access
Autonomous clothes manipulation using a hierarchical vision architecture
This paper presents a novel robot vision architecture for perceiving generic 3D clothes configurations. The architecture is hierarchically structured, from low-level curvature features to mid-level geometric shapes and topology descriptions, and finally to high-level semantic surface descriptions. We demonstrate the architecture on a customized dual-arm industrial robot with our in-house stereo vision system, carrying out autonomous grasping and dual-arm flattening. The experimental results show the effectiveness of the proposed dual-arm flattening with the stereo vision system compared with a baseline of single-arm flattening using the widely cited Kinect-like sensor. In addition, the proposed grasping approach achieves satisfactory performance when grasping various kinds of garments, verifying that the proposed visual perception architecture can be adapted to more than one clothing manipulation task.
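The abstract does not name its low-level curvature features. A widely used one for surface analysis is Koenderink and van Doorn's shape index, which maps the two principal curvatures to a single value separating ridges, valleys and saddles; it is shown here only as an illustrative assumption, with the sign convention stated in the docstring.

```python
import math
from typing import Optional


def shape_index(k1: float, k2: float) -> Optional[float]:
    """Shape index in [-1, 1] from the two principal curvatures.

    Convention assumed here: k1 >= k2 and positive curvature for convex
    surfaces, so +1 is a cap, +0.5 a ridge, 0 a symmetric saddle, -0.5 a
    rut (valley) and -1 a cup. Planar points (k1 == k2 == 0) are undefined.
    """
    k1, k2 = max(k1, k2), min(k1, k2)
    if k1 == 0.0 and k2 == 0.0:
        return None
    # atan2 handles umbilic points (k1 == k2) without dividing by zero.
    return (2.0 / math.pi) * math.atan2(k1 + k2, k1 - k2)


def curvedness(k1: float, k2: float) -> float:
    """How strongly the surface is curved, independent of its shape type."""
    return math.sqrt((k1 ** 2 + k2 ** 2) / 2.0)
```

For a wrinkle crest, k1 > 0 and k2 is close to 0, so the shape index is close to 0.5; curvedness then separates pronounced wrinkles from gentle undulations of the same type.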
Development of a learning from demonstration environment using ZED 2i and HTC Vive Pro
Teaching complex capabilities, such as folding garments, to a bimanual robot is very challenging and is often tackled with learning-from-demonstration datasets. The few garment-folding datasets currently available to the robotics research community are either gathered from human demonstrations or generated through simulation. The former face the major problem of perceiving human actions and transferring them to the dynamic control of the robot, while the latter require coding human motion into the simulator in open loop, resulting in far-from-realistic movements. In this thesis, a novel virtual reality (VR) framework is proposed, based on Unity’s 3D platform and using the HTC Vive Pro system, the ZED mini and ZED 2i cameras, and Leap Motion’s hand-tracking module. The framework is capable of detecting and tracking objects, animals and human bodies in a 3D environment. Moreover, it can simulate very realistic garments while allowing users to interact with them in real time, either through handheld controllers or with their own hands. By doing so, and thanks to the immersive experience, the framework closes the gap between the human and robot perception-action loops, while simplifying data capture and producing more realistic samples. Finally, a novel garment manipulation dataset will be recorded with the developed framework, containing data and videos of nineteen different types of manipulation, intended to support robot learning from demonstration.
Sustainable Development Goals::9 - Industry, Innovation and Infrastructure
Visual-tactile learning of garment unfolding for robot-assisted dressing
Assistive robots have the potential to support disabled and elderly people in daily dressing activities. An intermediate stage of dressing is to manipulate the garment from a crumpled initial state into an unfolded configuration that facilitates robust dressing. Garment unfolding based on quasi-static grasping actions with visual feedback usually suffers from occluded grasping points. In this work, we propose a dynamic manipulation strategy: tracing the garment edge until the hidden corner is revealed. We introduce a model-based approach in which a deep visual-tactile predictive model iteratively learns to perform servoing from raw sensor data. The predictive model is formalized as a Conditional Variational Autoencoder with contrastive optimization, which jointly learns the underlying visual-tactile latent representations, a latent garment dynamics model and future predictions of garment states. Two cost functions are explored: the visual cost, defined by garment corner positions, drives the gripper toward the corner, while the tactile cost, defined by garment edge poses, prevents the garment from falling out of the gripper. The experimental results demonstrate the improvement of our contrastive visual-tactile model predictive control over single-modality sensing and baseline model-learning techniques. The proposed method enables a robot to unfold back-opening hospital gowns and perform upper-body dressing.
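The learned predictive model (a CVAE with contrastive optimization) is beyond a short example. The sketch below only illustrates how a visual cost and a tactile cost can be combined inside a model predictive controller, using random shooting and a hand-written toy dynamics model in place of the learned latent dynamics. The state, dynamics, cost weights and all function names are hypothetical.

```python
import random
from typing import List, Tuple

State = Tuple[float, float]   # (gripper position along the edge, lateral edge offset in the fingers)
Action = Tuple[float, float]  # (step along the edge, lateral correction)


def toy_dynamics(state: State, action: Action) -> State:
    """Hypothetical stand-in for the learned latent garment dynamics model."""
    return (state[0] + action[0], state[1] + action[1])


def visual_cost(state: State, corner_x: float) -> float:
    """Distance to the visually estimated corner: drives the gripper along the edge."""
    return (corner_x - state[0]) ** 2


def tactile_cost(state: State) -> float:
    """Lateral edge offset sensed by touch: penalizes letting the edge slip out."""
    return state[1] ** 2


def plan_first_action(state: State, corner_x: float, rng: random.Random,
                      horizon: int = 3, samples: int = 200,
                      max_step: float = 0.1, w_tactile: float = 1.0) -> Action:
    """Random-shooting MPC: sample action sequences, keep the cheapest, return its first action."""
    best_cost, best_action = float("inf"), (0.0, 0.0)
    for _ in range(samples):
        seq: List[Action] = [(rng.uniform(-max_step, max_step), rng.uniform(-max_step, max_step))
                             for _ in range(horizon)]
        s, cost = state, 0.0
        for a in seq:
            s = toy_dynamics(s, a)
            cost += visual_cost(s, corner_x) + w_tactile * tactile_cost(s)
        if cost < best_cost:
            best_cost, best_action = cost, seq[0]
    return best_action


def run_mpc(start: State, corner_x: float, steps: int = 40, seed: int = 0) -> State:
    """Receding-horizon loop: replan at every step and execute only the first action."""
    rng = random.Random(seed)
    s = start
    for _ in range(steps):
        s = toy_dynamics(s, plan_first_action(s, corner_x, rng))
    return s


if __name__ == "__main__":
    print(run_mpc(start=(0.0, 0.2), corner_x=1.0))
```

Starting 1.0 away from the corner with the edge off-center, the controller is expected to end close to the corner with the edge near the middle of the fingers; raising `w_tactile` trades tracing speed for a tighter hold on the edge.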
Robotic system for garment perception and manipulation
International Mention in the doctoral degree.
Garments are a key element of people's daily lives, as many domestic tasks, such as laundry, revolve around them. Performing such tasks, which are generally dull and repetitive, means devoting many hours of unpaid labor that could be freed through automation. However, automating these tasks has traditionally been hard because the deformable nature of garments adds challenges to those already present in object perception and manipulation. This thesis presents a Robotic System for Garment Perception and Manipulation that intends to address these challenges.
The laundry pipeline, as defined in this work, is composed of four independent but sequential tasks: hanging, unfolding, ironing and folding. The aim of this work is to automate this pipeline through a robotic system able to work in domestic environments as a robot household companion.
Laundry starts by washing the garments, which then need to be dried, frequently by hanging them. As hanging is a complex task requiring bimanual manipulation skills and dexterity, this work follows a simplified approach as a starting point: a deep convolutional neural network and a custom synthetic dataset are used to study whether a robot can predict if a garment will hang or not when dropped over a hanger, as a first step towards a more complex controller.
After the garment is dry, it has to be unfolded to ease the recognition of its garment category in the next steps. The presented model-less unfolding method uses only color and depth information from the garment to determine the grasp and release points of an unfolding action, which is repeated iteratively until the garment is fully spread.
Before storage, wrinkles have to be removed from the garment. For that purpose, a novel ironing method is proposed that uses a custom wrinkle descriptor to locate the most prominent wrinkles and generate a suitable ironing plan. The method does not require precise control of the scene's lighting conditions and is able to iron with unmodified ironing tools through a force-feedback-based controller.
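The abstract specifies only that the controller uses force feedback (the Spanish abstract adds that it applies constant pressure). A proportional-integral (PI) loop is a common, simple way to do that; the sketch below runs one on a toy linear-spring contact model. The gains, stiffness, set-point and function name are hypothetical, not values from the thesis.

```python
def simulate_force_control(target_force: float = 10.0,  # N, hypothetical set-point
                           stiffness: float = 2000.0,    # N/m, toy contact stiffness
                           kp: float = 0.01, ki: float = 0.025,
                           dt: float = 0.01, steps: int = 500) -> float:
    """PI force controller on a toy spring contact; returns the final measured force.

    The controller commands the vertical velocity of the iron so that the
    measured contact force tracks `target_force`.
    """
    depth = 0.0     # how far the iron presses into the garment and board (m)
    integral = 0.0  # accumulated force error (N*s)
    force = 0.0
    for _ in range(steps):
        force = stiffness * max(depth, 0.0)    # force sensor reading (toy model)
        error = target_force - force
        integral += error * dt
        velocity = kp * error + ki * integral  # downward velocity command (m/s)
        depth += velocity * dt
    return force


if __name__ == "__main__":
    print(simulate_force_control())
```

The integral term removes the steady-state error that a purely proportional loop would leave if the contact stiffness or the iron's weight were not modeled exactly.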
Finally, the garment is folded for storage. One key aspect of folding is performing each fold precisely, as errors accumulate when several folds are required. A neural folding controller is proposed that uses visual feedback of the current garment shape, extracted through a deep neural network trained with synthetic data, to perform each fold accurately.
All the methods presented to solve each of the laundry pipeline tasks have been validated experimentally on different robotic platforms, including a full-body humanoid robot.
Clothing is a key element in people's daily lives, not only for dressing but also because many of the household tasks a person must carry out every day, such as doing the laundry, require interacting with it. These tasks, often tedious and repetitive, demand a large number of hours of unpaid work, which could be reduced through automation. However, automating these tasks has traditionally been a challenge because the deformable nature of garments adds difficulty to the existing challenges of robotic object perception and manipulation. This thesis presents a robotic system for garment perception and manipulation that aims to solve these challenges.
Laundry is a household task composed of several subtasks carried out sequentially. In this work, these subtasks are defined as hanging, unfolding, ironing and folding. The goal of this work is to automate these tasks through a robotic system capable of working in domestic environments, becoming a household robotic assistant.
Laundry begins by washing the garments, which must then be dried, generally by hanging them outdoors, so that the remaining subtasks can be performed on them. Hanging clothes is a complex task that requires bimanual manipulation and great dexterity when handling the garment. Therefore, this work addresses a simplified version of the hanging task as a starting point for more advanced research in the future. Using a deep convolutional neural network and a synthetic training dataset, a study was carried out on a robot's ability to predict the outcome of dropping a garment onto a hanger. This study, which serves as a first step towards a more advanced controller, resulted in a model capable of predicting whether the garment will remain hanging or not from a depth image of the garment in the position from which it will be dropped.
Once the garments are dry, each garment must be unfolded so that the robot can recognize it more easily when performing the following tasks. The unfolding method proposed in this work does not require a prior model of the garment and uses only depth and color information, obtained with an RGB-D sensor, to compute the grasp and release points of an unfolding action. This process is iterative and is repeated until the garment is completely unfolded.
Before storing the garment, any wrinkles that appeared during washing and drying must be removed. To this end, a new ironing algorithm is proposed that uses a wrinkle descriptor developed in this work to locate the most prominent wrinkles and generate an ironing plan suited to the garment's condition. Unlike other existing methods, this method can be applied in a domestic environment, since it does not require precise control of the lighting conditions. Moreover, it can use the same ironing tools a person would use, without modifying them, through a controller that uses force feedback to apply constant pressure during ironing.
The final step of doing the laundry is folding the garment for storage. An important aspect of folding garments is executing each of the required folds precisely, since any error or offset in one fold accumulates when the folding sequence consists of several consecutive folds. To perform these folds with the required precision, a neural-network-based controller is proposed that uses visual feedback of the garment's shape during each folding operation. This feedback is obtained through a deep neural network trained on a synthetic dataset, which estimates the 3D shape of the part to be folded from a monocular image.
All the methods described in this thesis have been successfully validated experimentally on various robotic platforms, including a humanoid robot.
Doctoral Programme in Electrical, Electronic and Automation Engineering, Universidad Carlos III de Madrid. Chair: Abderrahmane Kheddar; Secretary: Ramón Ignacio Barber Castaño; Member: Karinne Ramírez-Amar
Perception and manipulation for robot-assisted dressing
Assistive robots have the potential to provide substantial support for disabled and elderly people in their daily dressing activities. This thesis presents a series of perception and manipulation algorithms for robot-assisted dressing, including: garment perception and grasping prior to robot-assisted dressing, real-time user posture tracking during robot-assisted dressing for (simulated) impaired users with limited upper-body movement capability, and finally a pipeline for robot-assisted dressing for (simulated) paralyzed users who have lost the ability to move their limbs.
First, the thesis explores learning suitable grasping points on a garment prior to robot-assisted dressing. Robots should be endowed with the ability to autonomously recognize the garment state, grasp and hand the garment to the user and subsequently complete the dressing process. This is addressed by introducing a supervised deep neural network to locate grasping points. To reduce the amount of real data required, which is costly to collect, the power of simulation is leveraged to produce large amounts of labeled data.
Unexpected user movements should be taken into account when planning robot dressing trajectories. Tracking such movements with vision sensors is challenging because of the severe visual occlusions created by the robot and the clothes. A probabilistic real-time tracking method is proposed that uses Bayesian networks in latent spaces to fuse multi-modal sensor information. The latent spaces are created before dressing by modeling the user's movements, taking the user's movement limitations and preferences into account. The tracking method is then combined with hierarchical multi-task control to minimize the interaction force between the user and the robot. The proposed method enables the Baxter robot to provide personalized dressing assistance for users with (simulated) upper-body impairments.
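The Bayesian networks in latent spaces are not described here in enough detail to reproduce. The sketch shows only the basic principle that probabilistic multi-modal fusion relies on: two independent Gaussian estimates of the same latent posture are combined by inverse-variance weighting, so an occluded, high-variance visual estimate is automatically down-weighted. The diagonal-Gaussian assumption, the choice of a second modality and the variable names are hypothetical.

```python
from typing import List, Sequence, Tuple


def fuse_gaussians(mean_a: Sequence[float], var_a: Sequence[float],
                   mean_b: Sequence[float], var_b: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Fuse two independent diagonal-Gaussian estimates of the same latent state.

    Each dimension is combined by inverse-variance weighting, which is the
    renormalized product of the two Gaussian densities.
    """
    means, variances = [], []
    for ma, va, mb, vb in zip(mean_a, var_a, mean_b, var_b):
        var = 1.0 / (1.0 / va + 1.0 / vb)
        means.append(var * (ma / va + mb / vb))
        variances.append(var)
    return means, variances


if __name__ == "__main__":
    # Hypothetical 2-D latent posture estimates: vision is occluded by the robot
    # and the garment, so it reports a large variance; the other modality is confident.
    vision_mean, vision_var = [0.0, 0.0], [100.0, 100.0]
    other_mean, other_var = [1.0, -0.5], [1.0, 1.0]
    print(fuse_gaussians(vision_mean, vision_var, other_mean, other_var))
```

The fused variance is always smaller than either input variance, which is why adding a second, even imperfect, sensing modality helps tracking under occlusion.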
Finally, a pipeline for dressing (simulated) paralyzed patients using a mobile dual-arm robot is presented. The robot grasps a hospital gown naturally hung on a rail and moves around the bed to complete the upper-body dressing of a hospital training manikin. To further improve simulations for garment grasping, this thesis proposes updating the physical property values of the simulated garment to more realistic ones. This is achieved by measuring physical similarity in the latent space with a contrastive loss, which maps physically similar examples to nearby points.
Open Access
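Once a contrastive encoder maps physically similar observations to nearby points, updating the simulated garment's physical properties can be framed as a search over candidate parameter sets. The sketch below, assuming such an encoder is already trained and its outputs are given as plain vectors, scores each candidate by its mean latent distance to the real observations and keeps the closest. The parameter names and embedding values are hypothetical, and the thesis may use a different search strategy.

```python
import math
from statistics import mean
from typing import Dict, Hashable, List, Sequence


def closest_parameters(real_embeddings: Sequence[Sequence[float]],
                       sim_embeddings: Dict[Hashable, List[Sequence[float]]]) -> Hashable:
    """Return the simulator parameter set whose observations embed closest to the real ones.

    real_embeddings: encoder outputs for several real garment observations.
    sim_embeddings:  for each candidate parameter set, encoder outputs for
                     simulated observations of the same garment and motion.
    The score is the mean pairwise Euclidean distance in the latent space.
    """
    def score(params: Hashable) -> float:
        return mean([math.dist(r, s) for r in real_embeddings for s in sim_embeddings[params]])

    return min(sim_embeddings, key=score)


if __name__ == "__main__":
    # Hypothetical candidates keyed by (bending stiffness, damping).
    candidates = {
        (0.1, 0.01): [[0.9, 0.1], [0.8, 0.2]],
        (0.5, 0.05): [[0.2, 0.8], [0.3, 0.7]],
        (1.0, 0.10): [[0.0, 1.0], [0.1, 0.9]],
    }
    real = [[0.25, 0.75], [0.2, 0.75]]
    print(closest_parameters(real, candidates))  # (0.5, 0.05)
```

Averaging over several real and simulated observations makes the choice less sensitive to any single noisy frame; a finer grid or a local optimizer around the winner could refine the estimate further.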
A virtual reality framework for fast dataset creation applied to cloth manipulation with automatic semantic labelling
© 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Teaching complex manipulation skills, such as folding garments, to a bimanual robot is very challenging and is often tackled through learning from demonstration. The few datasets of garment-folding demonstrations currently available to the robotics research community have been either gathered from human demonstrations or generated through simulation. The former face the great difficulty of perceiving both the cloth state and the human action, as well as transferring them to the dynamic control of the robot, while the latter require coding human motion into the simulator in open loop, i.e., without incorporating the visual feedback naturally used by people, resulting in far-from-realistic movements. In this article, we present an accurate dataset of human cloth-folding demonstrations. The dataset is collected through our novel virtual reality (VR) framework, based on Unity’s 3D platform and the use of an HTC Vive Pro system. The framework can simulate realistic garments while allowing users to interact with them in real time through handheld controllers. By doing so, and thanks to the immersive experience, our framework lets the demonstrations exploit human visual feedback while avoiding the difficulty of capturing the cloth state, which simplifies data acquisition and yields more realistic demonstrations.
We create and publicly release a dataset of cloth manipulation sequences whose cloth states are semantically labeled automatically using a novel low-dimensional cloth representation that yields a very good separation between different cloth configurations.
The research leading to these results receives funding from the European Research Council (ERC) under the European Union Horizon 2020 Programme, grant agreement no. 741930 (CLOTHILDE: CLOTH manIpulation Learning from DEmonstrations), and from project SoftEnable (HORIZONCL4-2021-DIGITAL-EMERGING-01-101070600). The authors also received funding from project CHLOE-GRAPH (PID2020-118649RB-I00), funded by MCIN/AEI/10.13039/501100011033, and COHERENT (PCI2020-120718-2), funded by MCIN/AEI/10.13039/501100011033 and cofunded by the "European Union NextGenerationEU/PRTR".
Peer Reviewed. Postprint (author's final draft)