6 research outputs found

    Enabling Visual Action Planning for Object Manipulation through Latent Space Roadmap

    We present a framework for visual action planning of complex manipulation tasks with high-dimensional state spaces, focusing on the manipulation of deformable objects. We propose a Latent Space Roadmap (LSR) for task planning, a graph-based structure that globally captures the system dynamics in a low-dimensional latent space. Our framework consists of three parts: (1) a Mapping Module (MM) that maps observations, given in the form of images, into a structured latent space, extracting the respective states, and that generates observations from the latent states; (2) the LSR, which builds and connects clusters containing similar states in order to find latent plans between the start and goal states extracted by the MM; and (3) the Action Proposal Module, which complements the latent plan found by the LSR with the corresponding actions. We present a thorough investigation of our framework on two simulated box stacking tasks and a folding task executed on a real robot.
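    The roadmap idea in part (2) can be illustrated with a minimal sketch. It is not the authors' implementation: the greedy distance-threshold clustering, the breadth-first search, the function names, and the toy 2D latent vectors are all assumptions chosen for brevity, and the Mapping Module and Action Proposal Module are left out entirely.

```python
from collections import deque
import math


def cluster_states(latents, eps):
    """Greedy clustering (an assumption): a latent joins the first cluster
    whose representative lies within eps, otherwise it starts a new one."""
    centers, labels = [], []
    for z in latents:
        for k, c in enumerate(centers):
            if math.dist(z, c) <= eps:
                labels.append(k)
                break
        else:
            centers.append(z)
            labels.append(len(centers) - 1)
    return centers, labels


def build_roadmap(labels, transitions):
    """Add a directed edge i -> j when an observed action moved a state
    from cluster i to a different cluster j."""
    edges = {k: set() for k in set(labels)}
    for a, b in transitions:
        if labels[a] != labels[b]:
            edges[labels[a]].add(labels[b])
    return edges


def nearest_cluster(z, centers):
    """Index of the cluster representative closest to latent z."""
    return min(range(len(centers)), key=lambda k: math.dist(z, centers[k]))


def plan(start, goal, centers, edges):
    """Breadth-first search over clusters; returns a list of cluster
    indices from start to goal, or None if the goal is unreachable."""
    s, g = nearest_cluster(start, centers), nearest_cluster(goal, centers)
    parent = {s: None}
    queue = deque([s])
    while queue:
        node = queue.popleft()
        if node == g:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(edges[node]):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


# Toy data (hypothetical): five 2D latent states and two observed transitions,
# given as (index_before_action, index_after_action).
latents = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.0, 0.1), (2.0, 0.0)]
transitions = [(0, 2), (3, 4)]

centers, labels = cluster_states(latents, eps=0.3)
edges = build_roadmap(labels, transitions)
latent_plan = plan((0.05, 0.0), (1.9, 0.0), centers, edges)
```

    In this toy setting the latent plan is a sequence of cluster indices; in a full pipeline, each consecutive pair of clusters would be handed to a separate component that proposes the action connecting them.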

    Development of an active vision system for robot inspection of complex objects

    Integrated master's dissertation in Mechanical Engineering (specialization in Mechatronic Systems). This dissertation falls within the scope of the IntVis4Insp project between the University of Minho and the company Neadvance. It focuses on the development of a 3D hand tracking system capable of extracting the hand position and orientation in order to prepare a manipulator for the automatic inspection of leather pieces. The work starts with a literature review of the two main approaches to collecting the data needed for 3D hand tracking: glove-based methods and vision-based methods. Glove-based methods use some kind of support mounted on the hand (for example, a fabric glove) that holds all the sensors needed to measure the desired parameters. Vision-based methods rely on one or more cameras to capture the hands and use computer vision algorithms to track their position and configuration. The method selected for this work was the vision-based application Openpose. For each recorded image, it can locate 21 keypoints on each hand, which together form a skeleton of the hand. Openpose is integrated into the tracking system developed in this dissertation as part of a more complete pipeline, in which the location of the hand keypoints is crucial for tracking the hands in videos of the demonstrated inspection movements. These videos were recorded with an RGB-D camera, the Microsoft Kinect, which provides a depth value for every recorded RGB pixel. With the depth information and the 2D location of the hand keypoints in the images, it was possible to obtain the 3D world coordinates of these points using the pinhole camera model. To define the hand position, one point is selected among the 21 for each hand. For the hand orientation, it was necessary to develop an auxiliary method called the “Iterative Pose Estimation Method” (ITP), which estimates the complete 3D pose of the hands. This method relies only on the 2D locations of every hand keypoint and the complete 3D world coordinates of the wrists to estimate the correct 3D world coordinates of all the remaining points on the hand. This approach addresses the hand occlusions that are prone to happen when only one camera is used to record the inspection videos. Once the world location of all the points on the hands has been accurately estimated, the hand orientation can be defined by selecting any three points that form a plane.
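    The two geometric steps mentioned in the abstract, back-projecting a keypoint with the pinhole camera model and deriving an orientation from three points that form a plane, can be sketched as follows. This is only an illustration: the intrinsic parameters (`fx`, `fy`, `cx`, `cy`) and the pixel values are hypothetical, the result is expressed in the camera frame (treating it as the world frame is an assumption), and the sketch does not reproduce the ITP method itself.

```python
import math


def backproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole model: map pixel (u, v) with a depth value (in metres)
    to a 3D point expressed in the camera frame."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def plane_normal(p0, p1, p2):
    """Unit normal of the plane through three 3D points, computed as the
    normalized cross product of two edge vectors."""
    a = tuple(q - p for p, q in zip(p0, p1))
    b = tuple(q - p for p, q in zip(p0, p2))
    n = (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )
    norm = math.sqrt(sum(c * c for c in n))
    if norm == 0.0:
        raise ValueError("points are collinear; no unique plane")
    return tuple(c / norm for c in n)


# Hypothetical intrinsics and three keypoints, all at 1 m depth.
FX, FY, CX, CY = 500.0, 500.0, 320.0, 240.0
wrist = backproject(320, 240, 1.0, FX, FY, CX, CY)
index_base = backproject(420, 240, 1.0, FX, FY, CX, CY)
pinky_base = backproject(320, 340, 1.0, FX, FY, CX, CY)

# Orientation of the hand plane as a unit normal vector.
normal = plane_normal(wrist, index_base, pinky_base)
```

    With all three points at the same depth, the plane is parallel to the image plane, so the normal points along the camera's optical axis. Which three keypoints to use is a design choice that the abstract leaves open.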

    Robotic system for garment perception and manipulation

    International Mention in the doctoral degree. Garments are a key element of people’s daily lives, as many domestic tasks, such as laundry, revolve around them. Performing such tasks, which are generally dull and repetitive, means devoting many hours of unpaid labor to them, hours that could be freed through automation. Automating such tasks has traditionally been hard because of the deformable nature of garments, which adds further challenges to those that already exist in object perception and manipulation. This thesis presents a Robotic System for Garment Perception and Manipulation that intends to address these challenges. The laundry pipeline as defined in this work is composed of four independent but sequential tasks: hanging, unfolding, ironing and folding. The aim of this work is to automate this pipeline with a robotic system able to work in domestic environments as a robot household companion. Laundry starts by washing the garments, which then need to be dried, frequently by hanging them. As hanging is a complex task requiring bimanipulation skills and dexterity, this work follows a simplified approach as a starting point: a deep convolutional neural network and a custom synthetic dataset are used to study whether a robot can predict, from a depth image of the garment in the position from which it will be dropped, whether the garment will stay hung when dropped over a hanger. This is a first step towards a more complex controller. After the garment is dry, it has to be unfolded to ease recognition of its garment category in the following steps. The presented model-less unfolding method uses only color and depth information from the garment, obtained with an RGB-D sensor, to determine the grasp and release points of an unfolding action, which is repeated iteratively until the garment is fully spread. Before storage, wrinkles have to be removed from the garment. For that purpose, a novel ironing method is proposed that uses a custom wrinkle descriptor to locate the most prominent wrinkles and generate a suitable ironing plan. The method does not require precise control of the lighting conditions of the scene, and it is able to iron with unmodified ironing tools through a force-feedback-based controller that applies constant pressure during ironing. Finally, the last step is to fold the garment for storage. One key aspect of folding is performing each fold precisely, as errors accumulate when several consecutive folds are required. A neural folding controller is proposed that uses visual feedback of the current garment shape to perform a fold accurately. This feedback is extracted by a deep neural network trained with synthetic data, which estimates the 3D shape of the part being folded from a monocular image. All the methods presented to solve each of the laundry pipeline tasks have been validated experimentally on different robotic platforms, including a full-body humanoid robot. Doctoral Programme in Electrical, Electronic and Automation Engineering, Universidad Carlos III de Madrid. Committee chair: Abderrahmane Kheddar. Secretary: Ramón Ignacio Barber Castaño. Member: Karinne Ramírez-Amar

    Efficient Motion Planning for Deformable Objects with High Degrees of Freedom

    Many robotics and graphics applications need to plan motions that interact with complex environmental objects, including solids, sands, plants, and fluids. A key aspect of these deformable objects is that they have many degrees of freedom (high-DOF), which implies that they can move or change shape in many independent ways subject to physics-based constraints. In these applications, users also impose high-level goals on the movements of high-DOF objects, and planning algorithms need to model their motions and determine the optimal control actions that satisfy those goals. In this thesis, we propose several planning algorithms for high-DOF objects. Our algorithms improve scalability considerably and can plan motions for different types of objects, including elastically deformable objects, free-surface flows, and Eulerian fluids. We show that the salient deformations of elastically deformable objects lie in a low-dimensional nonlinear space, i.e., the RS space. By embedding the configuration space in the RS subspace, our optimization-based motion planning algorithm achieves a speedup of over two orders of magnitude over prior optimization-based formulations. For free-surface flows such as liquids, we use features of the planning problems and machine learning techniques to identify low-dimensional latent spaces that accelerate the motion planning computation. For Eulerian fluids without free surfaces, we present a scalable planning algorithm based on novel numerical techniques. We show that the numerical discretization scheme exhibits strong regularity, which allows us to accelerate optimization-based motion planning algorithms using a hierarchical data structure and achieve a 3-10 times speedup over gradient-based optimization techniques. Finally, for high-DOF objects with many frictional contacts with the environment, we present a contact dynamic model that can handle contacts without expensive combinatorial optimization. We illustrate the benefits of our high-DOF planning algorithms in three applications. First, we can plan contact-rich motion trajectories for general elastically deformable robots. Second, we can achieve real-time performance when planning the motion of a robot arm that transfers liquids between containers. Finally, our method enables a more intuitive user interface: animation editors can modify animations using an offline motion planner to generate controlled fluid animations. Doctor of Philosophy