Enabling Visual Action Planning for Object Manipulation through Latent Space Roadmap
We present a framework for visual action planning of complex manipulation
tasks with high-dimensional state spaces, focusing on manipulation of
deformable objects. We propose a Latent Space Roadmap (LSR) for task planning:
a graph-based structure that globally captures the system dynamics in a
low-dimensional latent space. Our framework consists of three parts: (1) a
Mapping Module (MM) that maps observations, given in the form of images, into a
structured latent space in which the respective states are extracted, and that
in turn generates observations from latent states; (2) the LSR, which builds
and connects clusters of similar latent states in order to find latent plans
between start and goal states extracted by the MM; and (3) the Action Proposal
Module, which complements the latent plan found by the LSR with the
corresponding actions. We present a thorough investigation of our framework on
two simulated box-stacking tasks and a folding task executed on a real robot.
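The LSR's planning step can be illustrated with a minimal sketch: treat each cluster of similar latent states as a graph node, connect clusters observed together in transitions, and search the graph for a path between the clusters containing the start and goal states. The cluster ids and transitions below are invented for illustration, not taken from the paper.

```python
from collections import defaultdict, deque

# Hypothetical latent transitions observed in training data:
# each tuple is (cluster before action, cluster after action).
transitions = [(0, 1), (1, 2), (0, 3), (3, 2), (2, 4)]

# Build an undirected roadmap graph over the clusters.
graph = defaultdict(set)
for a, b in transitions:
    graph[a].add(b)
    graph[b].add(a)

def latent_plan(start, goal):
    """Breadth-first search for a shortest cluster path start -> goal."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph[path[-1]]:
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None  # goal cluster unreachable from start

print(latent_plan(0, 4))  # prints a shortest cluster path from 0 to 4
```

Each edge of the returned path would then be handed to an action-proposal step that supplies the concrete action connecting the two clusters.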
Development of an active vision system for robot inspection of complex objects
Master's dissertation in Mechanical Engineering (specialization in Mechatronic Systems). The dissertation presented here is in the scope of the IntVis4Insp project between University of Minho
and the company Neadvance. It focuses on the development of a 3D hand tracking system that must be
capable of extracting the hand position and orientation to prepare a manipulator for automatic inspection
of leather pieces.
This work starts with a literature review of the two main approaches for collecting the data needed
to perform 3D hand tracking: glove-based methods and vision-based methods. The former rely on some
kind of support mounted on the hand that holds all the sensors needed to measure the desired
parameters, while the latter use one or more cameras to capture the hands and track their position
and configuration through computer vision algorithms. The method selected for this work was the
vision-based application Openpose, which can locate 21 keypoints on each hand in every recorded
image; together, these keypoints form a skeleton of the hand.
This application is integrated into the tracking system developed throughout this dissertation. Its
output feeds a more complete pipeline in which the location of those hand keypoints is used to track
the hands in videos of the demonstrated movements. These videos were recorded with an RGB-D camera,
the Microsoft Kinect, which provides a depth value for every RGB pixel. With the depth information
and the 2D location of the hand keypoints in the images, the 3D world coordinates of these points
were obtained using the pinhole camera model.
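Recovering a 3D point from a pixel location and its depth value follows directly from the pinhole camera model. The sketch below shows the standard back-projection equations; the intrinsic parameters and the keypoint pixel are illustrative values, not the actual Kinect calibration.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth Z to 3D camera coordinates
    using the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# Illustrative intrinsics (focal lengths and principal point in pixels),
# not a real Kinect calibration.
fx = fy = 525.0
cx, cy = 319.5, 239.5

# A hypothetical hand keypoint detected at pixel (400, 300), 0.8 m away.
point_3d = backproject(400, 300, 0.8, fx, fy, cx, cy)
```

Applying this to all 21 keypoints of a hand, frame by frame, yields the 3D trajectories the pipeline tracks over time.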
To define the hand position, a point is selected among the 21 for each hand; for the hand
orientation, however, it was necessary to develop an auxiliary method called the “Iterative Pose
Estimation Method” (ITP), which estimates the complete 3D pose of the hands. This method uses only
the 2D locations of every hand keypoint and the 3D world coordinates of the wrists to estimate the
correct 3D world coordinates of all the remaining points on the hand. This solution solves the
problems related to hand occlusions that are prone to happen when only one camera is used to record
the inspection videos. Once the world location of all the points in the hands is accurately
estimated, their orientation can be defined by selecting
three points forming a plane.
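Defining an orientation from three points on a plane amounts to computing the plane's unit normal via a cross product, which can serve as one axis of a hand frame. A minimal sketch (the choice of keypoints is hypothetical):

```python
import math

def hand_plane_normal(p0, p1, p2):
    """Unit normal of the plane through three 3D hand keypoints,
    usable as one axis of a hand orientation frame."""
    u = [p1[i] - p0[i] for i in range(3)]          # first in-plane vector
    v = [p2[i] - p0[i] for i in range(3)]          # second in-plane vector
    n = [u[1] * v[2] - u[2] * v[1],                # cross product u x v
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    length = math.sqrt(sum(c * c for c in n))
    return [c / length for c in n]

# Example: three keypoints lying in the XY plane give a +Z normal.
print(hand_plane_normal((0, 0, 0), (1, 0, 0), (0, 1, 0)))  # [0.0, 0.0, 1.0]
```

The remaining two axes of the frame can be taken as the normalized vector between two of the chosen keypoints and its cross product with the normal.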
Robotic system for garment perception and manipulation
International Mention in the doctoral degree. Garments are a key element of people’s daily lives, as many
domestic tasks, such as laundry, revolve around them. Performing
such tasks, generally dull and repetitive, implies devoting
many hours of unpaid labor to them, which could be freed
through automation. But automating such tasks has traditionally
been hard due to the deformable nature of garments, which
creates additional challenges beyond those already present in
object perception and manipulation. This thesis presents
a Robotic System for Garment Perception and Manipulation
that intends to address these challenges.
The laundry pipeline as defined in this work is composed
of four independent, but sequential, tasks: hanging, unfolding,
ironing and folding. The aim of this work is the automation of
this pipeline through a robotic system able to work in domestic
environments as a robot household companion.
Laundry starts by washing the garments, which then need to
be dried, frequently by hanging them. As hanging is a complex
task requiring bimanipulation skills and dexterity, a simplified
approach is followed in this work as a starting point: a deep
convolutional neural network and a custom synthetic dataset are
used to study whether a robot can predict, from a depth image of
the garment in the drop position, whether it will hang or fall
when dropped over a hanger, as a first step towards a more
complex controller.
After the garment is dry, it has to be unfolded to ease
recognition of its category in the next steps. The presented
model-less unfolding method uses only color and depth
information from the garment, obtained with an RGB-D sensor, to
determine the grasp and release points of an unfolding action,
which is repeated iteratively until the garment is fully spread.
Before storage, wrinkles have to be removed from the garment.
For that purpose, a novel ironing method is proposed that uses
a custom wrinkle descriptor to locate the most prominent
wrinkles and generate a suitable ironing plan. The method does
not require precise control of the lighting conditions of the
scene, and is able to iron with unmodified ironing tools through
a force-feedback-based controller that applies constant pressure
during ironing.
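At its simplest, a force-feedback controller for maintaining contact pressure can be sketched as a proportional law on the contact-force error; the gain, sign convention, and numbers below are illustrative assumptions, not the thesis's actual controller.

```python
def ironing_height_update(z, measured_force, target_force, k_p=2e-4):
    """One step of a proportional force-feedback law (simplified sketch):
    lower the tool when the contact force is below target, raise it when
    the force is too high. z is the tool height in meters, forces in N."""
    error = target_force - measured_force
    return z - k_p * error  # lowering the tool (smaller z) presses harder

# Pressing too lightly (5 N measured vs. 10 N target) lowers the tool slightly.
z_next = ironing_height_update(0.10, measured_force=5.0, target_force=10.0)
```

Run at the arm's control rate, such a loop keeps the iron pressed against the garment with a roughly constant force despite surface irregularities.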
Finally, the last step is to fold the garment to store it. One
key aspect when folding is to perform each folding operation
precisely, as errors accumulate when several folds are required.
A neural folding controller is proposed that uses visual
feedback of the current garment shape, extracted through a deep
neural network trained with synthetic data to estimate the 3D
shape of the part to fold from a monocular image, allowing the
robot to perform a fold accurately.
All the methods presented to solve each of the laundry pipeline
tasks have been validated experimentally on different robotic
platforms, including a full-body humanoid robot.
Doctoral Program in Electrical, Electronic and Automation Engineering, Universidad Carlos III de Madrid. President: Abderrahmane Kheddar. Secretary: Ramón Ignacio Barber Castaño. Member: Karinne Ramírez-Amar
Efficient Motion Planning for Deformable Objects with High Degrees of Freedom
Many robotics and graphics applications need to be able to plan motions by interacting with complex environmental objects, including solids, sands, plants, and fluids. A key aspect of these deformable objects is that they have high-DOF, which implies that they can move or change shapes in many independent ways subject to physics-based constraints. In these applications, users also impose high-level goals on the movements of high-DOF objects, and planning algorithms need to model their motions and determine the optimal control actions to satisfy the high-level goals. In this thesis, we propose several planning algorithms for high-DOF objects. Our algorithms can improve the scalability considerably and can plan motions for different types of objects, including elastically deformable objects, free-surface flows, and Eulerian fluids. We show that the salient deformations of elastically deformable objects lie in a low-dimensional nonlinear space, i.e., the RS space. By embedding the configuration space in the RS subspace, our optimization-based motion planning algorithm can achieve over two orders of magnitude speedup over prior optimization-based formulations. For free surface flows such as liquids, we utilize features of the planning problems and machine learning techniques to identify low-dimensional latent spaces to accelerate the motion planning computation. For Eulerian fluids without free surfaces, we present a scalable planning algorithm based on novel numerical techniques. We show that the numerical discretization scheme exhibits strong regularity, which allows us to accelerate optimization-based motion planning algorithms using a hierarchical data structure and we can achieve 3-10 times speedup over gradient-based optimization techniques. Finally, for high-DOF objects with many frictional contacts with the environment, we present a contact dynamic model that can handle contacts without expensive combinatorial optimization. 
We illustrate the benefits of our high-DOF planning algorithms in three applications. First, we can plan contact-rich motion trajectories for general elastically deformable robots. Second, we achieve real-time performance in planning the motion of a robot arm to transfer liquids between containers. Finally, our method enables a more intuitive user interface: we allow animation editors to modify animations using an offline motion planner to generate controlled fluid animations.
Doctor of Philosophy