Deep Vision for Prosthetic Grasp
Ph.D. Thesis
The loss of a hand can limit an individual's natural ability to grasp and manipulate objects and affect their quality of life. Prosthetic hands can help users overcome these limitations and regain that ability. Despite considerable technical advances, however, the control of commercial hand prostheses is still limited to a few degrees of freedom, and switching a prosthetic hand into a desired grip mode can be tiring. There is therefore substantial room to improve the performance of hand prostheses.
The main aim of this thesis is to improve the functionality, performance and flexibility of current hand prostheses by augmenting commercial prosthetic hands with a vision module.
Giving the prosthesis the capacity to see objects allows appropriate grip modes to be determined quickly and autonomously. Several deep learning-based approaches were designed in this thesis to realise such a vision-reinforced prosthetic system. Importantly, the user interacting with this learning structure can act as a supervisor, accepting or correcting the suggested grasp. Amputee participants evaluated the designed system and provided feedback.
The following objectives for prosthetic hands were met:
1. Chapter 3: Design, implementation and real-time testing of a semi-autonomous vision-reinforced prosthetic control structure, empowered with a baseline convolutional neural network.
2. Chapter 4: Development of an advanced deep learning structure to simultaneously detect unknown objects and estimate their grasp maps in the presence of ambiguity.
3. Chapter 5: Design and development of several deep learning set-ups for concurrent prediction of depth and grasp maps as well as human grasp type.
Publicly available datasets of common graspable objects, namely the Amsterdam Library of Object Images (ALOI) and the Cornell grasp library, were used within this thesis. Moreover, to have access to real data, a small dataset of household objects, the Newcastle Grasp Library, was gathered for the experiments.
EPSRC, School of Engineering, Newcastle University
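As a rough, hypothetical sketch of the Chapter 3 baseline idea (the architecture, input size and grip classes below are assumptions, not the network actually used in the thesis), a small convolutional classifier can map an object image to a suggested grip mode that the user then accepts or corrects:

```python
# Minimal sketch of a baseline CNN grasp-type classifier (illustrative only;
# layer sizes and the grip classes are assumptions, not the thesis's network).
import torch
import torch.nn as nn

GRASP_TYPES = ["palmar", "pinch", "tripod", "lateral"]  # hypothetical grip modes

class GraspCNN(nn.Module):
    def __init__(self, num_classes: int = len(GRASP_TYPES)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

# The user acts as supervisor: the prosthesis proposes the top-scoring grip,
# which the user may accept or override.
model = GraspCNN().eval()
with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))  # one RGB object image
print("suggested grip:", GRASP_TYPES[int(logits.argmax())])
```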
StructDiffusion: Object-Centric Diffusion for Semantic Rearrangement of Novel Objects
Robots operating in human environments must be able to rearrange objects into
semantically-meaningful configurations, even if these objects are previously
unseen. In this work, we focus on the problem of building physically-valid
structures without step-by-step instructions. We propose StructDiffusion, which
combines a diffusion model and an object-centric transformer to construct
structures out of a single RGB-D image based on high-level language goals, such
as "set the table." Our method shows how diffusion models can be used for
complex multi-step 3D planning tasks. StructDiffusion improves the success rate of assembling physically-valid structures out of unseen objects by an average of 16% over an existing multi-modal transformer model, while allowing us to use one multi-task model to produce a wider range of different structures. We show
experiments on held-out objects in both simulation and on real-world
rearrangement tasks. For videos and additional results, check out our website:
http://weiyuliu.com/StructDiffusion/
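To make the mechanism concrete, the following is a generic DDPM-style sampling loop over object poses conditioned on a goal embedding. The noise schedule, dimensions and stand-in noise predictor are assumptions for illustration; this does not reproduce the actual StructDiffusion model:

```python
# Schematic of diffusion-based rearrangement: iteratively denoise a set of
# object poses conditioned on a goal embedding. Generic DDPM ancestral
# sampling with a placeholder noise predictor, not the real model.
import torch

T = 50                                   # number of diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)    # noise schedule (assumed)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

def predict_noise(poses, t, goal_emb):
    """Stand-in for the learned object-centric transformer that predicts the
    noise added to each object's pose, conditioned on the language goal."""
    return torch.zeros_like(poses)       # placeholder; a real model is trained

goal_emb = torch.randn(128)              # e.g. an embedding of "set the table"
poses = torch.randn(5, 6)                # 5 objects, 6-DoF pose each, pure noise

for t in reversed(range(T)):             # reverse (denoising) process
    eps = predict_noise(poses, t, goal_emb)
    mean = (poses - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
    noise = torch.randn_like(poses) if t > 0 else torch.zeros_like(poses)
    poses = mean + torch.sqrt(betas[t]) * noise
# `poses` now holds goal-consistent placements a planner could execute.
```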
Development of an active vision system for robot inspection of complex objects
Master's dissertation in Mechanical Engineering (specialisation in Mechatronic Systems)
The dissertation presented here is in the scope of the IntVis4Insp project between the University of Minho
and the company Neadvance. It focuses on the development of a 3D hand tracking system that must be
capable of extracting the hand position and orientation to prepare a manipulator for automatic inspection
of leather pieces.
This work starts with a literature review of the two main methods for collecting the data needed to perform 3D hand tracking: glove-based methods and vision-based methods. Glove-based methods rely on some kind of support mounted on the hand that holds all the sensors needed to measure the desired parameters, while vision-based methods use one or more cameras to capture the hands and track their position and configuration through computer vision algorithms. The method selected for this work was the vision-based OpenPose. For each recorded image, this application can locate 21 keypoints on each hand, which together form a skeleton of the hand.
This application is used in the tracking system developed throughout this dissertation. Its output feeds a more complete pipeline in which the location of the hand keypoints is used to track the hands in videos of the demonstrated movements. These videos were recorded with an RGB-D camera, the Microsoft Kinect, which provides a depth value for every RGB pixel. With the depth information and the 2D locations of the hand keypoints in the images, the 3D world coordinates of these points could be obtained using the pinhole camera model.
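For concreteness, this back-projection step can be written in a few lines under the pinhole model; the intrinsic parameters below are placeholders standing in for the Kinect's actual calibration values:

```python
# Back-projecting a 2D keypoint with depth to 3D camera coordinates using the
# pinhole model. Intrinsics are assumed values, not the real calibration.
import numpy as np

fx, fy = 525.0, 525.0   # focal lengths in pixels (assumed)
cx, cy = 319.5, 239.5   # principal point (assumed)

def backproject(u: float, v: float, z: float) -> np.ndarray:
    """Map pixel (u, v) with depth z (metres) to 3D camera coordinates."""
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])

# e.g. a wrist keypoint detected by OpenPose at pixel (400, 260), 0.85 m away
print(backproject(400, 260, 0.85))
```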
To define the hand position, one point is selected from the 21 for each hand. For the hand orientation, however, it was necessary to develop an auxiliary method called the "Iterative Pose Estimation Method" (ITP), which estimates the complete 3D pose of the hands. This method uses only the 2D locations of every hand keypoint, together with the full 3D world coordinates of the wrists, to estimate the correct 3D world coordinates of all the remaining points on the hand. This solves the problems with hand occlusions that are prone to happen when only one camera is used to record the inspection videos. Once the world locations of all the points on the hands are accurately estimated, their orientation can be defined by selecting three points forming a plane.
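A minimal sketch of this last step, assuming three non-collinear keypoints and a particular axis convention (both illustrative choices, not necessarily those of the dissertation):

```python
# Defining a hand orientation frame from three estimated 3D keypoints that
# form a plane: the plane normal gives one axis and an in-plane direction
# another. Keypoint choice and axis conventions are assumptions.
import numpy as np

def orientation_from_points(p0, p1, p2):
    """Build a right-handed rotation matrix from three non-collinear points."""
    x = p1 - p0
    x = x / np.linalg.norm(x)            # first in-plane axis
    n = np.cross(p1 - p0, p2 - p0)
    n = n / np.linalg.norm(n)            # plane normal
    y = np.cross(n, x)                   # completes the right-handed frame
    return np.column_stack([x, y, n])    # columns are the frame axes

R = orientation_from_points(np.array([0.0, 0.0, 0.8]),
                            np.array([0.1, 0.0, 0.8]),
                            np.array([0.0, 0.1, 0.8]))
print(R)
```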
INSTA-BEEER: Explicit Error Estimation and Refinement for Fast and Accurate Unseen Object Instance Segmentation
Efficient and accurate segmentation of unseen objects is crucial for robotic
manipulation. However, it remains challenging due to over- or
under-segmentation. Although existing refinement methods can enhance the
segmentation quality, they fix only minor boundary errors or are not
sufficiently fast. In this work, we propose INSTAnce Boundary Explicit Error
Estimation and Refinement (INSTA-BEEER), a novel refinement model that allows
for adding and deleting instances and sharpening boundaries. Leveraging an
error-estimation-then-refinement scheme, the model first estimates the
pixel-wise boundary explicit errors: true positive, true negative, false
positive, and false negative pixels of the instance boundary in the initial
segmentation. It then refines the initial segmentation using these error
estimates as guidance. Experiments show that the proposed model significantly
enhances segmentation, achieving state-of-the-art performance. Furthermore,
with a fast runtime (less than 0.1 s), the model consistently improves
performance across various initial segmentation methods, making it highly
suitable for practical robotic applications.
Comment: 8 pages, 5 figures
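As an illustration of the four boundary error classes, the sketch below computes them directly against a reference boundary map; in the actual model, these maps are predicted at inference time without ground truth, and the mask/function names here are assumptions:

```python
# Illustrative computation of the pixel-wise boundary error classes
# (TP/TN/FP/FN) that INSTA-BEEER's error-estimation stage targets.
import numpy as np

def boundary_error_classes(pred_boundary: np.ndarray, gt_boundary: np.ndarray):
    """Both inputs are binary H x W masks marking instance-boundary pixels."""
    tp = pred_boundary & gt_boundary     # boundary correctly predicted
    fp = pred_boundary & ~gt_boundary    # spurious boundary (over-segmentation)
    fn = ~pred_boundary & gt_boundary    # missed boundary (under-segmentation)
    tn = ~pred_boundary & ~gt_boundary   # correctly predicted non-boundary
    return tp, fp, fn, tn

pred = np.zeros((4, 4), dtype=bool); pred[1, :] = True
gt = np.zeros((4, 4), dtype=bool);   gt[2, :] = True
tp, fp, fn, tn = boundary_error_classes(pred, gt)
print(fp.sum(), fn.sum())  # 4 spurious and 4 missed boundary pixels
```

The refinement stage then uses such error maps as guidance to add or delete instances and sharpen boundaries in the initial segmentation.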