4 research outputs found

    A Survey on Hand Pose Estimation with Wearable Sensors and Computer-Vision-Based Methods

    Real-time sensing and modeling of the human body, especially the hands, is an important research endeavor for applications such as natural human-computer interaction. Hand pose estimation is a major academic and technical challenge due to the complex structure and dexterous movement of human hands. Boosted by advances in both hardware and artificial intelligence, various data glove prototypes and computer-vision-based methods have been proposed in recent years for accurate and rapid hand pose estimation. However, existing reviews have focused either on data gloves or on vision methods, or have even been limited to a particular type of camera, such as the depth camera. The purpose of this survey is to conduct a comprehensive and timely review of recent research advances in sensor-based hand pose estimation, covering both wearable and vision-based solutions. Hand kinematic models are discussed first. An in-depth review is then conducted of data gloves and vision-based sensor systems together with the corresponding modeling methods. In particular, the review also covers deep-learning-based methods, which are very promising for hand pose estimation. Finally, the advantages and drawbacks of current hand pose estimation methods, their scope of application, and related challenges are discussed.
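    As a minimal illustration of the kind of hand kinematic model such surveys discuss (not taken from the paper itself), the sketch below encodes a 21-keypoint hand skeleton as a parent/child joint hierarchy and runs simple planar forward kinematics from per-joint flexion angles. The joint layout follows the common OpenPose/MANO-style 21-keypoint convention; the bone lengths and the planar simplification are assumptions made for illustration only.

```python
import numpy as np

# 21-keypoint hand skeleton in the common OpenPose/MANO-style layout:
# 0 = wrist, then 4 joints per finger (MCP, PIP, DIP, TIP) for
# thumb, index, middle, ring, little.  Parent index of each keypoint.
PARENT = [-1,                      # wrist
          0, 1, 2, 3,              # thumb
          0, 5, 6, 7,              # index
          0, 9, 10, 11,            # middle
          0, 13, 14, 15,           # ring
          0, 17, 18, 19]           # little

# Assumed bone lengths in metres (illustrative values, not measured).
BONE = [0.0,
        0.04, 0.035, 0.03, 0.025,
        0.09, 0.04, 0.025, 0.02,
        0.09, 0.045, 0.03, 0.02,
        0.085, 0.04, 0.028, 0.02,
        0.08, 0.03, 0.022, 0.018]

def forward_kinematics(flexion):
    """Planar forward kinematics: each joint flexes in its finger's plane.

    `flexion` is a list of 21 joint angles in radians (entry 0 is ignored).
    Returns a (21, 2) array of keypoint positions in the palm plane.
    Abduction and twist are deliberately ignored in this simplification.
    """
    pos = np.zeros((21, 2))
    ang = np.zeros(21)
    for i in range(1, 21):
        p = PARENT[i]
        ang[i] = ang[p] + flexion[i]           # accumulate rotation along the chain
        direction = np.array([np.cos(ang[i]), np.sin(ang[i])])
        pos[i] = pos[p] + BONE[i] * direction  # advance one bone along that direction
    return pos

if __name__ == "__main__":
    # Slightly flexed hand: 0.2 rad at every joint.
    keypoints = forward_kinematics([0.2] * 21)
    print(keypoints.round(3))
```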

    Development of an active vision system for robot inspection of complex objects

    Integrated master's dissertation in Mechanical Engineering (specialization in Mechatronic Systems). The dissertation presented here falls within the scope of the IntVis4Insp project between the University of Minho and the company Neadvance. It focuses on the development of a 3D hand tracking system capable of extracting hand position and orientation in order to prepare a manipulator for the automatic inspection of leather pieces. The work starts with a literature review of the two main approaches for collecting the data needed for 3D hand tracking: glove-based methods and vision-based methods. The former rely on some kind of support mounted on the hand that holds all the sensors needed to measure the desired parameters, while the latter use one or more cameras to capture the hands and track their position and configuration through computer vision algorithms. The vision-based method selected for this work was OpenPose. For each recorded image, this application locates 21 keypoints on each hand that together form a skeleton of the hand. OpenPose is used within the tracking system developed throughout this dissertation: its output feeds a larger pipeline in which the location of the hand keypoints is used to track the hands in videos of the demonstrated movements. These videos were recorded with an RGB-D camera, the Microsoft Kinect, which provides a depth value for every recorded RGB pixel. With the depth information and the 2D location of the hand keypoints in the images, the 3D world coordinates of these points were obtained using the pinhole camera model. To define the hand position, one of the 21 points is selected for each hand; for the hand orientation, it was necessary to develop an auxiliary method, the "Iterative Pose Estimation Method" (ITP), which estimates the complete 3D pose of the hands. This method uses only the 2D locations of the hand keypoints and the complete 3D world coordinates of the wrists to estimate the 3D world coordinates of all the remaining points on the hand. This solves the problems related to hand occlusions that are prone to happen when only one camera is used to record the inspection videos. Once the world locations of all the points on the hands are accurately estimated, the hand orientation can be defined by selecting three points that form a plane.
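    The back-projection step described above follows directly from the pinhole camera model. The sketch below, assuming example Kinect-like intrinsics (fx, fy, cx, cy are placeholder values, not the dissertation's calibration), converts a 2D keypoint plus its depth into 3D camera coordinates and then derives a hand orientation as the unit normal of a plane through three keypoints, in the spirit of the approach described above.

```python
import numpy as np

# Placeholder pinhole intrinsics (focal lengths and principal point in pixels);
# real values come from the RGB-D camera's calibration.
FX, FY = 580.0, 580.0
CX, CY = 319.5, 239.5

def backproject(u, v, depth):
    """Map a pixel (u, v) with depth (metres) to 3D camera coordinates."""
    x = (u - CX) * depth / FX
    y = (v - CY) * depth / FY
    return np.array([x, y, depth])

def plane_normal(p0, p1, p2):
    """Unit normal of the plane through three non-collinear 3D points,
    usable as a hand orientation vector."""
    n = np.cross(p1 - p0, p2 - p0)
    return n / np.linalg.norm(n)

if __name__ == "__main__":
    # Example: wrist, index MCP and little MCP keypoints (pixel coords + depth).
    wrist   = backproject(320.0, 250.0, 0.80)
    idx_mcp = backproject(355.0, 200.0, 0.78)
    lit_mcp = backproject(290.0, 205.0, 0.79)
    print("wrist position:", wrist.round(3))
    print("hand normal:   ", plane_normal(wrist, idx_mcp, lit_mcp).round(3))
```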

    Design and recognition of microgestures for always-available input

    Gestural user interfaces for computing devices most commonly require the user to have at least one hand free to interact with the device, for example, moving a mouse, touching a screen, or performing mid-air gestures. Consequently, users find it difficult to operate computing devices while holding or manipulating everyday objects. This prevents users from interacting with the digital world during a significant portion of their everyday activities, such as using tools in the kitchen or workshop, carrying items, or working out with sports equipment. This thesis pushes the boundaries towards the bigger goal of enabling always-available input. Microgestures have been recognized for their potential to facilitate direct and subtle interactions. However, it remains an open question how to interact with computing devices using gestures when both of the user's hands are occupied holding everyday objects. We take a holistic approach and focus on three core contributions: i) To understand end-users' preferences, we present an empirical analysis of users' choice of microgestures when holding objects of diverse geometries. Instead of designing a gesture set for a specific object or geometry, and in order to identify gestures that generalize, this thesis leverages the taxonomy of grasp types established in prior research. ii) We tackle the critical problem of avoiding false activation by introducing a novel gestural input concept that leverages a single-finger movement that stands out from everyday finger motions during holding and manipulating objects. Through a data-driven approach, we also systematically validate the concept's robustness across different everyday actions. iii) While full sensor coverage on the user's hand would allow detailed sensing of hand-object interaction, minimal instrumentation is desirable for real-world use. This thesis addresses the problem of identifying sparse sensor layouts. We present the first rapid computational method, along with a GUI-based design tool that enables iterative design based on the designer's high-level requirements. Furthermore, we demonstrate that minimal form-factor devices, like smart rings, can be used to effectively detect microgestures in hands-free and busy scenarios. Overall, the presented findings will serve as both conceptual and technical foundations for enabling interaction with computing devices wherever and whenever users need them.
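    The thesis's own method for identifying sparse sensor layouts is not detailed in this abstract; as a purely illustrative stand-in, the sketch below shows a greedy forward-selection baseline that repeatedly adds the sensor channel that most improves cross-validated gesture classification accuracy until a designer-specified budget is reached. The function names, the classifier choice, and the synthetic data are assumptions, not the author's method.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def greedy_sensor_layout(X, y, budget, seed=0):
    """Greedy forward selection of sensor channels.

    X: (samples, channels) feature matrix, one column per candidate sensor.
    y: gesture labels.
    budget: maximum number of sensors the designer allows.
    Returns the list of selected channel indices.
    """
    selected = []
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < budget:
        best_channel, best_score = None, -np.inf
        for ch in remaining:
            cols = selected + [ch]
            clf = RandomForestClassifier(n_estimators=50, random_state=seed)
            score = cross_val_score(clf, X[:, cols], y, cv=3).mean()
            if score > best_score:
                best_channel, best_score = ch, score
        selected.append(best_channel)
        remaining.remove(best_channel)
    return selected

if __name__ == "__main__":
    # Synthetic stand-in data: 200 samples, 16 candidate sensors, 5 gestures.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 16))
    y = rng.integers(0, 5, size=200)
    print("selected sensors:", greedy_sensor_layout(X, y, budget=4))
```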

    Deep Learning Methods for Hand Gesture Recognition via High-Density Surface Electromyogram (HD-sEMG) Signals

    Hand Gesture Recognition (HGR) using surface Electromyogram (sEMG) signals can be considered one of the most important technologies for building efficient Human Machine Interface (HMI) systems. In particular, sEMG-based hand gesture recognition has been a topic of growing interest for the development of assistive systems that improve the quality of life of individuals with amputated limbs. Generally speaking, myoelectric prosthetic devices work by classifying patterns in the collected sEMG signals and synthesizing the intended gestures. While conventional myoelectric control systems, e.g., on/off or direct proportional control, have potential advantages, challenges such as the limited number of Degrees of Freedom (DoF) due to crosstalk have resulted in the emergence of data-driven solutions. More specifically, to improve the efficiency, intuitiveness, and control performance of hand prosthetic systems, several Artificial Intelligence (AI) algorithms, ranging from conventional Machine Learning (ML) models to highly complex Deep Neural Network (DNN) architectures, have been designed for sEMG-based hand gesture recognition in myoelectric prosthetic devices. In this thesis, we first perform a literature review on hand gesture recognition methods and elaborate on the recently proposed Deep Learning/Machine Learning (DL/ML) models in the literature. Then, the High-Density sEMG (HD-sEMG) dataset we use is introduced and the rationale behind our focus on this particular type of sEMG data is explained. We then develop a Vision Transformer (ViT)-based model for gesture recognition with HD-sEMG signals and evaluate its performance under different conditions such as variable window sizes, number of electrode channels, and model complexity. We compare its performance with that of two conventional ML algorithms and one DL algorithm typically adopted in this domain. Furthermore, we introduce another capability of the proposed framework, instantaneous training, i.e., its ability to classify hand gestures based on a single frame of HD-sEMG data. Following that, we introduce the idea of integrating the macroscopic and microscopic neural drive information obtained from HD-sEMG data into a hybrid ViT-based framework for gesture recognition, which outperforms a standalone ViT architecture in terms of classification accuracy. Here, microscopic neural drive information (also called Motor Unit Spike Trains) refers to the neural commands sent by the brain and spinal cord to individual muscle fibers, and it is extracted from HD-sEMG signals using Blind Source Separation (BSS) algorithms. Finally, we design an alternative and novel hand gesture recognition model based on the less-explored topic of Spiking Neural Networks (SNN), which performs spatio-temporal gesture recognition in an event-based fashion. As opposed to classical DNN architectures, SNNs have the capacity to imitate the cognitive function of the human brain by using biologically inspired models of neurons and synapses. Therefore, they are more biologically explainable and computationally efficient.
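    As a minimal sketch of how a ViT-style classifier can be applied to windowed HD-sEMG (an illustration only: the electrode count, window length, embedding width, layer counts, and the choice to treat each electrode's window as one token are assumptions, not the thesis's configuration), the code below adds a learnable class token and position embeddings and classifies gestures with a standard Transformer encoder.

```python
import torch
import torch.nn as nn

class EMGViT(nn.Module):
    """Minimal ViT-style classifier for windowed HD-sEMG.

    Input: (batch, channels, window), where `channels` is the number of
    electrodes and `window` the number of samples per frame.  Each
    electrode's window is embedded as one token.
    """
    def __init__(self, channels=64, window=32, dim=64, heads=4, layers=2, classes=10):
        super().__init__()
        self.embed = nn.Linear(window, dim)              # per-electrode token embedding
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))  # learnable class token
        self.pos = nn.Parameter(torch.zeros(1, channels + 1, dim))
        enc_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        self.head = nn.Linear(dim, classes)              # gesture logits

    def forward(self, x):
        tokens = self.embed(x)                           # (batch, channels, dim)
        cls = self.cls.expand(x.size(0), -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos
        encoded = self.encoder(tokens)
        return self.head(encoded[:, 0])                  # classify from the class token

if __name__ == "__main__":
    model = EMGViT()
    frames = torch.randn(8, 64, 32)   # 8 windows of 64 electrodes x 32 samples
    print(model(frames).shape)        # torch.Size([8, 10])
```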