    GANerated Hands for Real-time 3D Hand Tracking from Monocular RGB

    We address the highly challenging problem of real-time 3D hand tracking based on a monocular RGB-only sequence. Our tracking method combines a convolutional neural network with a kinematic 3D hand model, such that it generalizes well to unseen data, is robust to occlusions and varying camera viewpoints, and leads to anatomically plausible as well as temporally smooth hand motions. For training our CNN, we propose a novel approach for the synthetic generation of training data that is based on a geometrically consistent image-to-image translation network. More specifically, we use a neural network that translates synthetic images to "real" images, such that the generated images follow the same statistical distribution as real-world hand images. For training this translation network, we combine an adversarial loss and a cycle-consistency loss with a geometric consistency loss in order to preserve geometric properties (such as hand pose) during translation. We demonstrate that our hand tracking system outperforms the current state of the art on challenging RGB-only footage.
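
    The interplay of the three losses can be made concrete with a short sketch. The snippet below is a minimal, hypothetical PyTorch version of the generator-side objective; the module names G_s2r, G_r2s, D_real, and geom_net as well as the loss weights are placeholders for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def translation_loss(G_s2r, G_r2s, D_real, geom_net, synth_img,
                     lambda_cyc=10.0, lambda_geo=1.0):
    """Generator-side loss for translating synthetic hand images to the real domain."""
    fake_real = G_s2r(synth_img)                      # synthetic -> "real" domain
    logits = D_real(fake_real)
    # Adversarial term: the generator tries to make the real-domain
    # discriminator accept the translated images.
    adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    # Cycle-consistency term: translating back should recover the synthetic input.
    cyc = F.l1_loss(G_r2s(fake_real), synth_img)
    # Geometric-consistency term: a frozen geometry predictor (e.g. a hand
    # silhouette or keypoint network) should see the same hand geometry
    # before and after translation.
    geo = F.l1_loss(geom_net(fake_real), geom_net(synth_img))
    return adv + lambda_cyc * cyc + lambda_geo * geo
```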

    Real-time 3D hand reconstruction in challenging scenes from a single color or depth camera

    Hands are one of the main enabling factors for performing complex tasks, and humans naturally use them for interactions with their environment. Reconstruction and digitization of 3D hand motion open up many possibilities for important applications. Hand gestures can be directly used for human–computer interaction, which is especially relevant for controlling augmented or virtual reality (AR/VR) devices, where immersion is of utmost importance. In addition, 3D hand motion capture is a precondition for automatic sign-language translation, activity recognition, or teaching robots.

    Different approaches for 3D hand motion capture have been actively researched in the past. While accurate, gloves and markers are intrusive and uncomfortable to wear. Hence, markerless hand reconstruction based on cameras is desirable. Multi-camera setups provide rich input; however, they are hard to calibrate and lack the flexibility for mobile use cases. Thus, the majority of more recent methods use a single color or depth camera, which, however, makes the problem harder due to more ambiguities in the input. For interaction purposes, users need continuous control and immediate feedback. This means the algorithms have to run in real time and be robust in uncontrolled scenes. These requirements, achieving 3D hand reconstruction in real time from a single camera in general scenes, make the problem significantly more challenging. While recent research has shown promising results, current state-of-the-art methods still have strong limitations. Most approaches only track the motion of a single hand in isolation and do not take background clutter or interactions with arbitrary objects or the other hand into account. The few methods that can handle more general and natural scenarios run far from real time or use complex multi-camera setups. Such requirements make existing methods unusable for many of the aforementioned applications.

    This thesis pushes the state of the art for real-time 3D hand tracking and reconstruction in general scenes from a single RGB or depth camera. The presented approaches explore novel combinations of generative hand models, which have been used successfully in the computer vision and graphics community for decades, and powerful cutting-edge machine learning techniques, which have recently emerged with the advent of deep learning. In particular, this thesis proposes a novel method for hand tracking in the presence of strong occlusions and clutter, the first method for full global 3D hand tracking from in-the-wild RGB video, and a method for simultaneous pose and dense shape reconstruction of two interacting hands that, for the first time, combines a set of desirable properties previously unseen in the literature.

    A Preliminary Investigation into a Deep Learning Implementation for Hand Tracking on Mobile Devices

    Hand tracking is an essential component of computer graphics and human-computer interaction applications. The use of an RGB camera without dedicated hardware or sensors (e.g., depth cameras) allows the development of solutions for a plethora of devices and platforms. Although various methods have been proposed, hand tracking from a single RGB camera is still a challenging research area due to occlusions, complex backgrounds, and varied hand poses and gestures. We present a mobile application for 2D hand tracking from RGB images captured by the smartphone camera. The images are processed by a deep neural network, modified specifically to tackle this task and run on mobile devices, seeking a compromise between performance and computational time. The network output is used to overlay a 2D skeleton on the user's hand. We tested our system in several scenarios, demonstrating interactive hand-tracking rates and achieving promising results under variable brightness, varying backgrounds, and small occlusions.
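
    As an illustration of the overall pipeline (capture an RGB frame, run a keypoint network, overlay a 2D skeleton), the snippet below uses the off-the-shelf MediaPipe Hands detector as a stand-in for the paper's mobile-optimized network; the file names are hypothetical.

```python
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_draw = mp.solutions.drawing_utils

image_bgr = cv2.imread("frame.jpg")                      # hypothetical input frame
image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)   # MediaPipe expects RGB

with mp_hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
    result = hands.process(image_rgb)                    # run the keypoint network
    if result.multi_hand_landmarks:
        for landmarks in result.multi_hand_landmarks:
            # Draw the 21-keypoint 2D skeleton on top of the original frame.
            mp_draw.draw_landmarks(image_bgr, landmarks, mp_hands.HAND_CONNECTIONS)

cv2.imwrite("frame_with_skeleton.jpg", image_bgr)
```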

    Monocular Real-time Hand Shape and Motion Capture using Multi-modal Data

    We present a novel method for monocular hand shape and pose estimation at an unprecedented runtime performance of 100 fps and at state-of-the-art accuracy. This is enabled by a new learning-based architecture designed such that it can make use of all the sources of available hand training data: image data with either 2D or 3D annotations, as well as stand-alone 3D animations without corresponding image data. It features a 3D hand joint detection module and an inverse kinematics module that not only regresses 3D joint positions but also maps them to joint rotations in a single feed-forward pass. This output makes the method more directly usable for applications in computer vision and graphics than only regressing 3D joint positions. We demonstrate that our architectural design leads to a significant quantitative and qualitative improvement over the state of the art on several challenging benchmarks. Our model is publicly available for future research.
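
    The two-module design (3D joint detection followed by a learned inverse-kinematics mapping, all in one forward pass) can be sketched roughly as below. This is an assumption-laden toy model in PyTorch (placeholder backbone, 21 joints, axis-angle rotations), not the published architecture.

```python
import torch
import torch.nn as nn

NUM_JOINTS = 21  # standard hand-skeleton size (an assumption, not from the paper)

class HandPoseNet(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        # Placeholder image encoder; the real model would use a full CNN backbone.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim), nn.ReLU())
        # Module 1: regress 3D joint positions from image features.
        self.joint_head = nn.Linear(feat_dim, NUM_JOINTS * 3)
        # Module 2: a learned "inverse kinematics" mapping from joint positions
        # to per-joint rotations (axis-angle), evaluated in the same forward pass.
        self.ik_net = nn.Sequential(
            nn.Linear(NUM_JOINTS * 3, 256), nn.ReLU(),
            nn.Linear(256, NUM_JOINTS * 3))

    def forward(self, image):
        feat = self.backbone(image)
        joints = self.joint_head(feat).view(-1, NUM_JOINTS, 3)
        rotations = self.ik_net(joints.flatten(1)).view(-1, NUM_JOINTS, 3)
        return joints, rotations

# One feed-forward pass on a dummy 128x128 RGB crop yields positions and rotations.
model = HandPoseNet()
joints, rotations = model(torch.randn(1, 3, 128, 128))
```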

    A Dataset of Relighted 3D Interacting Hands

    Two-hand interaction is one of the most challenging signals to analyze due to the self-similarity, complicated articulations, and occlusions of hands. Although several datasets have been proposed for two-hand interaction analysis, none of them achieves both 1) diverse and realistic image appearances and 2) diverse and large-scale ground-truth (GT) 3D poses at the same time. In this work, we propose Re:InterHand, a dataset of relighted 3D interacting hands that achieves both goals. To this end, we employ a state-of-the-art hand relighting network with our accurately tracked two-hand 3D poses. We compare our Re:InterHand with existing 3D interacting hands datasets and show its benefits. Our Re:InterHand is available at https://mks0601.github.io/ReInterHand/. (Accepted by NeurIPS 2023, Datasets and Benchmarks Track.)