1,683 research outputs found

    Hand Keypoint Detection in Single Images using Multiview Bootstrapping

    Full text link
    We present an approach that uses a multi-camera system to train fine-grained detectors for keypoints that are prone to occlusion, such as the joints of a hand. We call this procedure multiview bootstrapping: first, an initial keypoint detector is used to produce noisy labels in multiple views of the hand. The noisy detections are then triangulated in 3D using multiview geometry or marked as outliers. Finally, the reprojected triangulations are used as new labeled training data to improve the detector. We repeat this process, generating more labeled data in each iteration. We derive a result analytically relating the minimum number of views to achieve target true and false positive rates for a given detector. The method is used to train a hand keypoint detector for single images. The resulting keypoint detector runs in realtime on RGB images and has accuracy comparable to methods that use depth sensors. The single view detector, triangulated over multiple views, enables 3D markerless hand motion capture with complex object interactions.Comment: CVPR 201

    GANerated Hands for Real-time 3D Hand Tracking from Monocular RGB

    Full text link
    We address the highly challenging problem of real-time 3D hand tracking based on a monocular RGB-only sequence. Our tracking method combines a convolutional neural network with a kinematic 3D hand model, such that it generalizes well to unseen data, is robust to occlusions and varying camera viewpoints, and leads to anatomically plausible as well as temporally smooth hand motions. For training our CNN we propose a novel approach for the synthetic generation of training data that is based on a geometrically consistent image-to-image translation network. To be more specific, we use a neural network that translates synthetic images to "real" images, such that the so-generated images follow the same statistical distribution as real-world hand images. For training this translation network we combine an adversarial loss and a cycle-consistency loss with a geometric consistency loss in order to preserve geometric properties (such as hand pose) during translation. We demonstrate that our hand tracking system outperforms the current state-of-the-art on challenging RGB-only footage

    Towards markerless orthopaedic navigation with intuitive Optical See-through Head-mounted displays

    Get PDF
    The potential of image-guided orthopaedic navigation to improve surgical outcomes has been well-recognised during the last two decades. According to the tracked pose of target bone, the anatomical information and preoperative plans are updated and displayed to surgeons, so that they can follow the guidance to reach the goal with higher accuracy, efficiency and reproducibility. Despite their success, current orthopaedic navigation systems have two main limitations: for target tracking, artificial markers have to be drilled into the bone and calibrated manually to the bone, which introduces the risk of additional harm to patients and increases operating complexity; for guidance visualisation, surgeons have to shift their attention from the patient to an external 2D monitor, which is disruptive and can be mentally stressful. Motivated by these limitations, this thesis explores the development of an intuitive, compact and reliable navigation system for orthopaedic surgery. To this end, conventional marker-based tracking is replaced by a novel markerless tracking algorithm, and the 2D display is replaced by a 3D holographic Optical see-through (OST) Head-mounted display (HMD) precisely calibrated to a user's perspective. Our markerless tracking, facilitated by a commercial RGBD camera, is achieved through deep learning-based bone segmentation followed by real-time pose registration. For robust segmentation, a new network is designed and efficiently augmented by a synthetic dataset. Our segmentation network outperforms the state-of-the-art regarding occlusion-robustness, device-agnostic behaviour, and target generalisability. For reliable pose registration, a novel Bounded Iterative Closest Point (BICP) workflow is proposed. The improved markerless tracking can achieve a clinically acceptable error of 0.95 deg and 2.17 mm according to a phantom test. OST displays allow ubiquitous enrichment of perceived real world with contextually blended virtual aids through semi-transparent glasses. They have been recognised as a suitable visual tool for surgical assistance, since they do not hinder the surgeon's natural eyesight and require no attention shift or perspective conversion. The OST calibration is crucial to ensure locational-coherent surgical guidance. Current calibration methods are either human error-prone or hardly applicable to commercial devices. To this end, we propose an offline camera-based calibration method that is highly accurate yet easy to implement in commercial products, and an online alignment-based refinement that is user-centric and robust against user error. The proposed methods are proven to be superior to other similar State-of- the-art (SOTA)s regarding calibration convenience and display accuracy. Motivated by the ambition to develop the world's first markerless OST navigation system, we integrated the developed markerless tracking and calibration scheme into a complete navigation workflow designed for femur drilling tasks during knee replacement surgery. We verify the usability of our designed OST system with an experienced orthopaedic surgeon by a cadaver study. Our test validates the potential of the proposed markerless navigation system for surgical assistance, although further improvement is required for clinical acceptance.Open Acces

    Cognitive Robotics in Industrial Environments

    Get PDF

    ISAR: Ein Autorensystem für Interaktive Tische

    Get PDF
    Developing augmented reality systems involves several challenges, that prevent end users and experts from non-technical domains, such as education, to experiment with this technology. In this research we introduce ISAR, an authoring system for augmented reality tabletops targeting users from non-technical domains. ISAR allows non-technical users to create their own interactive tabletop applications and experiment with the use of this technology in domains such as educations, industrial training, and medical rehabilitation.Die Entwicklung von Augmented-Reality-Systemen ist mit mehreren Herausforderungen verbunden, die Endbenutzer und Experten aus nicht-technischen Bereichen, wie z.B. dem Bildungswesen, daran hindern, mit dieser Technologie zu experimentieren. In dieser Forschung stellen wir ISAR vor, ein Autorensystem für Augmented-Reality-Tabletops, das sich an Benutzer aus nicht-technischen Bereichen richtet. ISAR ermöglicht es nicht-technischen Anwendern, ihre eigenen interaktiven Tabletop-Anwendungen zu erstellen und mit dem Einsatz dieser Technologie in Bereichen wie Bildung, industrieller Ausbildung und medizinischer Rehabilitation zu experimentieren

    3-D Hand Pose Estimation from Kinect's Point Cloud Using Appearance Matching

    Full text link
    We present a novel appearance-based approach for pose estimation of a human hand using the point clouds provided by the low-cost Microsoft Kinect sensor. Both the free-hand case, in which the hand is isolated from the surrounding environment, and the hand-object case, in which the different types of interactions are classified, have been considered. The hand-object case is clearly the most challenging task having to deal with multiple tracks. The approach proposed here belongs to the class of partial pose estimation where the estimated pose in a frame is used for the initialization of the next one. The pose estimation is obtained by applying a modified version of the Iterative Closest Point (ICP) algorithm to synthetic models to obtain the rigid transformation that aligns each model with respect to the input data. The proposed framework uses a "pure" point cloud as provided by the Kinect sensor without any other information such as RGB values or normal vector components. For this reason, the proposed method can also be applied to data obtained from other types of depth sensor, or RGB-D camera

    Neuron-level dynamics of oscillatory network structure and markerless tracking of kinematics during grasping

    Get PDF
    Oscillatory synchrony is proposed to play an important role in flexible sensory-motor transformations. Thereby, it is assumed that changes in the oscillatory network structure at the level of single neurons lead to flexible information processing. Yet, how the oscillatory network structure at the neuron-level changes with different behavior remains elusive. To address this gap, we examined changes in the fronto-parietal oscillatory network structure at the neuron-level, while monkeys performed a flexible sensory-motor grasping task. We found that neurons formed separate subnetworks in the low frequency and beta bands. The beta subnetwork was active during steady states and the low frequency network during active states of the task, suggesting that both frequencies are mutually exclusive at the neuron-level. Furthermore, both frequency subnetworks reconfigured at the neuron-level for different grip and context conditions, which was mostly lost at any scale larger than neurons in the network. Our results, therefore, suggest that the oscillatory network structure at the neuron-level meets the necessary requirements for the coordination of flexible sensory-motor transformations. Supplementarily, tracking hand kinematics is a crucial experimental requirement to analyze neuronal control of grasp movements. To this end, a 3D markerless, gloveless hand tracking system was developed using computer vision and deep learning techniques. 2021-11-3

    Markerless 3D human pose tracking through multiple cameras and AI: Enabling high accuracy, robustness, and real-time performance

    Full text link
    Tracking 3D human motion in real-time is crucial for numerous applications across many fields. Traditional approaches involve attaching artificial fiducial objects or sensors to the body, limiting their usability and comfort-of-use and consequently narrowing their application fields. Recent advances in Artificial Intelligence (AI) have allowed for markerless solutions. However, most of these methods operate in 2D, while those providing 3D solutions compromise accuracy and real-time performance. To address this challenge and unlock the potential of visual pose estimation methods in real-world scenarios, we propose a markerless framework that combines multi-camera views and 2D AI-based pose estimation methods to track 3D human motion. Our approach integrates a Weighted Least Square (WLS) algorithm that computes 3D human motion from multiple 2D pose estimations provided by an AI-driven method. The method is integrated within the Open-VICO framework allowing simulation and real-world execution. Several experiments have been conducted, which have shown high accuracy and real-time performance, demonstrating the high level of readiness for real-world applications and the potential to revolutionize human motion capture.Comment: 19 pages, 7 figure
    corecore