269 research outputs found

    GANerated Hands for Real-time 3D Hand Tracking from Monocular RGB

    We address the highly challenging problem of real-time 3D hand tracking based on a monocular RGB-only sequence. Our tracking method combines a convolutional neural network with a kinematic 3D hand model, such that it generalizes well to unseen data, is robust to occlusions and varying camera viewpoints, and leads to anatomically plausible as well as temporally smooth hand motions. For training our CNN we propose a novel approach for the synthetic generation of training data that is based on a geometrically consistent image-to-image translation network. To be more specific, we use a neural network that translates synthetic images to "real" images, such that the so-generated images follow the same statistical distribution as real-world hand images. For training this translation network we combine an adversarial loss and a cycle-consistency loss with a geometric consistency loss in order to preserve geometric properties (such as hand pose) during translation. We demonstrate that our hand tracking system outperforms the current state-of-the-art on challenging RGB-only footage.
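
The combination of loss terms described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the weights `w_cyc` and `w_geo`, the L1 form of the cycle term, and the silhouette-based cross-entropy form of the geometric term are all assumptions made for the example.

```python
import numpy as np

def cycle_loss(x, x_cycled):
    # Assumed form: L1 distance between an image and its round-trip translation.
    return float(np.mean(np.abs(x - x_cycled)))

def geometric_loss(mask_src, mask_translated, eps=1e-7):
    # Assumed form: binary cross-entropy between the source silhouette (0/1)
    # and the silhouette probabilities predicted for the translated image.
    p = np.clip(mask_translated, eps, 1.0 - eps)
    bce = mask_src * np.log(p) + (1.0 - mask_src) * np.log(1.0 - p)
    return float(-np.mean(bce))

def total_loss(adv, cyc, geo, w_cyc=10.0, w_geo=1.0):
    # Weighted sum of the three terms named in the abstract.
    # The weights are hypothetical placeholders, not values from the paper.
    return adv + w_cyc * cyc + w_geo * geo
```

The geometric term is what penalizes a translation that changes the hand's outline; the adversarial and cycle terms alone do not constrain pose.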

    Single-Shot Multi-Person 3D Pose Estimation From Monocular RGB

    We propose a new single-shot method for multi-person 3D pose estimation in general scenes from a monocular RGB camera. Our approach uses novel occlusion-robust pose-maps (ORPM) which enable full body pose inference even under strong partial occlusions by other people and objects in the scene. ORPM outputs a fixed number of maps which encode the 3D joint locations of all people in the scene. Body part associations allow us to infer 3D pose for an arbitrary number of people without explicit bounding box prediction. To train our approach we introduce MuCo-3DHP, the first large scale training data set showing real images of sophisticated multi-person interactions and occlusions. We synthesize a large corpus of multi-person images by compositing images of individual people (with ground truth from multi-view performance capture). We evaluate our method on our new challenging 3D annotated multi-person test set MuPoTs-3D where we achieve state-of-the-art performance. To further stimulate research in multi-person 3D pose estimation, we will make our new datasets and associated code publicly available for research purposes. Comment: International Conference on 3D Vision (3DV), 201

    ARE YOU WILLING TO WAIT LONGER FOR INTERNET PRIVACY?

    It is becoming increasingly common for governments, service providers and specialized data aggregators to systematically collect traces of personal communication on the Internet without the user’s knowledge or approval. An analysis of these personal traces by data mining algorithms can reveal sensitive personal information, such as location data, behavioral patterns, or personal profiles including preferences and dislikes. Recent studies show that this information can be used for various purposes, for example by insurance companies or banks to identify potentially risky customers, by governments to observe their citizens, and also by repressive regimes to monitor political opponents. Online anonymity software, such as Tor, can help users to protect their privacy, but often comes at the price of low usability, e.g., by causing increased latency during surfing. In this exploratory study, we determine factors that influence the usage of Internet anonymity software. In particular, we show that Internet literacy, Internet privacy awareness and Internet privacy concerns are important antecedents for determining an Internet user’s intention to use anonymity software, and that Internet patience has a positive moderating effect on the intention to use anonymity software, as well as on its perceived usefulness.

    Generative Model-Based Loss to the Rescue: A Method to Overcome Annotation Errors for Depth-Based Hand Pose Estimation

    We propose to use a model-based generative loss for training hand pose estimators on depth images based on a volumetric hand model. This additional loss allows training of a hand pose estimator that accurately infers the entire set of 21 hand keypoints while only using supervision for 6 easy-to-annotate keypoints (fingertips and wrist). We show that our partially-supervised method achieves results that are comparable to those of fully-supervised methods which enforce articulation consistency. Moreover, for the first time we demonstrate that such an approach can be used to train on datasets that have erroneous annotations, i.e. "ground truth" with notable measurement errors, while obtaining predictions that explain the depth images better than the given "ground truth".
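
The partial supervision described in this abstract (6 annotated keypoints out of 21) can be illustrated with a masked keypoint loss. This is only a sketch under stated assumptions: the index layout in `SUPERVISED` (wrist at 0, fingertips at 4, 8, 12, 16, 20) and the squared-error form are hypothetical, and the model-based generative loss that constrains the remaining 15 keypoints is not shown.

```python
import numpy as np

NUM_KEYPOINTS = 21
# Hypothetical index layout: 0 = wrist, 4/8/12/16/20 = fingertips.
SUPERVISED = np.array([0, 4, 8, 12, 16, 20])

def partial_keypoint_loss(pred, target):
    """Mean squared 3D distance over the supervised keypoints only.

    pred, target: arrays of shape (NUM_KEYPOINTS, 3).
    Annotations for the other 15 keypoints are ignored entirely.
    """
    diff = pred[SUPERVISED] - target[SUPERVISED]
    return float(np.mean(np.sum(diff ** 2, axis=1)))
```

Because unsupervised rows never enter the loss, errors in their annotations (or their absence) cannot influence training through this term.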

    XNect: Real-time Multi-Person 3D Motion Capture with a Single RGB Camera

    We present a real-time approach for multi-person 3D motion capture at over 30 fps using a single RGB camera. It operates successfully in generic scenes which may contain occlusions by objects and by other people. Our method operates in successive stages. The first stage is a convolutional neural network (CNN) that estimates 2D and 3D pose features along with identity assignments for all visible joints of all individuals. We contribute a new architecture for this CNN, called SelecSLS Net, that uses novel selective long and short range skip connections to improve the information flow, allowing for a drastically faster network without compromising accuracy. In the second stage, a fully connected neural network turns the possibly partial (on account of occlusion) 2D pose and 3D pose features for each subject into a complete 3D pose estimate per individual. The third stage applies space-time skeletal model fitting to the predicted 2D and 3D pose per subject to further reconcile the 2D and 3D pose, and enforce temporal coherence. Our method returns the full skeletal pose in joint angles for each subject. This is a further key distinction from previous work, which does not produce joint angle results of a coherent skeleton in real time for multi-person scenes. The proposed system runs on consumer hardware at a previously unseen speed of more than 30 fps given 512x320 images as input while achieving state-of-the-art accuracy, which we will demonstrate on a range of challenging real-world scenes. Comment: To appear in ACM Transactions on Graphics (SIGGRAPH) 202

    A thalamocortical pathway for fast rerouting of tactile information to occipital cortex in congenital blindness

    In congenitally blind individuals, the occipital cortex responds to various nonvisual inputs. Some animal studies raise the possibility that a subcortical pathway allows fast re-routing of tactile information to the occipital cortex, but this has not been shown in humans. Here we show using magnetoencephalography (MEG) that tactile stimulation produces occipital cortex activations, starting as early as 35 ms in congenitally blind individuals, but not in blindfolded sighted controls. Given our measured thalamic response latencies of 20 ms and a mean estimated lateral geniculate nucleus to primary visual cortex transfer time of 15 ms, we claim that this early occipital response is mediated by a direct thalamo-cortical pathway. We also observed stronger directed connectivity in the alpha band range from posterior thalamus to occipital cortex in congenitally blind participants. Our results strongly suggest the contribution of a fast thalamo-cortical pathway in the cross-modal activation of the occipital cortex in congenitally blind humans.
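
The timing argument in this abstract rests on simple addition of the two latencies it reports. Written out, with symbol names chosen here purely for illustration:

```latex
t_{\text{occipital}} \approx t_{\text{thalamus}} + t_{\text{LGN} \to \text{V1}}
                     = 20\ \text{ms} + 15\ \text{ms}
                     = 35\ \text{ms}
```

The sum matches the earliest occipital activation reported (35 ms), which is why the abstract attributes the response to a direct thalamo-cortical route.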