2,174 research outputs found

    Effects of Variability in Synthetic Training Data on Convolutional Neural Networks for 3D Head Reconstruction

    Göpfert JP, Göpfert C, Botsch M, Hammer B. Effects of Variability in Synthetic Training Data on Convolutional Neural Networks for 3D Head Reconstruction. In: Proceedings of the 2017 IEEE Symposium Series on Computational Intelligence (SSCI). Piscataway, NJ: IEEE; 2017.

    xR-EgoPose: Egocentric 3D Human Pose from an HMD Camera

    We present a new solution to egocentric 3D body pose estimation from monocular images captured by a downward-looking fish-eye camera installed on the rim of a head-mounted virtual reality device. This unusual viewpoint, just 2 cm away from the user's face, leads to images with a unique visual appearance, characterized by severe self-occlusions and strong perspective distortions that result in a drastic difference in resolution between the lower and upper body. Our contribution is twofold. First, we propose a new encoder-decoder architecture with a novel dual-branch decoder designed specifically to account for the varying uncertainty in the 2D joint locations. Our quantitative evaluation, on both synthetic and real-world datasets, shows that our strategy leads to substantial improvements in accuracy over state-of-the-art egocentric pose estimation approaches. Our second contribution is a new large-scale photorealistic synthetic dataset, xR-EgoPose, offering 383K frames of high-quality renderings of people with a diversity of skin tones, body shapes, and clothing, in a variety of backgrounds and lighting conditions, performing a range of actions. Our experiments show that the high variability in our new synthetic training corpus leads to good generalization to real-world footage and to state-of-the-art results on real-world datasets with ground truth. Moreover, an evaluation on the Human3.6M benchmark shows that the performance of our method is on par with top-performing approaches on the more classic problem of 3D human pose from a third-person viewpoint. (ICCV 2019)
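    The dual-branch decoder is described only at a high level above. As a rough illustration, here is a minimal PyTorch-style sketch of one plausible arrangement, in which a single latent code is decoded into both a 3D pose and a reconstruction of the input heatmaps, so the code must retain per-joint uncertainty; all layer sizes and names are illustrative assumptions, not the authors' implementation.

    ```python
    # Hypothetical sketch of a dual-branch decoder over 2D joint heatmaps.
    # Sizes and names are illustrative assumptions, not the paper's code.
    import torch
    import torch.nn as nn

    class DualBranchDecoder(nn.Module):
        def __init__(self, num_joints=16, heatmap_size=48, latent_dim=20):
            super().__init__()
            in_dim = num_joints * heatmap_size * heatmap_size
            self.encoder = nn.Sequential(
                nn.Flatten(),
                nn.Linear(in_dim, 512), nn.ReLU(),
                nn.Linear(512, latent_dim),
            )
            # Branch 1: regress 3D joint positions from the latent code.
            self.pose_branch = nn.Sequential(
                nn.Linear(latent_dim, 512), nn.ReLU(),
                nn.Linear(512, num_joints * 3),
            )
            # Branch 2: reconstruct the input heatmaps from the same code,
            # forcing it to preserve the per-joint location uncertainty.
            self.heatmap_branch = nn.Sequential(
                nn.Linear(latent_dim, 512), nn.ReLU(),
                nn.Linear(512, in_dim),
            )
            self.num_joints = num_joints
            self.heatmap_size = heatmap_size

        def forward(self, heatmaps):
            z = self.encoder(heatmaps)
            pose3d = self.pose_branch(z).view(-1, self.num_joints, 3)
            recon = self.heatmap_branch(z).view(
                -1, self.num_joints, self.heatmap_size, self.heatmap_size)
            return pose3d, recon
    ```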

    More is Better: 3D Human Pose Estimation from Complementary Data Sources

    Computer Vision (CV) research has been playing a strategic role in many complex scenarios that are becoming fundamental components of our everyday life. From augmented/virtual reality (AR/VR) to human-robot interaction, a visual interpretation of the surrounding world is the first and most important step in developing new advanced systems. As in other research areas, the boost in performance of Computer Vision algorithms is mainly attributable to the widespread use of deep neural networks. Rather than relying on handcrafted features, such approaches learn the features best suited to a specific task from a corpus of carefully annotated data. This important property of neural networks comes at a price: they need very large data collections to learn from. Collecting data is a time-consuming and expensive operation whose cost varies, being much harder for some tasks than others. To limit additional data collection, we therefore need to carefully design models that extract as much information as possible from already available datasets, even those collected for neighboring domains. In this work I focus on exploring different solutions for an important research problem in Computer Vision, 3D human pose estimation: the task of estimating the 3D skeletal representation of a person depicted in one or more images. This is done for several configurations: a monocular camera, multi-view systems, and egocentric perspectives. First, from a single external front-facing camera, a semi-supervised approach is used to regress the set of 3D joint positions of the depicted person. This is done by fully exploiting, in a novel manner, all of the information available at all levels of the network, as well as allowing the model to be trained with partially labelled data. Next, a multi-camera 3D human pose estimation system is introduced, based on a network trainable in a semi-supervised or even unsupervised manner in a multi-view setup. Unlike standard motion-capture algorithms, which demand a long and time-consuming configuration at the beginning of each capture session, this approach requires little to no initial system configuration. Finally, a novel architecture is developed for a very specific and significantly harder configuration: 3D human pose estimation from cameras embedded in a head-mounted display (HMD). Due to the limited data availability, the model needs to carefully extract information from the data to generalize properly to unseen images. This is particularly useful in AR/VR use-case scenarios, demonstrating the versatility of our network under various working conditions.
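    As background for the multi-view setting discussed above, here is a minimal sketch of classical triangulation via the Direct Linear Transform (DLT), which shows how 2D joint detections from several calibrated cameras constrain a single 3D joint. This is standard multi-view geometry offered for context, not the thesis's learned approach; the function name and array shapes are illustrative.

    ```python
    # Triangulate one 3D joint from 2D detections in several views.
    # Each camera has a 3x4 projection matrix P; a detection (u, v)
    # gives two homogeneous constraints:
    #   (u * P[2] - P[0]) X = 0  and  (v * P[2] - P[1]) X = 0.
    import numpy as np

    def triangulate_joint(projections, points_2d):
        """projections: list of 3x4 camera matrices; points_2d: list of (u, v)."""
        rows = []
        for P, (u, v) in zip(projections, points_2d):
            rows.append(u * P[2] - P[0])
            rows.append(v * P[2] - P[1])
        A = np.stack(rows)
        # Solve A X = 0 in the least-squares sense: X is the right-singular
        # vector of A with the smallest singular value, then dehomogenize.
        _, _, vt = np.linalg.svd(A)
        X = vt[-1]
        return X[:3] / X[3]
    ```

    With two or more views the system is overdetermined, and the SVD solution degrades gracefully in the presence of detection noise.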

    Deep deformable models for 3D human body

    Deformable models are powerful tools for modelling the 3D shape variation of a class of objects. However, the application and performance of deformable models for the human body are currently restricted by limitations in existing 3D datasets, annotations, and the model formulation itself. In this thesis, we address these issues with the following contributions in 3D human body modelling, monocular reconstruction, and data collection/annotation. First, we propose a deformable model for the 3D human body based on a deep mesh convolutional network. We demonstrate the merit of this model in the task of monocular human mesh recovery. While outperforming current state-of-the-art models in mesh recovery accuracy, the model is also lightweight and more flexible, as it can be trained end-to-end and fine-tuned for a specific task. A second contribution is a bone-level skinned model of the 3D human mesh, in which bone modelling and identity-specific variation modelling are decoupled. This formulation allows mesh convolutional networks to capture detailed identity-specific variations, while pose variations are explicitly controlled and modelled through linear blend skinning with built-in motion constraints. The formulation not only significantly increases the accuracy of 3D human mesh reconstruction, but also facilitates accurate in-the-wild character animation and retargeting. Finally, we present a large-scale dataset of over 1.3 million 3D human body scans in daily clothing. The dataset contains over 12 hours of 4D recordings at 30 FPS, consisting of 7,566 dynamic sequences of 3D meshes from 4,205 subjects. We propose a fast and accurate sequence registration pipeline that facilitates markerless motion capture and automatic dense annotation of the raw scans, leading to automatic synthetic image and annotation generation that boosts performance on tasks such as monocular human mesh reconstruction.
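    Linear blend skinning, on which the bone-level model above builds, has a compact closed form: each posed vertex is a weighted blend of the rest-pose vertex transformed by each bone, v_i' = sum_j w_ij (R_j v_i + t_j). A minimal NumPy sketch of this standard formula follows; array shapes are illustrative assumptions.

    ```python
    # Minimal linear blend skinning (LBS): pose a mesh by blending
    # per-bone rigid transforms with per-vertex skinning weights.
    import numpy as np

    def linear_blend_skinning(vertices, weights, rotations, translations):
        """
        vertices:     (V, 3) rest-pose vertex positions
        weights:      (V, J) skinning weights, each row sums to 1
        rotations:    (J, 3, 3) per-bone rotation matrices
        translations: (J, 3) per-bone translations
        Returns (V, 3) posed vertex positions.
        """
        # Apply every bone transform to every vertex: shape (J, V, 3).
        per_bone = (np.einsum('jab,vb->jva', rotations, vertices)
                    + translations[:, None, :])
        # Blend the per-bone results with the skinning weights: shape (V, 3).
        return np.einsum('vj,jva->va', weights, per_bone)
    ```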