
    Neural Scene Decomposition for Multi-Person Motion Capture

    Learning general image representations has proven key to the success of many computer vision tasks. For example, many approaches to image understanding problems rely on deep networks that were initially trained on ImageNet, mostly because the learned features are a valuable starting point for learning from limited labeled data. However, when it comes to 3D motion capture of multiple people, these features are of limited use. In this paper, we therefore propose an approach to learning features that are useful for this purpose. To this end, we introduce a self-supervised approach to learning what we call a neural scene decomposition (NSD) that can be exploited for 3D pose estimation. NSD comprises three layers of abstraction to represent human subjects: spatial layout in terms of bounding boxes and relative depth; a 2D shape representation in terms of an instance segmentation mask; and subject-specific appearance and 3D pose information. By exploiting self-supervision from multi-view data, our NSD model can be trained end-to-end without any 2D or 3D supervision. In contrast to previous approaches, it works for multiple people and full-frame images. Because it encodes 3D geometry, NSD can then be effectively leveraged to train a 3D pose estimation network from small amounts of annotated data. Our code and newly introduced boxing dataset are available at github.com and cvlab.epfl.ch.
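
    The three layers of abstraction are the core of the representation. Below is a minimal Python sketch of what a per-subject, NSD-style decomposition might look like as a data structure; all names, latent dimensions, and the placeholder encoder are hypothetical stand-ins for the learned model released with the paper.

```python
# Hypothetical sketch of the three NSD abstraction layers as a data structure.
# Names and shapes are illustrative, not the paper's actual implementation.
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class SubjectDecomposition:
    """Per-subject output of a hypothetical NSD-style encoder."""
    bbox_xywh: np.ndarray          # (4,) spatial layout: bounding box in image coordinates
    relative_depth: float          # ordering of this subject along the camera axis
    segmentation_mask: np.ndarray  # (H, W) binary 2D shape layer
    appearance_code: np.ndarray    # (D_a,) subject-specific appearance latent
    pose_code: np.ndarray          # (D_p,) latent carrying 3D pose information


def decompose_frame(image: np.ndarray, num_subjects: int) -> List[SubjectDecomposition]:
    """Placeholder standing in for the learned encoder: emits one
    three-layer decomposition per subject in the frame."""
    h, w = image.shape[:2]
    return [
        SubjectDecomposition(
            bbox_xywh=np.zeros(4),
            relative_depth=float(i),
            segmentation_mask=np.zeros((h, w), dtype=bool),
            appearance_code=np.zeros(128),
            pose_code=np.zeros(64),
        )
        for i in range(num_subjects)
    ]
```

    Separating layout, 2D shape, and pose/appearance in this way is what lets the pose code be reused to train a downstream 3D pose estimation network from little annotated data.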

    Reconstructing 3D Humans from Images

    Over the past decade, we have seen remarkable progress in Computer Vision, fueled by recent advances in Deep Learning. Unsurprisingly, the visual perception of humans has been at the center of attention. We now have access to systems that work remarkably well for traditional 2D tasks such as segmentation or pose estimation. However, scaling this to 3D remains particularly challenging because of the inherent ambiguities and the scarcity of annotations. The goal of this dissertation is to describe our contributions towards automating 3D human reconstruction from images. First, we explore the use of different representations for human mesh recovery, discuss their advantages, and show how they can be useful for learning deformations beyond standard parametric body models. Next, motivated by the limited availability of annotated data, we present a method that leverages a collaboration between regression and optimization to address this challenge. Subsequently, we describe our work on modeling the ambiguities in 3D human reconstruction and demonstrate its usefulness for downstream tasks such as human body model fitting. Last, we move beyond single-person 3D pose estimation and show how our methods scale to challenging real-world scenes with multiple humans.
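
    The "collaboration between regression and optimization" refers to using a learned regressor to initialize an iterative fitting procedure, which then refines the estimate against image evidence. The toy example below illustrates that pattern on a linear stand-in problem; the projection matrix, the simulated noisy regressor, and all names are hypothetical, not the dissertation's actual pipeline, which fits a parametric body model to detected 2D keypoints.

```python
# Toy illustration of regression-initialized optimization for model fitting.
# Everything here is a hypothetical linear stand-in for the real pipeline.
import numpy as np

rng = np.random.default_rng(0)
P = rng.standard_normal((30, 10))  # fixed linear "projection": params -> 2D keypoints


def regress(keypoints_2d: np.ndarray) -> np.ndarray:
    """Stand-in for a learned regressor: a good but imperfect initial estimate,
    simulated here as the least-squares solution corrupted by noise."""
    theta_hat = np.linalg.lstsq(P, keypoints_2d, rcond=None)[0]
    return theta_hat + 0.3 * rng.standard_normal(theta_hat.shape)


def refine(theta: np.ndarray, keypoints_2d: np.ndarray,
           steps: int = 200, lr: float = 1e-2) -> np.ndarray:
    """Optimization stage: gradient descent on the 2D reprojection error."""
    for _ in range(steps):
        residual = P @ theta - keypoints_2d
        theta = theta - lr * (P.T @ residual)  # gradient of 0.5 * ||residual||^2
    return theta


theta_true = rng.standard_normal(10)
observed = P @ theta_true + 0.01 * rng.standard_normal(30)
theta0 = regress(observed)             # regressor provides the initialization
theta_star = refine(theta0, observed)  # optimization refines it
```

    The regressor keeps the optimizer out of poor local minima and cuts the number of iterations needed, while the optimizer corrects the regressor's systematic errors; this mutual benefit is the point of combining the two.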