
    A Versatile Scene Model with Differentiable Visibility Applied to Generative Pose Estimation

    Generative reconstruction methods compute the 3D configuration (such as pose and/or geometry) of a shape by optimizing the overlap of the projected 3D shape model with images. Proper handling of occlusions is a big challenge, since the visibility function that indicates if a surface point is seen from a camera can often not be formulated in closed form, and is in general discrete and non-differentiable at occlusion boundaries. We present a new scene representation that enables an analytically differentiable closed-form formulation of surface visibility. In contrast to previous methods, this yields smooth, analytically differentiable, and efficient to optimize pose similarity energies with rigorous occlusion handling, fewer local minima, and experimentally verified improved convergence of numerical optimization. The underlying idea is a new image formation model that represents opaque objects by a translucent medium with a smooth Gaussian density distribution which turns visibility into a smooth phenomenon. We demonstrate the advantages of our versatile scene model in several generative pose estimation problems, namely marker-less multi-object pose estimation, marker-less human motion capture with few cameras, and image-based 3D geometry estimation.
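
    The key mechanism, replacing opaque surfaces with a translucent Gaussian density so that visibility along a camera ray becomes a smooth function, can be made concrete with a small numeric sketch. The sketch below only illustrates that idea under assumed parameters (isotropic blobs, a uniformly sampled ray); it is not the paper's actual formulation or code.

    ```python
    import numpy as np

    def gaussian_density(points, centers, sigmas, magnitudes):
        """Sum-of-Gaussians density evaluated at sample points along a ray.

        points:     (S, 3) 3D sample positions
        centers:    (B, 3) Gaussian blob centers
        sigmas:     (B,)   blob standard deviations
        magnitudes: (B,)   blob density scales
        """
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1)   # (S, B)
        return (magnitudes * np.exp(-0.5 * d2 / sigmas ** 2)).sum(-1)    # (S,)

    def smooth_visibility(origin, direction, depth, centers, sigmas, magnitudes, n_samples=256):
        """Approximate visibility of a point at the given depth as the transmittance
        exp(-integral of density) along the ray, which varies smoothly with the blob
        parameters instead of jumping at occlusion boundaries like a z-buffer test."""
        ts = np.linspace(0.0, depth, n_samples)
        pts = origin[None, :] + ts[:, None] * direction[None, :]
        dens = gaussian_density(pts, centers, sigmas, magnitudes)
        integral = dens.sum() * (ts[1] - ts[0])    # simple Riemann-sum line integral
        return np.exp(-integral)                   # transmittance in [0, 1]

    # One occluding blob halfway between the camera and a surface point at depth 4:
    vis = smooth_visibility(np.zeros(3), np.array([0.0, 0.0, 1.0]), 4.0,
                            centers=np.array([[0.0, 0.0, 2.0]]),
                            sigmas=np.array([0.3]), magnitudes=np.array([5.0]))
    print(vis)  # close to 0, and mathematically differentiable in blob center, size, and density
    ```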

    EgoCap: Egocentric Marker-less Motion Capture with Two Fisheye Cameras (Extended Abstract)

    Marker-based and marker-less optical skeletal motion-capture methods use an outside-in arrangement of cameras placed around a scene, with viewpoints converging on the center. Marker suits, when required, cause discomfort, and the recording volume is severely restricted, often to indoor scenes with controlled backgrounds. We therefore propose a new method for real-time, marker-less, and egocentric motion capture which estimates the full-body skeleton pose from a lightweight stereo pair of fisheye cameras that are attached to a helmet or virtual-reality headset. It combines the strengths of a new generative pose estimation framework for fisheye views with a ConvNet-based body-part detector trained on a new automatically annotated and augmented dataset. Our inside-in method captures full-body motion in general indoor and outdoor scenes, as well as in crowded scenes.
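
    A central ingredient is pose fitting under fisheye views rather than the usual pinhole model. As a point of reference only, the sketch below projects 3D points with a standard equidistant fisheye model (r = f * theta); the paper's exact camera model and calibration are not given here, so this is an illustrative assumption.

    ```python
    import numpy as np

    def equidistant_fisheye_project(points_3d, f, c):
        """Project camera-space points (N, 3) with an equidistant fisheye model.

        The image radius grows linearly with the angle from the optical axis
        (r = f * theta), which is what lets such lenses cover a near-hemispherical
        field of view around the wearer's body.
        """
        x, y, z = points_3d[:, 0], points_3d[:, 1], points_3d[:, 2]
        theta = np.arctan2(np.hypot(x, y), z)   # angle from the optical axis
        phi = np.arctan2(y, x)                  # azimuth around the axis
        r = f * theta
        return np.stack([r * np.cos(phi), r * np.sin(phi)], axis=-1) + c

    # Hypothetical intrinsics; a point almost sideways from the camera still projects.
    pts = np.array([[0.2, -0.1, 0.5], [1.0, 0.0, 0.05]])
    print(equidistant_fisheye_project(pts, f=300.0, c=np.array([640.0, 640.0])))
    ```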

    VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera

    We present the first real-time method to capture the full global 3D skeletal pose of a human in a stable, temporally consistent manner using a single RGB camera. Our method combines a new convolutional neural network (CNN) based pose regressor with kinematic skeleton fitting. Our novel fully-convolutional pose formulation regresses 2D and 3D joint positions jointly in real time and does not require tightly cropped input frames. A real-time kinematic skeleton fitting method uses the CNN output to yield temporally stable 3D global pose reconstructions on the basis of a coherent kinematic skeleton. This makes our approach the first monocular RGB method usable in real-time applications such as 3D character control; thus far, the only monocular methods for such applications employed specialized RGB-D cameras. Our method's accuracy is quantitatively on par with the best offline 3D monocular RGB pose estimation methods. Our results are qualitatively comparable to, and sometimes better than, results from monocular RGB-D approaches, such as the Kinect. However, we show that our approach is more broadly applicable than RGB-D solutions, i.e., it works for outdoor scenes, community videos, and low-quality commodity RGB cameras. (Accepted to SIGGRAPH 2017.)
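
    One part of such a pipeline that is easy to make concrete is turning the CNN's root-relative 3D joints and 2D detections into a globally positioned pose by model fitting. The sketch below solves only for a global root translation by least squares under a pinhole camera; it is a simplified stand-in for fitting a model to combined 2D/3D CNN output, not the paper's full kinematic skeleton fit over joint angles, and the camera intrinsics and joint data are made up.

    ```python
    import numpy as np
    from scipy.optimize import least_squares

    def project(points_3d, f, c):
        """Pinhole projection of camera-space points (N, 3) with focal length f
        and principal point c."""
        return f * points_3d[:, :2] / points_3d[:, 2:3] + c

    def fit_global_translation(joints_rel_3d, joints_2d, f, c, t_init):
        """Find a root translation t so that the root-relative 3D joints, shifted
        by t, reproject onto the detected 2D joints (nonlinear least squares)."""
        def residuals(t):
            return (project(joints_rel_3d + t, f, c) - joints_2d).ravel()
        return least_squares(residuals, t_init).x

    # Toy usage with synthetic joints and made-up intrinsics.
    rng = np.random.default_rng(0)
    joints_rel = rng.normal(scale=0.3, size=(17, 3))     # root-relative 3D joints (m)
    true_t = np.array([0.1, -0.2, 3.0])                  # ground-truth root position
    f, c = 1000.0, np.array([512.0, 512.0])
    obs_2d = project(joints_rel + true_t, f, c)          # simulated 2D detections
    print(fit_global_translation(joints_rel, obs_2d, f, c, np.array([0.0, 0.0, 2.0])))
    ```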

    Neural Scene Decomposition for Multi-Person Motion Capture

    Learning general image representations has proven key to the success of many computer vision tasks. For example, many approaches to image understanding problems rely on deep networks that were initially trained on ImageNet, mostly because the learned features are a valuable starting point to learn from limited labeled data. However, when it comes to 3D motion capture of multiple people, these features are only of limited use. In this paper, we therefore propose an approach to learning features that are useful for this purpose. To this end, we introduce a self-supervised approach to learning what we call a neural scene decomposition (NSD) that can be exploited for 3D pose estimation. NSD comprises three layers of abstraction to represent human subjects: spatial layout in terms of bounding boxes and relative depth; a 2D shape representation in terms of an instance segmentation mask; and subject-specific appearance and 3D pose information. By exploiting self-supervision coming from multi-view data, our NSD model can be trained end-to-end without any 2D or 3D supervision. In contrast to previous approaches, it works for multiple persons and full-frame images. Because it encodes 3D geometry, NSD can then be effectively leveraged to train a 3D pose estimation network from small amounts of annotated data. Our code and newly introduced boxing dataset are available at github.com and cvlab.epfl.ch.
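
    The three levels of abstraction described above map naturally onto a per-subject record. The sketch below is just one way to write that structure down; the field names, shapes, and code dimensions are assumptions for illustration, not the authors' interface.

    ```python
    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class SubjectDecomposition:
        """One subject's slice of a neural scene decomposition (illustrative layout)."""
        bbox: np.ndarray             # (4,)   spatial layout: x, y, width, height in the frame
        relative_depth: float        #        depth ordering of this subject vs. the others
        segmentation: np.ndarray     # (H, W) instance mask inside the bounding box
        appearance_code: np.ndarray  # (Da,)  subject-specific appearance embedding
        pose_code: np.ndarray        # (Dp,)  latent encoding of the subject's 3D pose

    # Toy instance, e.g. one element of the per-frame list a decomposition network would emit.
    subject = SubjectDecomposition(
        bbox=np.array([120.0, 80.0, 64.0, 160.0]),
        relative_depth=0.7,
        segmentation=np.zeros((160, 64), dtype=bool),
        appearance_code=np.zeros(128),
        pose_code=np.zeros(64),
    )
    print(subject.bbox, subject.relative_depth)
    ```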

    3D human pose reconstruction for ergonomic posture analysis

    The rapid development of the modular construction industry has raised social concerns about workers’ health and safety in factory-controlled construction processes. According to reports from the Association of Workers’ Compensation Boards of Canada (AWCBC), approximately 2 in 100 workers in the Canadian modular construction industry are injured due to awkward and improper postures and motions. The occurrence of injuries and accidents not only reduces productivity but also increases project cost. In this respect, ergonomic posture analysis, built upon self-reports, manual observation, direct measurement, or computer vision, is essential to identify, mitigate, and prevent such postures and thereby improve safety and productivity. Advanced computer vision technologies have made vision-based ergonomic posture analysis cost-effective in real workplaces. So far, several vision-based methods have been created to obtain the anthropometric data, such as joint coordinates and body angles, that ergonomic posture analysis requires. However, challenges such as occlusions and limited accuracy in complex working environments still reduce the reliability and robustness of these vision-based methods in practice. This research proposes a novel framework that acquires body joint angles for ergonomic posture analysis by reconstructing the worker’s 3D body from 2D videos recorded with a monocular camera. The framework consists of (1) human tracking in the given videos; (2) 2D body joint and body part detection using the tracking results; (3) 2D pose refinement by integrating the 2D joint detections with the body part detections; (4) 3D body model generation and body angle calculation; and (5) ergonomic posture analysis based on the obtained body angles. The proposed framework was tested on videos recorded in real factories, and the results were compared with motion data captured by an IMU-based suit: the average 3D pose difference was 17.51 degrees in terms of joint angles, and the lowest joint angle difference was around 4 degrees.
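
    Step (4), computing body angles from reconstructed 3D joints, reduces to elementary vector geometry. The snippet below is a generic illustration of that calculation, not code from the thesis; the joint names and coordinates are made up.

    ```python
    import numpy as np

    def joint_angle(parent, joint, child):
        """Interior angle (degrees) at `joint` between the segments joint->parent
        and joint->child, e.g. the elbow angle from shoulder, elbow, and wrist."""
        u = parent - joint
        v = child - joint
        cos_a = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        return np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))

    # Example: a slightly bent arm; ergonomic rules would then map such angles to risk scores.
    shoulder = np.array([0.00, 1.40, 0.00])
    elbow    = np.array([0.00, 1.10, 0.05])
    wrist    = np.array([0.00, 0.80, 0.00])
    print(joint_angle(shoulder, elbow, wrist))  # a value somewhat below 180 degrees
    ```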