
    Expressive Body Capture: 3D Hands, Face, and Body from a Single Image

    To facilitate the analysis of human actions, interactions and emotions, we compute a 3D model of human body pose, hand pose, and facial expression from a single monocular image. To achieve this, we use thousands of 3D scans to train a new, unified, 3D model of the human body, SMPL-X, that extends SMPL with fully articulated hands and an expressive face. Learning to regress the parameters of SMPL-X directly from images is challenging without paired images and 3D ground truth. Consequently, we follow the approach of SMPLify, which estimates 2D features and then optimizes model parameters to fit the features. We improve on SMPLify in several significant ways: (1) we detect 2D features corresponding to the face, hands, and feet and fit the full SMPL-X model to these; (2) we train a new neural network pose prior using a large MoCap dataset; (3) we define a new interpenetration penalty that is both fast and accurate; (4) we automatically detect gender and the appropriate body models (male, female, or neutral); (5) our PyTorch implementation achieves a speedup of more than 8x over Chumpy. We use the new method, SMPLify-X, to fit SMPL-X to both controlled images and images in the wild. We evaluate 3D accuracy on a new curated dataset comprising 100 images with pseudo ground-truth. This is a step towards automatic expressive human capture from monocular RGB data. The models, code, and data are available for research purposes at https://smpl-x.is.tue.mpg.de. Comment: To appear in CVPR 2019
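
    The fitting stage described above lends itself to a compact illustration. The sketch below shows the general shape of such an optimize-to-2D-keypoints loop in PyTorch; the linear "body model", pinhole camera, synthetic keypoints, and weights are toy stand-ins invented for the example, not SMPL-X or the authors' actual objective (which additionally uses a learned pose prior and an interpenetration penalty).

```python
import torch

torch.manual_seed(0)
NUM_JOINTS, POSE_DIM, SHAPE_DIM = 24, 72, 10

# Toy stand-in for an articulated body model: a fixed linear map from
# (pose, shape) parameters to 3D joint locations.
REST_JOINTS = torch.randn(NUM_JOINTS, 3)
POSE_BASIS  = torch.randn(POSE_DIM, NUM_JOINTS * 3) * 0.01
SHAPE_BASIS = torch.randn(SHAPE_DIM, NUM_JOINTS * 3) * 0.01

def body_joints(pose, betas):
    offsets = pose @ POSE_BASIS + betas @ SHAPE_BASIS
    return REST_JOINTS + offsets.view(NUM_JOINTS, 3)

def project(joints3d, focal=1000.0, center=(512.0, 512.0)):
    # Simple pinhole projection with the camera 5 units in front of the body.
    z = (joints3d[:, 2:3] + 5.0).clamp(min=1e-3)
    return focal * joints3d[:, :2] / z + torch.tensor(center)

# "Detected" 2D keypoints with per-joint confidences (synthetic targets here;
# in the real pipeline these come from a 2D keypoint detector).
target_2d = project(body_joints(torch.randn(POSE_DIM) * 0.1, torch.zeros(SHAPE_DIM)))
confidence = torch.ones(NUM_JOINTS)

pose = torch.zeros(POSE_DIM, requires_grad=True)
betas = torch.zeros(SHAPE_DIM, requires_grad=True)
optimizer = torch.optim.Adam([pose, betas], lr=0.05)

for step in range(200):
    optimizer.zero_grad()
    joints2d = project(body_joints(pose, betas))
    reproj = (confidence[:, None] * (joints2d - target_2d) ** 2).sum()
    # Stand-ins for the priors; the real method's neural pose prior and
    # interpenetration penalty are omitted here.
    regularizer = 1e-2 * pose.pow(2).sum() + 1e-2 * betas.pow(2).sum()
    loss = reproj + regularizer
    loss.backward()
    optimizer.step()

print(f"final fitting loss: {loss.item():.3f}")
```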

    Learning an Intrinsic Garment Space for Interactive Authoring of Garment Animation

    Authoring dynamic garment shapes for character animation driven by body motion is one of the fundamental steps in the CG industry. Established workflows are either time- and labor-consuming (i.e., manual editing on dense frames with controllers), or lack keyframe-level control (i.e., physically-based simulation). Not surprisingly, garment authoring remains a bottleneck in many production pipelines. Instead, we present a deep-learning-based approach for semi-automatic authoring of garment animation, wherein the user provides the desired garment shape in a selection of keyframes, while our system infers a latent representation for its motion-independent intrinsic parameters (e.g., gravity, cloth materials, etc.). Given new character motions, the latent representation allows us to automatically generate a plausible garment animation at interactive rates. Having factored out character motion, the learned intrinsic garment space enables smooth transition between keyframes on a new motion sequence. Technically, we learn an intrinsic garment space with a motion-driven autoencoder network, where the encoder maps the garment shapes to the intrinsic space under the condition of body motions, while the decoder acts as a differentiable simulator to generate garment shapes according to changes in character body motion and intrinsic parameters. We evaluate our approach qualitatively and quantitatively on common garment types. Experiments demonstrate that our system can significantly improve current garment authoring workflows via an interactive user interface. Compared with the standard CG pipeline, our system significantly reduces the ratio of required keyframes from 20% to 1-2%.
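
    To make the encoder/decoder split concrete, here is a minimal PyTorch sketch of a motion-conditioned autoencoder in the spirit described above; all dimensions, layer sizes, and tensors are made up for illustration and do not reflect the paper's architecture.

```python
import torch
import torch.nn as nn

class MotionConditionedAE(nn.Module):
    """Sketch of a motion-conditioned garment autoencoder (dimensions are invented)."""
    def __init__(self, garment_dim=3000, motion_dim=128, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(garment_dim + motion_dim, 512), nn.ReLU(),
            nn.Linear(512, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + motion_dim, 512), nn.ReLU(),
            nn.Linear(512, garment_dim),
        )

    def forward(self, garment, motion):
        z = self.encoder(torch.cat([garment, motion], dim=-1))   # intrinsic code
        recon = self.decoder(torch.cat([z, motion], dim=-1))     # motion-driven decode
        return recon, z

model = MotionConditionedAE()
garment = torch.randn(4, 3000)   # flattened garment vertex offsets (toy data)
motion  = torch.randn(4, 128)    # body-motion descriptor for the same frames
recon, z = model(garment, motion)
loss = nn.functional.mse_loss(recon, garment)

# At authoring time the intrinsic code z can be held fixed (or interpolated
# between keyframes) while new motion descriptors drive the decoder.
new_motion = torch.randn(4, 128)
new_shapes = model.decoder(torch.cat([z.detach(), new_motion], dim=-1))
```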

    Scalable Real-Time Rendering for Extremely Complex 3D Environments Using Multiple GPUs

    In 3D visualization, real-time rendering of high-quality meshes in complex 3D environments is still one of the major challenges in computer graphics. New data acquisition techniques like 3D modeling and scanning have drastically increased the requirement for more complex models and the demand for higher display resolutions in recent years. Most of the existing acceleration techniques using a single GPU for rendering suffer from the limited GPU memory budget, time-consuming sequential execution, and the finite display resolution. Recently, people have started building commodity workstations with multiple GPUs and multiple displays. As a result, more GPU memory is available across a distributed cluster of GPUs, more computational power is provided through the combination of multiple GPUs, and a higher display resolution can be achieved by connecting each GPU to a display monitor (resulting in a tiled large display configuration). However, using a multi-GPU workstation may not always give the desired rendering performance due to imbalanced rendering workloads among GPUs and overheads caused by inter-GPU communication. In this dissertation, I contribute a multi-GPU multi-display parallel rendering approach for complex 3D environments. The approach supports high-performance, high-quality rendering of static and dynamic 3D environments. A novel parallel load balancing algorithm is developed based on a screen partitioning strategy to dynamically balance the number of vertices and triangles rendered by each GPU. The overhead of inter-GPU communication is minimized by a novel frame exchanging algorithm that transfers only a small number of image pixels rather than chunks of 3D primitives. The state-of-the-art parallel mesh simplification and GPU out-of-core techniques are integrated into the multi-GPU multi-display system to accelerate the rendering process.
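
    As a concrete (and much simplified) illustration of workload-aware screen partitioning, the sketch below splits pixel columns into contiguous spans of roughly equal estimated triangle load using a prefix sum; the load estimate, function name, and split heuristic are assumptions made for this example, not the dissertation's algorithm.

```python
import numpy as np

def balance_screen_columns(tri_per_column, num_gpus):
    """Split screen columns into contiguous spans of roughly equal triangle load.

    tri_per_column: estimated number of triangles touching each pixel column
    (e.g., measured on the previous frame). Returns one (start, end) span per GPU.
    """
    prefix = np.concatenate([[0], np.cumsum(tri_per_column)])
    total = prefix[-1]
    cuts = [0]
    for g in range(1, num_gpus):
        # Place each cut where the cumulative load crosses g/num_gpus of the total.
        cuts.append(int(np.searchsorted(prefix, total * g / num_gpus)))
    cuts.append(len(tri_per_column))
    return list(zip(cuts[:-1], cuts[1:]))

# Example: 1920 columns with an uneven triangle distribution, shared by 4 GPUs.
load = np.abs(np.random.default_rng(0).normal(100, 60, size=1920))
spans = balance_screen_columns(load, num_gpus=4)
for gpu, (a, b) in enumerate(spans):
    print(f"GPU {gpu}: columns [{a}, {b}) carrying ~{load[a:b].sum():.0f} triangles")
```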

    A 3D Modeling System based on Polar Meshes

    A modeling system has been designed that combines the advantages of SQM with the strengths of traditional 3D software, recognizing the need for two distinct operating levels: at the high level, the properties of polar meshes are exploited to define the base structure of the object, while at the low level the designer is free to add detail to the mesh as in any other 3D software. An animation engine and an L-System have also been developed.
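
    Since the abstract mentions an L-System component, here is a minimal, generic L-system rewriting sketch; the axiom and production rules are the classic branching example, not the ones used in the thesis.

```python
def lsystem(axiom, rules, iterations):
    """Minimal L-system string rewriting (illustrative rules only)."""
    s = axiom
    for _ in range(iterations):
        # Rewrite every symbol in parallel; symbols without a rule are copied as-is.
        s = "".join(rules.get(ch, ch) for ch in s)
    return s

# Classic branching example: F = draw forward, +/- = turn, [ ] = push/pop state.
rules = {"F": "F[+F]F[-F]F"}
print(lsystem("F", rules, 2))
```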

    Sparse Volumetric Deformation

    Volume rendering is becoming increasingly popular as applications require realistic solid shape representations with seamless texture mapping and accurate filtering. However, rendering sparse volumetric data is difficult because of the limited memory and processing capabilities of current hardware. To address these limitations, the volumetric information can be stored at progressive resolutions in the hierarchical branches of a tree structure, and sampled according to the region of interest. This means that only a partial region of the full dataset is processed, and therefore massive volumetric scenes can be rendered efficiently. The problem with this approach is that it currently only supports static scenes. This is because it is difficult to accurately deform massive numbers of volume elements and reconstruct the scene hierarchy in real-time. Another problem is that deformation operations distort the shape where more than one volume element tries to occupy the same location, and similarly gaps occur where deformation stretches the elements further than one discrete location. It is also challenging to efficiently support sophisticated deformations at hierarchical resolutions, such as character skinning or physically based animation. These types of deformation are expensive and require a control structure (for example a cage or skeleton) that maps to a set of features to accelerate the deformation process. The problems with this technique are that the varying volume hierarchy reflects different feature sizes, and manipulating the features at the original resolution is too expensive; therefore the control structure must also hierarchically capture features according to the varying volumetric resolution. This thesis investigates the area of deforming and rendering massive amounts of dynamic volumetric content. The proposed approach efficiently deforms hierarchical volume elements without introducing artifacts and supports both ray casting and rasterization renderers. This enables light transport to be modeled both accurately and efficiently with applications in the fields of real-time rendering and computer animation. Sophisticated volumetric deformation, including character animation, is also supported in real-time. This is achieved by automatically generating a control skeleton which is mapped to the varying feature resolution of the volume hierarchy. The output deformations are demonstrated in massive dynamic volumetric scenes.
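
    A minimal sketch of the hierarchical, region-of-interest-driven sampling described above is given below; the node layout, field names, and the detail-size stopping criterion are illustrative assumptions, not the thesis' data structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class VoxelNode:
    """Sketch of an octree-like volume node storing a coarse value per level."""
    center: tuple          # (x, y, z) of the cell center
    size: float            # edge length of the cell
    density: float = 0.0   # value stored at this resolution
    children: List["VoxelNode"] = field(default_factory=list)

def sample(node: VoxelNode, point, detail_size: float) -> Optional[float]:
    """Return the density at `point`, descending only while the cell is larger
    than the requested detail size (region-of-interest level of detail)."""
    half = node.size / 2
    if any(abs(p - c) > half for p, c in zip(point, node.center)):
        return None                      # point lies outside this cell
    if node.size <= detail_size or not node.children:
        return node.density              # coarse enough, or a leaf: stop here
    for child in node.children:
        value = sample(child, point, detail_size)
        if value is not None:
            return value
    return node.density                  # fall back to the coarse value

root = VoxelNode(center=(0, 0, 0), size=2.0, density=0.3,
                 children=[VoxelNode(center=(-0.5, -0.5, -0.5), size=1.0, density=0.8)])
print(sample(root, (-0.4, -0.4, -0.4), detail_size=1.0))   # -> 0.8
```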

    HOSNeRF: Dynamic Human-Object-Scene Neural Radiance Fields from a Single Video

    We introduce HOSNeRF, a novel 360° free-viewpoint rendering method that reconstructs neural radiance fields for a dynamic human-object-scene from a single monocular in-the-wild video. Our method enables pausing the video at any frame and rendering all scene details (dynamic humans, objects, and backgrounds) from arbitrary viewpoints. The first challenge in this task is the complex object motions in human-object interactions, which we tackle by introducing new object bones into the conventional human skeleton hierarchy to effectively estimate large object deformations in our dynamic human-object model. The second challenge is that humans interact with different objects at different times, for which we introduce two new learnable object state embeddings that can be used as conditions for learning our human-object representation and scene representation, respectively. Extensive experiments show that HOSNeRF significantly outperforms SOTA approaches on two challenging datasets by a large margin of 40%-50% in terms of LPIPS. The code, data, and compelling examples of 360° free-viewpoint renderings from single videos will be released at https://showlab.github.io/HOSNeRF. Comment: Project page: https://showlab.github.io/HOSNeRF
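
    The object-state conditioning idea can be illustrated with a small PyTorch sketch: a learnable embedding per object state is concatenated with the encoded sample position before the radiance MLP. The dimensions, architecture, and names below are invented for the example and are not HOSNeRF's actual networks.

```python
import torch
import torch.nn as nn

class StateConditionedField(nn.Module):
    """Toy radiance-field MLP conditioned on a learnable object-state embedding."""
    def __init__(self, num_states=4, embed_dim=16, pos_dim=63):
        super().__init__()
        self.state_embed = nn.Embedding(num_states, embed_dim)
        self.mlp = nn.Sequential(
            nn.Linear(pos_dim + embed_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 4),       # RGB + density
        )

    def forward(self, encoded_xyz, state_id):
        emb = self.state_embed(state_id)              # which object state is active
        h = torch.cat([encoded_xyz, emb], dim=-1)
        rgb_sigma = self.mlp(h)
        return torch.sigmoid(rgb_sigma[..., :3]), torch.relu(rgb_sigma[..., 3])

field = StateConditionedField()
xyz_enc = torch.randn(1024, 63)                       # positionally encoded samples
state = torch.full((1024,), 2, dtype=torch.long)      # frames where state 2 is active
rgb, sigma = field(xyz_enc, state)
```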

    Application of 3D human pose estimation for motion capture and character animation

    Abstract. Interest in motion capture (mocap) technology is growing every day, and the number of possible applications is multiplying. However, such systems are very expensive and not affordable for personal use. Based on that, this thesis presents a framework that can produce mocap data from regular RGB video and then use it to animate a 3D character according to the movement of the person in the original video. To extract the mocap data from the input video, one of the three 3D pose estimation (PE) methods available within the scope of the project is used to determine where the joints of the person in each video frame are located in 3D space. The 3D positions of the joints are used as mocap data and are imported into Blender, which contains a simple 3D character. The data is assigned to the corresponding joints of the character to animate it. To test how the created animation would work in a different environment, it was imported into the Unity game engine and applied to a native 3D character. The evaluation of the produced animations from Blender and Unity showed that even though the quality of the animation might not be perfect, the test subjects found this approach to animation promising. In addition, during the evaluation, a few issues were discovered and considered for future framework development.
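
    For the Blender import step, a minimal sketch of turning per-frame joint positions into keyframed empties via Blender's Python API (bpy) might look like the following; the joint list and the synthetic mocap dictionary are placeholders for the pose-estimation output, and the actual rig binding is not shown.

```python
# Runs inside Blender's Python environment (bpy is Blender's built-in API).
import bpy

joint_names = ["hips", "spine", "head", "l_hand", "r_hand"]   # illustrative subset
# mocap[frame] -> {joint_name: (x, y, z)}; in practice loaded from the PE output.
mocap = {f: {name: (0.0, 0.0, 0.01 * f) for name in joint_names} for f in range(1, 61)}

# Create one empty object per joint.
empties = {}
for name in joint_names:
    empty = bpy.data.objects.new(name, None)        # None object data = an empty
    bpy.context.collection.objects.link(empty)
    empties[name] = empty

# Keyframe each empty's location from the per-frame joint positions.
for frame, joints in mocap.items():
    for name, position in joints.items():
        obj = empties[name]
        obj.location = position
        obj.keyframe_insert(data_path="location", frame=frame)
# The character rig can then follow these empties (e.g., via bone constraints).
```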

    Old stones’ song—second verse: use-wear analysis of rhyolite and fenetized andesite artifacts from the Oldowan lithic industry of Kanjera South, Kenya

    This paper investigates Oldowan hominin behavioral ecology through use-wear analysis of artifacts from Kanjera South, Western Kenya. It extends the development of our experimental use-wear reference collection and the analysis of use-wear on the well-preserved and unweathered Oldowan tools from this site to include rhyolite, a non-local material of similar durability to previously studied quartz and quartzite tools, and fenetized andesite, a local material with considerably less durability. Variability in rhyolite and fenetized andesite texture, inclusions, and matrix required enhancement of previous methods, so we combine the use of stereoscopic, metallographic, and scanning electron microscopy in this study. This study allows us to begin exploring the links between specific artifactual raw materials and the materials they were used to process. Data assembled so far suggest that tools fashioned from non-local and local stone were, with one possible exception, used to process similar materials. Additionally, experiments carried out with replicas of tools made of rhyolite and fenetized andesite confirm the interpretation, based on reduction sequences, that tools made of the less durable local material had a shorter use-life and were used more expediently than those made of the more durable non-local quartz, quartzite, and rhyolite. These new data improve our understanding of the functional needs, behavioral solutions, and cognitive capacities of Oldowan hominins. Finally, these data show how use-wear analysis, combined with lithic raw material and lithic technology studies, can be a powerful means for evaluating two key points in human evolution: long-term memory and planning.

    Choreographing the Motion of Multiple Actors Using Computers

    Thesis (Ph.D.) -- Department of Electrical and Computer Engineering, College of Engineering, Seoul National University Graduate School, August 2017. Jehee Lee. Choreographing motion is the process of converting written stories or messages into the real movement of actors. In performances or movies, directors spend considerable time and effort on it because it is the primary factor on which audiences concentrate. If multiple actors are present in a scene, choreography becomes even more challenging. The fundamental difficulty is that the coordination between actors must be adjusted precisely: spatio-temporal coordination is the first requirement that must be satisfied, and causality and mood are other important forms of coordination. Directors use assistant tools such as storyboards or roughly crafted 3D animations, which can visualize the flow of movements, to organize ideas or to explain them to actors. However, these tools are difficult to use because artistry and considerable training are required, they cannot offer suggestions or feedback, and the amount of manual labor increases exponentially as the number of actors increases. In this thesis, we propose computational approaches to choreographing the motion of multiple actors. The ultimate goal is to enable novice users to generate motions for multiple actors without substantial effort. We first show an approach to generating motions for shadow theatre, where actors must collaborate carefully to achieve the same goal; the results are comparable to those produced by professional actors (a toy sketch of this kind of pose search appears after the contents below). Next, we present an interactive animation system for pre-visualization, in which users employ an intuitive graphical interface for scene description; given a description, the system generates motions for the characters in the scene that match it. Finally, we propose two controller designs (combining regression with trajectory optimization, and evolutionary deep reinforcement learning) for physically simulated actors, which guarantee the physical validity of the resultant motions.
    Contents:
        Chapter 1 Introduction
        Chapter 2 Background
            2.1 Motion Generation Technique (2.1.1 Motion Editing and Synthesis for Single-Character; 2.1.2 Motion Editing and Synthesis for Multi-Character; 2.1.3 Motion Planning; 2.1.4 Motion Control by Reinforcement Learning; 2.1.5 Pose/Motion Estimation from Incomplete Information; 2.1.6 Diversity on Resultant Motions)
            2.2 Authoring System (2.2.1 System using High-level Input; 2.2.2 User-interactive System)
            2.3 Shadow Theatre (2.3.1 Shadow Generation; 2.3.2 Shadow for Artistic Purpose; 2.3.3 Viewing Shadow Theatre as Collages/Mosaics of People)
            2.4 Physics-based Controller Design (2.4.1 Controllers for Various Characters; 2.4.2 Trajectory Optimization; 2.4.3 Sampling-based Optimization; 2.4.4 Model-Based Controller Design; 2.4.5 Direct Policy Learning; 2.4.6 Deep Reinforcement Learning for Control)
        Chapter 3 Motion Generation for Shadow Theatre
            3.1 Overview
            3.2 Shadow Theatre Problem (3.2.1 Problem Definition; 3.2.2 Approaches of Professional Actors)
            3.3 Discovery of Principal Poses (3.3.1 Optimization Formulation; 3.3.2 Optimization Algorithm)
            3.4 Animating Principal Poses (3.4.1 Initial Configuration; 3.4.2 Optimization for Motion Generation)
            3.5 Experimental Results (3.5.1 Implementation Details; 3.5.2 Animation; 3.5.3 3D Fabrication)
            3.6 Discussion
        Chapter 4 Interactive Animation System for Pre-visualization
            4.1 Overview
            4.2 Graphical Scene Description
            4.3 Candidate Scene Generation (4.3.1 Connecting Paths; 4.3.2 Motion Cascade; 4.3.3 Motion Selection For Each Cycle; 4.3.4 Cycle Ordering; 4.3.5 Generalized Paths and Cycles; 4.3.6 Motion Editing)
            4.4 Scene Ranking (4.4.1 Ranking Criteria; 4.4.2 Scene Ranking Measures)
            4.5 Scene Refinement
            4.6 Experimental Results
            4.7 Discussion
        Chapter 5 Physics-based Design and Control
            5.1 Overview
            5.2 Combining Regression with Trajectory Optimization (5.2.1 Simulation and Motor Skills; 5.2.2 Control Adaptation; 5.2.3 Control Parameterization; 5.2.4 Efficient Construction; 5.2.5 Experimental Results; 5.2.6 Discussion)
            5.3 Example-Guided Control by Deep Reinforcement Learning (5.3.1 System Overview; 5.3.2 Initial Policy Construction; 5.3.3 Evolutionary Deep Q-Learning; 5.3.4 Experimental Results; 5.3.5 Discussion)
        Chapter 6 Conclusion (6.1 Contribution; 6.2 Future Work)
        Abstract (in Korean)
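
    As a toy illustration of a sampling-based search for shadow-theatre poses, the sketch below optimizes a handful of 2D "joint" positions so that the union of discs around them matches a target silhouette; the silhouette model, random local search, and all constants are stand-ins for the example, not the thesis' formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 64

def rasterize(centers, radius=6):
    """Toy silhouette: union of discs around projected joint positions."""
    yy, xx = np.mgrid[0:H, 0:W]
    mask = np.zeros((H, W), dtype=bool)
    for cy, cx in centers:
        mask |= (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
    return mask

def iou(a, b):
    return (a & b).sum() / max((a | b).sum(), 1)

# Target shadow (another disc arrangement standing in for, e.g., a letter shape).
target = rasterize(np.array([[20, 20], [32, 32], [44, 44]]))

# Random local search over the joint positions: a crude stand-in for the
# sampling-based optimization used to discover principal poses.
pose = rng.uniform(10, 54, size=(3, 2))
best = iou(rasterize(pose), target)
for _ in range(2000):
    candidate = np.clip(pose + rng.normal(0, 2, size=pose.shape), 0, 63)
    score = iou(rasterize(candidate), target)
    if score > best:
        pose, best = candidate, score

print(f"silhouette IoU after search: {best:.2f}")
```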

    Physics-based Reconstruction and Animation of Humans

    Creating digital representations of humans is of utmost importance for applications ranging from entertainment (video games, movies) to human-computer interaction and even psychiatric treatment. What makes building credible digital doubles difficult is the fact that the human visual system is very sensitive to perceiving the complex expressivity and potential anomalies in body structures and motion. This thesis presents several projects that tackle these problems from two different perspectives: lightweight acquisition and physics-based simulation. It starts by describing a complete pipeline that allows users to reconstruct fully rigged 3D facial avatars using video data coming from a handheld device (e.g., smartphone). The avatars use a novel two-scale representation composed of blendshapes and dynamic detail maps. They are constructed through an optimization that integrates feature tracking, optical flow, and shape from shading. Continuing along the lines of accessible acquisition systems, we discuss a framework for simultaneous tracking and modeling of articulated human bodies from RGB-D data. We show how semantic information can be extracted from the scanned body shapes. In the second half of the thesis, we deviate from standard linear reconstruction and animation models, and instead focus on exploiting physics-based techniques that are able to incorporate complex phenomena such as dynamics, collision response, and incompressibility of the materials. The first approach we propose assumes that each 3D scan records the actor's body in a physical steady state and uses a process called inverse physics to extract a volumetric physics-ready anatomical model of that actor. By using biologically-inspired growth models for the bones, muscles, and fat, our method can obtain realistic anatomical reconstructions that can later be animated using external tracking data, such as tracked motion capture markers. This is then extended to a novel physics-based approach for facial reconstruction and animation. We propose a facial animation model which simulates biomechanical muscle contractions in a volumetric head model in order to create the facial expressions seen in the input scans. We then show how this approach allows for new avenues of dynamic artistic control, simulation of corrective facial surgery, and interaction with external forces and objects.
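
    The coarse layer of the two-scale face representation mentioned above (blendshapes, before the dynamic detail maps) reduces to a simple linear evaluation. The sketch below is a generic blendshape evaluation with made-up dimensions and weights, not the thesis' rig; the second, detail-map scale is omitted.

```python
import numpy as np

def blendshape_face(neutral, deltas, weights):
    """Evaluate a linear blendshape rig: neutral vertices plus a weighted sum
    of per-expression vertex offsets."""
    return neutral + np.tensordot(weights, deltas, axes=1)

num_vertices, num_shapes = 5000, 40
neutral = np.zeros((num_vertices, 3))                                   # rest face
deltas  = np.random.default_rng(0).normal(0, 0.001, (num_shapes, num_vertices, 3))
weights = np.zeros(num_shapes)
weights[3] = 0.7     # partially activate one expression shape (made-up index)

vertices = blendshape_face(neutral, deltas, weights)
print(vertices.shape)   # (5000, 3)
```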