338 research outputs found

    Robust and accurate skeletal rigging from mesh sequences

    Full text link

    MonoPerfCap: Human Performance Capture from Monocular Video

    Full text link
    We present the first marker-less approach for temporally coherent 3D performance capture of a human with general clothing from monocular video. Our approach reconstructs articulated human skeleton motion as well as medium-scale non-rigid surface deformations in general scenes. Human performance capture is a challenging problem due to the large range of articulation, potentially fast motion, and considerable non-rigid deformations, even from multi-view data. Reconstruction from monocular video alone is drastically more challenging, since strong occlusions and the inherent depth ambiguity lead to a highly ill-posed reconstruction problem. We tackle these challenges by a novel approach that employs sparse 2D and 3D human pose detections from a convolutional neural network using a batch-based pose estimation strategy. Joint recovery of per-batch motion allows to resolve the ambiguities of the monocular reconstruction problem based on a low dimensional trajectory subspace. In addition, we propose refinement of the surface geometry based on fully automatically extracted silhouettes to enable medium-scale non-rigid alignment. We demonstrate state-of-the-art performance capture results that enable exciting applications such as video editing and free viewpoint video, previously infeasible from monocular video. Our qualitative and quantitative evaluation demonstrates that our approach significantly outperforms previous monocular methods in terms of accuracy, robustness and scene complexity that can be handled.Comment: Accepted to ACM TOG 2018, to be presented on SIGGRAPH 201

    Automatic generation of dynamic skin deformation for animated characters

    Get PDF
    © 2018 by the authors. Since non-automatic rigging requires heavy human involvements, and various automatic rigging algorithms are less efficient in terms of computational efficiency, especially for current curve-based skin deformation methods, identifying the iso-parametric curves and creating the animation skeleton requires tedious and time-consuming manual work. Although several automatic rigging methods have been developed, but they do not aim at curve-based models. To tackle this issue, this paper proposes a new rigging algorithm for automatic generation of dynamic skin deformation to quickly identify iso-parametric curves and create an animation skeleton in a few milliseconds, which can be seamlessly used in curve-based skin deformation methods to make the rigging process fast enough for highly efficient computer animation applications

    Novel Correspondence-based Approach for Consistent Human Skeleton Extraction

    Get PDF
    This paper presents a novel base-points-driven shape correspondence (BSC) approach to extract skeletons of articulated objects from 3D mesh shapes. The skeleton extraction based on BSC approach is more accurate than the traditional direct skeleton extraction methods. Since 3D shapes provide more geometric information, BSC offers the consistent information between the source shape and the target shapes. In this paper, we first extract the skeleton from a template shape such as the source shape automatically. Then, the skeletons of the target shapes of different poses are generated based on the correspondence relationship with source shape. The accuracy of the proposed method is demonstrated by presenting a comprehensive performance evaluation on multiple benchmark datasets. The results of the proposed approach can be applied to various applications such as skeleton-driven animation, shape segmentation and human motion analysis

    LiveCap: Real-time Human Performance Capture from Monocular Video

    Full text link
    We present the first real-time human performance capture approach that reconstructs dense, space-time coherent deforming geometry of entire humans in general everyday clothing from just a single RGB video. We propose a novel two-stage analysis-by-synthesis optimization whose formulation and implementation are designed for high performance. In the first stage, a skinned template model is jointly fitted to background subtracted input video, 2D and 3D skeleton joint positions found using a deep neural network, and a set of sparse facial landmark detections. In the second stage, dense non-rigid 3D deformations of skin and even loose apparel are captured based on a novel real-time capable algorithm for non-rigid tracking using dense photometric and silhouette constraints. Our novel energy formulation leverages automatically identified material regions on the template to model the differing non-rigid deformation behavior of skin and apparel. The two resulting non-linear optimization problems per-frame are solved with specially-tailored data-parallel Gauss-Newton solvers. In order to achieve real-time performance of over 25Hz, we design a pipelined parallel architecture using the CPU and two commodity GPUs. Our method is the first real-time monocular approach for full-body performance capture. Our method yields comparable accuracy with off-line performance capture techniques, while being orders of magnitude faster

    Real-time human performance capture and synthesis

    Get PDF
    Most of the images one finds in the media, such as on the Internet or in textbooks and magazines, contain humans as the main point of attention. Thus, there is an inherent necessity for industry, society, and private persons to be able to thoroughly analyze and synthesize the human-related content in these images. One aspect of this analysis and subject of this thesis is to infer the 3D pose and surface deformation, using only visual information, which is also known as human performance capture. Human performance capture enables the tracking of virtual characters from real-world observations, and this is key for visual effects, games, VR, and AR, to name just a few application areas. However, traditional capture methods usually rely on expensive multi-view (marker-based) systems that are prohibitively expensive for the vast majority of people, or they use depth sensors, which are still not as common as single color cameras. Recently, some approaches have attempted to solve the task by assuming only a single RGB image is given. Nonetheless, they can either not track the dense deforming geometry of the human, such as the clothing layers, or they are far from real time, which is indispensable for many applications. To overcome these shortcomings, this thesis proposes two monocular human performance capture methods, which for the first time allow the real-time capture of the dense deforming geometry as well as an unseen 3D accuracy for pose and surface deformations. At the technical core, this work introduces novel GPU-based and data-parallel optimization strategies in conjunction with other algorithmic design choices that are all geared towards real-time performance at high accuracy. Moreover, this thesis presents a new weakly supervised multiview training strategy combined with a fully differentiable character representation that shows superior 3D accuracy. However, there is more to human-related Computer Vision than only the analysis of people in images. It is equally important to synthesize new images of humans in unseen poses and also from camera viewpoints that have not been observed in the real world. Such tools are essential for the movie industry because they, for example, allow the synthesis of photo-realistic virtual worlds with real-looking humans or of contents that are too dangerous for actors to perform on set. But also video conferencing and telepresence applications can benefit from photo-real 3D characters, as they can enhance the immersive experience of these applications. Here, the traditional Computer Graphics pipeline for rendering photo-realistic images involves many tedious and time-consuming steps that require expert knowledge and are far from real time. Traditional rendering involves character rigging and skinning, the modeling of the surface appearance properties, and physically based ray tracing. Recent learning-based methods attempt to simplify the traditional rendering pipeline and instead learn the rendering function from data resulting in methods that are easier accessible to non-experts. However, most of them model the synthesis task entirely in image space such that 3D consistency cannot be achieved, and/or they fail to model motion- and view-dependent appearance effects. To this end, this thesis presents a method and ongoing work on character synthesis, which allow the synthesis of controllable photoreal characters that achieve motion- and view-dependent appearance effects as well as 3D consistency and which run in real time. This is technically achieved by a novel coarse-to-fine geometric character representation for efficient synthesis, which can be solely supervised on multi-view imagery. Furthermore, this work shows how such a geometric representation can be combined with an implicit surface representation to boost synthesis and geometric quality.In den meisten Bildern in den heutigen Medien, wie dem Internet, Büchern und Magazinen, ist der Mensch das zentrale Objekt der Bildkomposition. Daher besteht eine inhärente Notwendigkeit für die Industrie, die Gesellschaft und auch für Privatpersonen, die auf den Mensch fokussierten Eigenschaften in den Bildern detailliert analysieren und auch synthetisieren zu können. Ein Teilaspekt der Anaylse von menschlichen Bilddaten und damit Bestandteil der Thesis ist das Rekonstruieren der 3D-Skelett-Pose und der Oberflächendeformation des Menschen anhand von visuellen Informationen, was fachsprachlich auch als Human Performance Capture bezeichnet wird. Solche Rekonstruktionsverfahren ermöglichen das Tracking von virtuellen Charakteren anhand von Beobachtungen in der echten Welt, was unabdingbar ist für Applikationen im Bereich der visuellen Effekte, Virtual und Augmented Reality, um nur einige Applikationsfelder zu nennen. Nichtsdestotrotz basieren traditionelle Tracking-Methoden auf teuren (markerbasierten) Multi-Kamera Systemen, welche für die Mehrheit der Bevölkerung nicht erschwinglich sind oder auf Tiefenkameras, die noch immer nicht so gebräuchlich sind wie herkömmliche Farbkameras. In den letzten Jahren gab es daher erste Methoden, die versuchen, das Tracking-Problem nur mit Hilfe einer Farbkamera zu lösen. Allerdings können diese entweder die Kleidung der Person im Bild nicht tracken oder die Methoden benötigen zu viel Rechenzeit, als dass sie in realen Applikationen genutzt werden könnten. Um diese Probleme zu lösen, stellt die Thesis zwei monokulare Human Performance Capture Methoden vor, die zum ersten Mal eine Echtzeit-Rechenleistung erreichen sowie im Vergleich zu vorherigen Arbeiten die Genauigkeit von Pose und Oberfläche in 3D weiter verbessern. Der Kern der Methoden beinhaltet eine neuartige GPU-basierte und datenparallelisierte Optimierungsstrategie, die im Zusammenspiel mit anderen algorithmischen Designentscheidungen akkurate Ergebnisse erzeugt und dabei eine Echtzeit-Laufzeit ermöglicht. Daneben wird eine neue, differenzierbare und schwach beaufsichtigte, Multi-Kamera basierte Trainingsstrategie in Kombination mit einem komplett differenzierbaren Charaktermodell vorgestellt, welches ungesehene 3D Präzision erreicht. Allerdings spielt nicht nur die Analyse von Menschen in Bildern in Computer Vision eine wichtige Rolle, sondern auch die Möglichkeit, neue Bilder von Personen in unterschiedlichen Posen und Kamera- Blickwinkeln synthetisch zu rendern, ohne dass solche Daten zuvor in der Realität aufgenommen wurden. Diese Methoden sind unabdingbar für die Filmindustrie, da sie es zum Beispiel ermöglichen, fotorealistische virtuelle Welten mit real aussehenden Menschen zu erzeugen, sowie die Möglichkeit bieten, Szenen, die für den Schauspieler zu gefährlich sind, virtuell zu produzieren, ohne dass eine reale Person diese Aktionen tatsächlich ausführen muss. Aber auch Videokonferenzen und Telepresence-Applikationen können von fotorealistischen 3D-Charakteren profitieren, da diese die immersive Erfahrung von solchen Applikationen verstärken. Traditionelle Verfahren zum Rendern von fotorealistischen Bildern involvieren viele mühsame und zeitintensive Schritte, welche Expertenwissen vorraussetzen und zudem auch Rechenzeiten erreichen, die jenseits von Echtzeit sind. Diese Schritte beinhalten das Rigging und Skinning von virtuellen Charakteren, das Modellieren von Reflektions- und Materialeigenschaften sowie physikalisch basiertes Ray Tracing. Vor Kurzem haben Deep Learning-basierte Methoden versucht, die Rendering-Funktion von Daten zu lernen, was in Verfahren resultierte, die eine Nutzung durch Nicht-Experten ermöglicht. Allerdings basieren die meisten Methoden auf Synthese-Verfahren im 2D-Bildbereich und können daher keine 3D-Konsistenz garantieren. Darüber hinaus gelingt es den meisten Methoden auch nicht, bewegungs- und blickwinkelabhängige Effekte zu erzeugen. Daher präsentiert diese Thesis eine neue Methode und eine laufende Forschungsarbeit zum Thema Charakter-Synthese, die es erlauben, fotorealistische und kontrollierbare 3D-Charakteren synthetisch zu rendern, die nicht nur 3D-konsistent sind, sondern auch bewegungs- und blickwinkelabhängige Effekte modellieren und Echtzeit-Rechenzeiten ermöglichen. Dazu wird eine neuartige Grobzu- Fein-Charakterrepräsentation für effiziente Bild-Synthese von Menschen vorgestellt, welche nur anhand von Multi-Kamera-Daten trainiert werden kann. Daneben wird gezeigt, wie diese explizite Geometrie- Repräsentation mit einer impliziten Oberflächendarstellung kombiniert werden kann, was eine bessere Synthese von geomtrischen Deformationen sowie Bildern ermöglicht.ERC Consolidator Grant 4DRepL
    • …
    corecore