Search CORE

772 research outputs found

MonoPerfCap: Human Performance Capture from Monocular Video

Author: Chatterjee Avishek
Mehta Dushyant
Rhodin Helge
Seidel Hans-Peter
Theobalt Christian
Xu Weipeng
Zollhöfer Michael
Publication venue
Publication date: 01/01/2018
Field of study

We present the first marker-less approach for temporally coherent 3D performance capture of a human with general clothing from monocular video. Our approach reconstructs articulated human skeleton motion as well as medium-scale non-rigid surface deformations in general scenes. Human performance capture is a challenging problem due to the large range of articulation, potentially fast motion, and considerable non-rigid deformations, even from multi-view data. Reconstruction from monocular video alone is drastically more challenging, since strong occlusions and the inherent depth ambiguity lead to a highly ill-posed reconstruction problem. We tackle these challenges by a novel approach that employs sparse 2D and 3D human pose detections from a convolutional neural network using a batch-based pose estimation strategy. Joint recovery of per-batch motion allows to resolve the ambiguities of the monocular reconstruction problem based on a low dimensional trajectory subspace. In addition, we propose refinement of the surface geometry based on fully automatically extracted silhouettes to enable medium-scale non-rigid alignment. We demonstrate state-of-the-art performance capture results that enable exciting applications such as video editing and free viewpoint video, previously infeasible from monocular video. Our qualitative and quantitative evaluation demonstrates that our approach significantly outperforms previous monocular methods in terms of accuracy, robustness and scene complexity that can be handled.Comment: Accepted to ACM TOG 2018, to be presented on SIGGRAPH 201

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

MPG.PuRe

A framework for realistic 3D tele-immersion

Author: Alexiadis D.
Alexiadis D.
Broeck S.V.
Broeck S.V.
Cesar P.
Cesar P.
Daras P.
Daras P.
Eisert P.
Eisert P.
Fechteler P.
Fechteler P.
Hilsmann A.
Hilsmann A.
Kuijk F.
Kuijk F.
Mauro D.A.
Mauro D.A.
Mekuria R.
Mekuria R.
Monaghan D.
Monaghan D.
O'Connor N.E.
O'Connor N.E.
Sanna M.
Sanna M.
Stevens C.
Stevens C.
Wall J.
Wall J.
Zahariadis T.
Zahariadis T.
Publication venue
Publication date: 01/01/2013
Field of study

Meeting, socializing and conversing online with a group of people using teleconferencing systems is still quite differ- ent from the experience of meeting face to face. We are abruptly aware that we are online and that the people we are engaging with are not in close proximity. Analogous to how talking on the telephone does not replicate the experi- ence of talking in person. Several causes for these differences have been identified and we propose inspiring and innova- tive solutions to these hurdles in attempt to provide a more realistic, believable and engaging online conversational expe- rience. We present the distributed and scalable framework REVERIE that provides a balanced mix of these solutions. Applications build on top of the REVERIE framework will be able to provide interactive, immersive, photo-realistic ex- periences to a multitude of users that for them will feel much more similar to having face to face meetings than the expe- rience offered by conventional teleconferencing systems

UEL Research Repository at University of East London

Crossref

CWI's Institutional Repository

Fraunhofer-ePrints

Irish Universities

DCU Online Research Access Service

Video based Animation Synthesis with the Essential Graph

Author: Boukhayma Adnane
Boyer Edmond
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 19/10/2015
Field of study

International audienceWe propose a method to generate animations using video-based mesh sequences of elementary movements of a shape. New motions that satisfy high-level user-specified constraints are built by recombining and interpolating the frames in the observed mesh sequences. The interest of video based meshes is to provide real full shape information and to enable therefore realistic shape animations. A resulting issue lies, however, in the difficulty to combine and interpolate human poses without a parametric pose model, as with skeleton based animations. To address this issue, our method brings two innovations that contribute at different levels: Locally between two motion sequences, we introduce a new approach to generate realistic transitions using dynamic time warping; More globally, over a set of motion sequences, we propose the essential graph as an efficient structure to encode the most realistic transitions between all pairs of input shape poses. Graph search in the essential graph allows then to generate realistic motions that are optimal with respect to various user-defined constraints. We present both quantitative and qualitative results on various 3D video datasets. They show that our approach compares favourably with previous strategies in this field that use the motion graph

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Recommended from our members

LEARNING TO RIG CHARACTERS

Author: Xu Zhan
Publication venue: ScholarWorks@UMass Amherst
Publication date: 08/08/2023
Field of study

With the emergence of 3D virtual worlds, 3D social media, and massive online games, the need for diverse, high-quality, animation-ready characters and avatars is greater than ever. To animate characters, artists hand-craft articulation structures, such as animation skeletons and part deformers, which require significant amount of manual and laborious interaction with 2D/3D modeling interfaces. This thesis presents deep learning methods that are able to significantly automate the process of character rigging. First, the thesis introduces RigNet, a method capable of predicting an animation skeleton for an input static 3D shape in the form of a polygon mesh. The predicted skeletons match the animator expectations in joint placement and topology. RigNet also estimates surface skin weights which determine how the mesh is animated given the different skeletal poses. In contrast to prior work that fits pre-defined skeletal templates with hand-tuned objectives, RigNet is able to automatically rig diverse characters, such as humanoids, quadrupeds, toys, birds, with varying articulation structure and geometry. RigNet is based on a deep neural architecture that directly operates on the mesh representation. The architecture is trained on a diverse dataset of rigged models that we mined online and curated. The dataset includes 2.7K polygon meshes, along with their associated skeletons and corresponding skin weights. Second, the thesis introduces Morig, a method that automatically rigs character meshes driven by single-view point cloud streams capturing the motion of performing characters. Compared to RigNet, MoRig\u27s rigging is \emph{motion-aware}: its neural network encodes motion cues from the point clouds into compact feature representations that are informative about the articulated parts of the performing character. These motion-aware features guide the inference of an appropriate skeletal rig for the input mesh. Furthermore, Morig is able to animate the rig according to the captured point cloud motion. Morig can handle diverse characters with different morphologies (e.g., humanoids, quadrupeds, toy characters). It also accounts for occluded regions in the point clouds and mismatches in the part proportions between the input mesh and captured character. Third, the thesis introduces APES, a method that takes as input 2D raster images depicting a small set of poses of a character shown in a sprite sheet, and identifies articulated parts useful for rigging the character. APES uses a combination of neural network inference and integer linear programming to identify a compact set of articulated body parts, e.g. head, torso and limbs, that best reconstruct the input poses. Compared to Morig and RigNet that require a large collection of training models with associated skeletons and skinning weights, APES\u27 neural architecture relies on less effortful supervision from (i) pixel correspondences readily available in existing large cartoon image datasets (e.g., Creative Flow), (ii) a relatively small dataset of 57 cartoon characters segmented into moving parts. Finally, the thesis discusses future research directions related to combining neural rigging with 3D and 4D reconstruction of characters from point cloud data and 2D video as well as automating the process of motion synthesis for 3D characters

ScholarWorks@UMass Amherst

Template based shape processing

Author: Stoll Carsten
Publication venue
Publication date: 06/05/2010
Field of study

As computers can only represent and process discrete data, information gathered from the real world always has to be sampled. While it is nowadays possible to sample many signals accurately and thus generate high-quality reconstructions (for example of images and audio data), accurately and densely sampling 3D geometry is still a challenge. The signal samples may be corrupted by noise and outliers, and contain large holes due to occlusions. These issues become even more pronounced when also considering the temporal domain. Because of this, developing methods for accurate reconstruction of shapes from a sparse set of discrete data is an important aspect of the computer graphics processing pipeline. In this thesis we propose novel approaches to including semantic knowledge into reconstruction processes using template based shape processing. We formulate shape reconstruction as a deformable template fitting process, where we try to fit a given template model to the sampled data. This approach allows us to present novel solutions to several fundamental problems in the area of shape reconstruction. We address static problems like constrained texture mapping and semantically meaningful hole-filling in surface reconstruction from 3D scans, temporal problems such as mesh based performance capture, and finally dynamic problems like the estimation of physically based material parameters of animated templates.Analoge Signale müssen digitalisiert werden um sie auf modernen Computern speichern und verarbeiten zu können. Für viele Signale, wie zum Beispiel Bilder oder Tondaten, existieren heutzutage effektive und effiziente Digitalisierungstechniken. Aus den so gewonnenen Daten können die ursprünglichen Signale hinreichend akkurat wiederhergestellt werden. Im Gegensatz dazu stellt das präzise und effiziente Digitalisieren und Rekonstruieren von 3D- oder gar 4D-Geometrie immer noch eine Herausforderung dar. So führen Verdeckungen und Fehler während der Digitalisierung zu Löchern und verrauschten Meßdaten. Die Erforschung von akkuraten Rekonstruktionsmethoden für diese groben digitalen Daten ist daher ein entscheidender Schritt in der Entwicklung moderner Verarbeitungsmethoden in der Computergrafik. In dieser Dissertation wird veranschaulicht, wie deformierbare geometrische Modelle als Vorlage genutzt werden können, um semantische Informationen in die robuste Rekonstruktion von 3D- und 4D Geometrie einfließen zu lassen. Dadurch wird es möglich, neue Lösungsansätze für mehrere grundlegenden Probleme der Computergrafik zu entwickeln. So können mit dieser Technik Löcher in digitalisierten 3D Modellen semantisch sinnvoll aufgefüllt, oder detailgetreue virtuelle Kopien von Darstellern und ihrer dynamischen Kleidung zu erzeugt werden

Acronym

Robust Temporally Coherent Laplacian Protrusion Segmentation of 3D Articulated Bodies

Author: Cuzzolin Fabio
Horaud Radu
Mateus Diana
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 26/05/2014
Field of study

In motion analysis and understanding it is important to be able to fit a suitable model or structure to the temporal series of observed data, in order to describe motion patterns in a compact way, and to discriminate between them. In an unsupervised context, i.e., no prior model of the moving object(s) is available, such a structure has to be learned from the data in a bottom-up fashion. In recent times, volumetric approaches in which the motion is captured from a number of cameras and a voxel-set representation of the body is built from the camera views, have gained ground due to attractive features such as inherent view-invariance and robustness to occlusions. Automatic, unsupervised segmentation of moving bodies along entire sequences, in a temporally-coherent and robust way, has the potential to provide a means of constructing a bottom-up model of the moving body, and track motion cues that may be later exploited for motion classification. Spectral methods such as locally linear embedding (LLE) can be useful in this context, as they preserve "protrusions", i.e., high-curvature regions of the 3D volume, of articulated shapes, while improving their separation in a lower dimensional space, making them in this way easier to cluster. In this paper we therefore propose a spectral approach to unsupervised and temporally-coherent body-protrusion segmentation along time sequences. Volumetric shapes are clustered in an embedding space, clusters are propagated in time to ensure coherence, and merged or split to accommodate changes in the body's topology. Experiments on both synthetic and real sequences of dense voxel-set data are shown. This supports the ability of the proposed method to cluster body-parts consistently over time in a totally unsupervised fashion, its robustness to sampling density and shape quality, and its potential for bottom-up model constructionComment: 31 pages, 26 figure

arXiv.org e-Print Archive

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL-Rennes 1

A Framework for Realistic 3D Tele-Immersion

Author: Broeck S V
Cesar P
Eisert P
Fechteler P
Hilsmann A
Kuijk F
Mauro Davide Andrea
Mekuria R
Sanna M
Stevens C
Wall J
Publication venue: Marshall Digital Scholar
Publication date: 01/06/2013
Field of study

Meeting, socializing and conversing online with a group of people using teleconferencing systems is still quite different from the experience of meeting face to face. We are abruptly aware that we are online and that the people we are engaging with are not in close proximity. Analogous to how talking on the telephone does not replicate the experience of talking in person. Several causes for these differences have been identified and we propose inspiring and innovative solutions to these hurdles in attempt to provide a more realistic, believable and engaging online conversational experience. We present the distributed and scalable framework REVERIE that provides a balanced mix of these solutions. Applications build on top of the REVERIE framework will be able to provide interactive, immersive, photo-realistic experiences to a multitude of users that for them will feel much more similar to having face to face meetings than the experience offered by conventional teleconferencing systems

Marshall University

A Framework for Realistic 3D Tele-Immersion

Author: Alexiadis D
Broeck S. van
César Garcia P.S. (Pablo Santiago)
Daras P. (Petros)
Eisert P.
Fechteler P
Hilsmann A.
Kuijk A.A.M. (Fons)
Mauro D.
Mekuria R.N. (Rufael)
Monaghan D.
O'Connor N. (Noel)
Sanna M. (Michele)
Stevens C.
Wall J.
Zahariadis T.
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 01/01/2013
Field of study

Meeting, socializing and conversing online with a group of people using teleconferencing systems is still quite different from the experience of meeting face to face. We are abruptly aware that we are online and that the people we are engaging with are not in close proximity. Analogous to how talking on the telephone does not replicate the experience of talking in person. Several causes for these differences have been identied and we propose inspiring and innovative solutions to these hurdles in attempt to provide a more realistic, believable and engaging online conversational experience. We present the distributed and scalable framework REVERIE that provides a balanced mix of these solutions. Applications build on top of the REVERIE framework will be able to provide interactive, immersive, photo-realistic experiences to a multitude of users that for them will feel much more similar to having face to face meetings than the experience offered by conventional teleconferencing systems

CWI's Institutional Repository