ShapeGraFormer: GraFormer-Based Network for Hand-Object Reconstruction from a Single Depth Map
3D reconstruction of hand-object manipulations is important for emulating human actions. Most methods dealing with challenging object manipulation scenarios focus on reconstructing hands in isolation, ignoring the physical and kinematic constraints that arise from object contact. Some approaches produce more realistic results by jointly reconstructing 3D hand-object interactions. However, they focus on coarse pose estimation or rely upon known hand and object shapes. We propose the first approach for realistic 3D hand-object shape and pose reconstruction from a single depth map. Unlike previous work, our voxel-based reconstruction network regresses the vertex coordinates of a hand and an object and reconstructs more realistic interactions. Our pipeline additionally predicts voxelized hand-object shapes, having a one-to-one mapping to the input voxelized depth. Thereafter, we exploit the graph nature of the hand and object shapes by utilizing the recent GraFormer network with positional embedding to reconstruct shapes from template meshes. In addition, we show the impact of adding a second GraFormer component that refines the reconstructed shapes based on the hand-object interactions, and its ability to reconstruct more accurate object shapes. We perform an extensive evaluation on the HO-3D and DexYCB datasets and show that our method outperforms existing approaches in hand reconstruction and produces plausible reconstructions for the object.
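The abstract does not include code, but the "voxelized depth" input it mentions can be illustrated with a minimal sketch. The camera intrinsics, grid resolution, and unit-cube normalization below are assumptions for illustration, not the paper's actual pipeline:

```python
import numpy as np

def voxelize_depth(depth, fx, fy, cx, cy, grid=32):
    """Back-project a depth map into a binary occupancy grid.

    A toy stand-in for a voxelized-depth input: pixels with valid
    depth are lifted to 3D points via assumed pinhole intrinsics
    (fx, fy, cx, cy), then binned into a grid**3 occupancy volume.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    z = depth.ravel()
    valid = z > 0                        # ignore missing depth
    x = (u.ravel() - cx) * z / fx
    y = (v.ravel() - cy) * z / fy
    pts = np.stack([x, y, z], axis=1)[valid]
    if len(pts) == 0:
        return np.zeros((grid,) * 3, dtype=bool)
    # normalize the point cloud into the unit cube, then bin it
    lo, hi = pts.min(0), pts.max(0)
    idx = ((pts - lo) / np.maximum(hi - lo, 1e-9) * (grid - 1)).astype(int)
    vox = np.zeros((grid,) * 3, dtype=bool)
    vox[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return vox
```

A one-to-one mapping between input and output volumes, as the paper describes, would let a network predict hand and object occupancy in the same grid coordinates as this input.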
A Versatile Scene Model with Differentiable Visibility Applied to Generative Pose Estimation
Generative reconstruction methods compute the 3D configuration (such as pose and/or geometry) of a shape by optimizing the overlap of the projected 3D shape model with images. Proper handling of occlusions is a big challenge, since the visibility function that indicates if a surface point is seen from a camera can often not be formulated in closed form, and is in general discrete and non-differentiable at occlusion boundaries. We present a new scene representation that enables an analytically differentiable closed-form formulation of surface visibility. In contrast to previous methods, this yields smooth, analytically differentiable, and efficient-to-optimize pose similarity energies with rigorous occlusion handling, fewer local minima, and experimentally verified improved convergence of numerical optimization. The underlying idea is a new image formation model that represents opaque objects by a translucent medium with a smooth Gaussian density distribution, which turns visibility into a smooth phenomenon. We demonstrate the advantages of our versatile scene model in several generative pose estimation problems, namely marker-less multi-object pose estimation, marker-less human motion capture with few cameras, and image-based 3D geometry estimation.
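The core idea, visibility becoming smooth when opaque geometry is replaced by Gaussian density, can be sketched under simplifying assumptions: a single isotropic Gaussian blob and exponential transmittance along a straight ray. The line integral of a Gaussian along a ray has a closed form via the error function, so visibility is smooth and differentiable everywhere (this is an illustrative toy, not the paper's full multi-blob scene model):

```python
import math
import numpy as np

def transmittance(o, d, t_p, mu, sigma, amp):
    """Closed-form visibility of the point at depth t_p along ray
    o + t*d, through one isotropic Gaussian density blob with mean
    mu, std sigma, and peak density amp.

    The line integral of amp*exp(-||x-mu||^2/(2 sigma^2)) from the
    origin to t_p splits into a perpendicular attenuation factor and
    an erf term along the ray; visibility = exp(-integral).
    """
    d = d / np.linalg.norm(d)
    t_m = float(np.dot(mu - o, d))                    # blob centre along the ray
    b2 = float(np.dot(mu - o, mu - o)) - t_m * t_m    # squared perpendicular distance
    mass = amp * math.exp(-b2 / (2 * sigma ** 2)) * sigma * math.sqrt(2 * math.pi)
    frac = 0.5 * (1.0 + math.erf((t_p - t_m) / (sigma * math.sqrt(2))))
    return math.exp(-mass * frac)
```

Because `erf` and `exp` are smooth, the visibility of a point varies smoothly as it slides past an occluder, which is exactly what removes the non-differentiability at occlusion boundaries.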
Model-based human performance capture in outdoor scenes
Technologies for motion and performance capture of real actors have enabled the creation of realistic-looking virtual humans through detail and deformation transfer, at the cost of extensive manual work and sophisticated in-studio marker-based systems. This thesis pushes the boundaries of performance capture by proposing automatic algorithms for robust 3D skeleton and detailed surface tracking in less constrained multi-view outdoor scenarios. Contributions include new multi-layered human body representations designed for effective model-based time-consistent reconstruction in complex dynamic environments with varying illumination, from a set of vision cameras. We design dense surface refinement approaches to enable smooth silhouette-free model-to-image alignment, as well as coarse-to-fine tracking techniques to enable joint estimation of skeleton motion and fine-scale surface deformations in complicated scenarios. High-quality results attained on challenging application scenarios confirm the contributions and show great potential for the automatic creation of personalized 3D virtual humans.
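The coarse-to-fine strategy the abstract mentions (solve a cheap low-resolution problem first, then refine at finer resolutions) can be illustrated with a toy 1D analogue: estimating the circular shift between two signals via a resolution pyramid. This is a hedged sketch of the general technique, not the thesis's actual skeleton-and-surface tracker:

```python
import numpy as np

def best_shift(a, b, search):
    """Integer shift s with |s| <= search minimizing ||a - roll(b, s)||^2."""
    shifts = range(-search, search + 1)
    errs = [np.sum((a - np.roll(b, s)) ** 2) for s in shifts]
    return list(shifts)[int(np.argmin(errs))]

def coarse_to_fine_shift(a, b, levels=3, search=2):
    """Estimate the circular shift aligning b to a.

    Coarse-to-fine: at each pyramid level (downsampling by striding),
    search only a small residual window around the estimate inherited
    from the coarser level, then refine at the next resolution --
    analogous to fitting skeleton motion first and fine-scale surface
    deformation last.
    """
    shift = 0
    for lvl in reversed(range(levels)):
        f = 2 ** lvl
        b_cur = np.roll(b, shift)              # apply the estimate so far
        res = best_shift(a[::f], b_cur[::f], search)
        shift += f * res                       # residual found at this scale
    return shift
```

The payoff is the same as in model-based tracking: each level searches only a small window, yet the pyramid covers a large total displacement and avoids many local minima of a direct fine-scale search.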