54 research outputs found
From motion capture to interactive virtual worlds : towards unconstrained motion-capture algorithms for real-time performance-driven character animation
This dissertation takes performance-driven character animation as a representative application and advances motion capture algorithms and animation methods to meet its high demands. Existing approaches have either coarse resolution and restricted capture volume, require expensive and complex multi-camera systems, or use intrusive suits and controllers.
For motion capture, set-up time is reduced using fewer cameras, accuracy is increased despite occlusions and general environments, initialization is automated, and free roaming is enabled by egocentric cameras. For animation, increased robustness enables the use of low-cost sensors input, custom control gesture definition is guided to support novice users, and animation expressiveness is increased. The important contributions are: 1) an analytic and differentiable visibility model for pose optimization under strong occlusions, 2) a volumetric contour model for automatic actor initialization in general scenes, 3) a method to annotate and augment image-pose databases automatically, 4) the utilization of unlabeled examples for character control, and 5) the generalization and disambiguation of cyclical gestures for faithful character animation.
In summary, the whole process of human motion capture, processing, and application to animation is advanced. These advances on the state of the art have the potential to improve many interactive applications, within and outside virtual reality.Diese Arbeit befasst sich mit Performance-driven Character Animation, insbesondere werden Motion Capture-Algorithmen entwickelt um den hohen Anforderungen dieser Beispielanwendung gerecht zu werden. Existierende Methoden haben entweder eine geringe Genauigkeit und einen eingeschränkten Aufnahmebereich oder benötigen teure Multi-Kamera-Systeme, oder benutzen störende Controller und spezielle Anzüge.
Für Motion Capture wird die Setup-Zeit verkürzt, die Genauigkeit für Verdeckungen und generelle Umgebungen erhöht, die Initialisierung automatisiert, und Bewegungseinschränkung verringert. Für Character Animation wird die Robustheit für ungenaue Sensoren erhöht, Hilfe für benutzerdefinierte Gestendefinition geboten, und die Ausdrucksstärke der Animation verbessert. Die wichtigsten Beiträge sind: 1) ein analytisches und differenzierbares Sichtbarkeitsmodell für Rekonstruktionen unter starken Verdeckungen, 2) ein volumetrisches Konturenmodell für automatische Körpermodellinitialisierung in genereller Umgebung, 3) eine Methode zur automatischen Annotation von Posen und Augmentation von Bildern in großen Datenbanken, 4) das Nutzen von Beispielbewegungen für Character Animation, und 5) die Generalisierung und Übertragung von zyklischen Gesten für genaue Charakteranimation.
Es wird der gesamte Prozess erweitert, von Motion Capture bis hin zu Charakteranimation. Die Verbesserungen sind für viele interaktive Anwendungen geeignet, innerhalb und außerhalb von virtueller Realität
From motion capture to interactive virtual worlds : towards unconstrained motion-capture algorithms for real-time performance-driven character animation
This dissertation takes performance-driven character animation as a representative application and advances motion capture algorithms and animation methods to meet its high demands. Existing approaches have either coarse resolution and restricted capture volume, require expensive and complex multi-camera systems, or use intrusive suits and controllers.
For motion capture, set-up time is reduced using fewer cameras, accuracy is increased despite occlusions and general environments, initialization is automated, and free roaming is enabled by egocentric cameras. For animation, increased robustness enables the use of low-cost sensors input, custom control gesture definition is guided to support novice users, and animation expressiveness is increased. The important contributions are: 1) an analytic and differentiable visibility model for pose optimization under strong occlusions, 2) a volumetric contour model for automatic actor initialization in general scenes, 3) a method to annotate and augment image-pose databases automatically, 4) the utilization of unlabeled examples for character control, and 5) the generalization and disambiguation of cyclical gestures for faithful character animation.
In summary, the whole process of human motion capture, processing, and application to animation is advanced. These advances on the state of the art have the potential to improve many interactive applications, within and outside virtual reality.Diese Arbeit befasst sich mit Performance-driven Character Animation, insbesondere werden Motion Capture-Algorithmen entwickelt um den hohen Anforderungen dieser Beispielanwendung gerecht zu werden. Existierende Methoden haben entweder eine geringe Genauigkeit und einen eingeschränkten Aufnahmebereich oder benötigen teure Multi-Kamera-Systeme, oder benutzen störende Controller und spezielle Anzüge.
Für Motion Capture wird die Setup-Zeit verkürzt, die Genauigkeit für Verdeckungen und generelle Umgebungen erhöht, die Initialisierung automatisiert, und Bewegungseinschränkung verringert. Für Character Animation wird die Robustheit für ungenaue Sensoren erhöht, Hilfe für benutzerdefinierte Gestendefinition geboten, und die Ausdrucksstärke der Animation verbessert. Die wichtigsten Beiträge sind: 1) ein analytisches und differenzierbares Sichtbarkeitsmodell für Rekonstruktionen unter starken Verdeckungen, 2) ein volumetrisches Konturenmodell für automatische Körpermodellinitialisierung in genereller Umgebung, 3) eine Methode zur automatischen Annotation von Posen und Augmentation von Bildern in großen Datenbanken, 4) das Nutzen von Beispielbewegungen für Character Animation, und 5) die Generalisierung und Übertragung von zyklischen Gesten für genaue Charakteranimation.
Es wird der gesamte Prozess erweitert, von Motion Capture bis hin zu Charakteranimation. Die Verbesserungen sind für viele interaktive Anwendungen geeignet, innerhalb und außerhalb von virtueller Realität
Real-Time Hand Tracking Using a Sum of Anisotropic Gaussians Model
Real-time marker-less hand tracking is of increasing importance in
human-computer interaction. Robust and accurate tracking of arbitrary hand
motion is a challenging problem due to the many degrees of freedom, frequent
self-occlusions, fast motions, and uniform skin color. In this paper, we
propose a new approach that tracks the full skeleton motion of the hand from
multiple RGB cameras in real-time. The main contributions include a new
generative tracking method which employs an implicit hand shape representation
based on Sum of Anisotropic Gaussians (SAG), and a pose fitting energy that is
smooth and analytically differentiable making fast gradient based pose
optimization possible. This shape representation, together with a full
perspective projection model, enables more accurate hand modeling than a
related baseline method from literature. Our method achieves better accuracy
than previous methods and runs at 25 fps. We show these improvements both
qualitatively and quantitatively on publicly available datasets.Comment: 8 pages, Accepted version of paper published at 3DV 201
GANSeg: Learning to Segment by Unsupervised Hierarchical Image Generation
Segmenting an image into its parts is a frequent preprocess for high-level
vision tasks such as image editing. However, annotating masks for supervised
training is expensive. Weakly-supervised and unsupervised methods exist, but
they depend on the comparison of pairs of images, such as from multi-views,
frames of videos, and image augmentation, which limits their applicability. To
address this, we propose a GAN-based approach that generates images conditioned
on latent masks, thereby alleviating full or weak annotations required in
previous approaches. We show that such mask-conditioned image generation can be
learned faithfully when conditioning the masks in a hierarchical manner on
latent keypoints that define the position of parts explicitly. Without
requiring supervision of masks or points, this strategy increases robustness to
viewpoint and object positions changes. It also lets us generate image-mask
pairs for training a segmentation network, which outperforms the
state-of-the-art unsupervised segmentation methods on established benchmarks.Comment: CVPR 202
Pose Modulated Avatars from Video
It is now possible to reconstruct dynamic human motion and shape from a
sparse set of cameras using Neural Radiance Fields (NeRF) driven by an
underlying skeleton. However, a challenge remains to model the deformation of
cloth and skin in relation to skeleton pose. Unlike existing avatar models that
are learned implicitly or rely on a proxy surface, our approach is motivated by
the observation that different poses necessitate unique frequency assignments.
Neglecting this distinction yields noisy artifacts in smooth areas or blurs
fine-grained texture and shape details in sharp regions. We develop a
two-branch neural network that is adaptive and explicit in the frequency
domain. The first branch is a graph neural network that models correlations
among body parts locally, taking skeleton pose as input. The second branch
combines these correlation features to a set of global frequencies and then
modulates the feature encoding. Our experiments demonstrate that our network
outperforms state-of-the-art methods in terms of preserving details and
generalization capabilities
LatentKeypointGAN: Controlling GANs via Latent Keypoints
Generative adversarial networks (GANs) have attained photo-realistic quality
in image generation. However, how to best control the image content remains an
open challenge. We introduce LatentKeypointGAN, a two-stage GAN which is
trained end-to-end on the classical GAN objective with internal conditioning on
a set of space keypoints. These keypoints have associated appearance embeddings
that respectively control the position and style of the generated objects and
their parts. A major difficulty that we address with suitable network
architectures and training schemes is disentangling the image into spatial and
appearance factors without domain knowledge and supervision signals. We
demonstrate that LatentKeypointGAN provides an interpretable latent space that
can be used to re-arrange the generated images by re-positioning and exchanging
keypoint embeddings, such as generating portraits by combining the eyes, nose,
and mouth from different images. In addition, the explicit generation of
keypoints and matching images enables a new, GAN-based method for unsupervised
keypoint detection
NPC: Neural Point Characters from Video
High-fidelity human 3D models can now be learned directly from videos,
typically by combining a template-based surface model with neural
representations. However, obtaining a template surface requires expensive
multi-view capture systems, laser scans, or strictly controlled conditions.
Previous methods avoid using a template but rely on a costly or ill-posed
mapping from observation to canonical space. We propose a hybrid point-based
representation for reconstructing animatable characters that does not require
an explicit surface model, while being generalizable to novel poses. For a
given video, our method automatically produces an explicit set of 3D points
representing approximate canonical geometry, and learns an articulated
deformation model that produces pose-dependent point transformations. The
points serve both as a scaffold for high-frequency neural features and an
anchor for efficiently mapping between observation and canonical space. We
demonstrate on established benchmarks that our representation overcomes
limitations of prior work operating in either canonical or in observation
space. Moreover, our automatic point extraction approach enables learning
models of human and animal characters alike, matching the performance of the
methods using rigged surface templates despite being more general. Project
website: https://lemonatsu.github.io/npc/Comment: Project website: https://lemonatsu.github.io/npc
- …