Mirror-Aware Neural Humans
Human motion capture either requires multi-camera systems or is unreliable
using single-view input due to depth ambiguities. Meanwhile, mirrors are
readily available in urban environments and form an affordable alternative by
recording two views with only a single camera. However, the mirror setting
poses the additional challenge of handling occlusions between the real and mirrored views of the person.
Going beyond existing mirror approaches for 3D human pose estimation, we
utilize mirrors for learning a complete body model, including shape and dense
appearance. Our main contribution is extending articulated neural radiance
fields to include a notion of a mirror, which makes sampling efficient over
potential occlusion regions. Together, our contributions realize a
consumer-level 3D motion capture system that starts from off-the-shelf 2D poses
by automatically calibrating the camera, estimating mirror orientation, and
subsequently lifting 2D keypoint detections to a 3D skeleton pose that is used to
condition the mirror-aware NeRF. We empirically demonstrate the benefit of
learning a body model and accounting for occlusion in challenging mirror
scenes.
Comment: Project website:
https://danielajisafe.github.io/mirror-aware-neural-humans
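The mirror yields a second, virtual view of the person: each 3D point is reflected through the mirror plane. A minimal geometric sketch of this reflection, assuming a plane given by a unit normal n and offset d (illustrative values, not the paper's actual calibration):

```python
# Reflect a 3D point across a mirror plane n·x + d = 0 (n a unit normal).
# Minimal sketch; the plane parameters below are hypothetical, not the
# automatically estimated mirror orientation from the paper.

def reflect_point(p, n, d):
    """Householder reflection of point p across the plane n·x + d = 0."""
    s = sum(pi * ni for pi, ni in zip(p, n)) + d  # signed distance to plane
    return tuple(pi - 2.0 * s * ni for pi, ni in zip(p, n))

# A point 1 unit in front of the mirror plane z = 0 maps to 1 unit behind it.
mirrored = reflect_point((0.0, 0.0, 1.0), (0.0, 0.0, 1.0), 0.0)
```

Applying this reflection to the skeleton gives the mirrored pose that the second (virtual) view observes.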
Body Knowledge and Uncertainty Modeling for Monocular 3D Human Body Reconstruction
While 3D body reconstruction methods have made remarkable progress recently,
it remains difficult to acquire the sufficiently accurate and plentiful 3D
supervision required for training. In this paper, we propose KNOWN, a
framework that effectively utilizes body KNOWledge and
uNcertainty modeling to compensate for insufficient 3D supervision.
KNOWN exploits a comprehensive set of generic body constraints derived from
well-established body knowledge. These generic constraints precisely and
explicitly characterize the reconstruction plausibility and enable 3D
reconstruction models to be trained without any 3D data. Moreover, existing
methods typically use images from multiple datasets during training, which can
result in data noise (e.g., inconsistent joint annotations) and data
imbalance (e.g., minority images representing unusual poses or
captured from challenging camera views). KNOWN solves these problems through a
novel probabilistic framework that models both aleatoric and epistemic
uncertainty. Aleatoric uncertainty is encoded in a robust Negative
Log-Likelihood (NLL) training loss, while epistemic uncertainty is used to
guide model refinement. Experiments demonstrate that KNOWN's body
reconstruction outperforms prior weakly-supervised approaches, particularly on
the challenging minority images.
Comment: ICCV 202
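The aleatoric-uncertainty idea can be sketched with a plain Gaussian NLL over predicted per-joint log-variances; a high predicted variance downweights the residual of a noisy annotation at the cost of a log-variance penalty. (The paper's robust NLL differs in the exact density; this is an illustrative stand-in.)

```python
import math

def gaussian_nll(pred, target, log_var):
    """Mean per-joint Gaussian negative log-likelihood with predicted
    (aleatoric) log-variance. Large predicted variance shrinks the
    residual term but pays the log-variance penalty."""
    losses = [0.5 * (math.exp(-lv) * (p - t) ** 2 + lv)
              for p, t, lv in zip(pred, target, log_var)]
    return sum(losses) / len(losses)
```

For a large residual, predicting a higher variance lowers the loss, which is how the model learns to flag noisy annotations rather than overfit them.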
HandFlow: Quantifying View-Dependent 3D Ambiguity in Two-Hand Reconstruction with Normalizing Flow
Reconstructing two-hand interactions from a single image is a challenging
problem due to ambiguities that stem from projective geometry and heavy
occlusions. Existing methods are designed to estimate only a single pose,
despite the fact that there exist other valid reconstructions that fit the
image evidence equally well. In this paper we propose to address this issue by
explicitly modeling the distribution of plausible reconstructions in a
conditional normalizing flow framework. This allows us to directly supervise
the posterior distribution through a novel determinant magnitude
regularization, which is key to varied 3D hand pose samples that project well
into the input image. We also demonstrate that metrics commonly used to assess
reconstruction quality are insufficient to evaluate pose predictions under such
severe ambiguity. To address this, we release the first dataset with multiple
plausible annotations per image called MultiHands. The additional annotations
enable us to evaluate the estimated distribution using the maximum mean
discrepancy metric. Through this, we demonstrate the quality of our
probabilistic reconstruction and show that explicit ambiguity modeling is
better suited for this challenging problem.
Comment: VMV 2022 - Symposium on Vision, Modeling, and Visualization
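Comparing a set of sampled poses against a set of plausible annotations with maximum mean discrepancy can be sketched as a biased RBF-kernel estimate (the kernel and bandwidth here are assumptions, not the paper's exact choice):

```python
import math

def rbf(x, y, gamma=1.0):
    """RBF kernel between two equal-length vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def mmd_squared(xs, ys, gamma=1.0):
    """Biased squared-MMD estimate between two sample sets: close to zero
    when the sets come from the same distribution, large otherwise."""
    k_xx = sum(rbf(a, b, gamma) for a in xs for b in xs) / len(xs) ** 2
    k_yy = sum(rbf(a, b, gamma) for a in ys for b in ys) / len(ys) ** 2
    k_xy = sum(rbf(a, b, gamma) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2.0 * k_xy
```

Unlike a single-best-match error, this compares whole distributions, which is what multiple plausible annotations per image make possible.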
MHR-Net: Multiple-Hypothesis Reconstruction of Non-Rigid Shapes from 2D Views
We propose MHR-Net, a novel method for recovering Non-Rigid Shapes from
Motion (NRSfM). MHR-Net aims to find a set of reasonable reconstructions for a
2D view, and it also selects the most likely reconstruction from the set. To
deal with the challenging unsupervised generation of non-rigid shapes, we
develop a new Deterministic Basis and Stochastic Deformation scheme in MHR-Net.
The non-rigid shape is first expressed as the sum of a coarse shape basis and a
flexible shape deformation, then multiple hypotheses are generated with
uncertainty modeling of the deformation part. MHR-Net is optimized with
reprojection loss on the basis and the best hypothesis. Furthermore, we design
a new Procrustean Residual Loss, which reduces the rigid rotations between
similar shapes and further improves the performance. Experiments show that
MHR-Net achieves state-of-the-art reconstruction accuracy on Human3.6M, SURREAL
and 300-VW datasets.
Comment: Accepted to ECCV 202
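The hypothesize-and-select step can be sketched as: perturb a coarse basis shape with sampled deformations, project each hypothesis to 2D, and keep the one with the lowest reprojection error. This is an illustrative stand-in for MHR-Net's learned stochastic deformation; all names and the Gaussian perturbation are hypothetical.

```python
import random

def best_hypothesis(basis, deform_scale, project, target_2d, n_hyp=10):
    """Generate n_hyp 3D hypotheses as basis + stochastic deformation and
    return the one with the lowest squared 2D reprojection error."""
    best, best_err = None, float("inf")
    for _ in range(n_hyp):
        hyp = [(x + random.gauss(0, deform_scale),
                y + random.gauss(0, deform_scale),
                z + random.gauss(0, deform_scale)) for x, y, z in basis]
        proj = [project(p) for p in hyp]
        err = sum((u - tu) ** 2 + (v - tv) ** 2
                  for (u, v), (tu, tv) in zip(proj, target_2d))
        if err < best_err:
            best, best_err = hyp, err
    return best, best_err

# Toy usage: one 3D point, orthographic projection (u, v) = (x, y).
random.seed(0)
shape, err = best_hypothesis([(0.0, 0.0, 0.0)], 0.01,
                             lambda p: (p[0], p[1]), [(0.0, 0.0)])
```

Only the best hypothesis (and the deterministic basis) receives the reprojection loss, so the remaining samples are free to cover other plausible shapes.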
AI-generated Content for Various Data Modalities: A Survey
AI-generated content (AIGC) methods aim to produce text, images, videos, 3D
assets, and other media using AI algorithms. Due to its wide range of
applications and the demonstrated potential of recent works, AIGC has been
attracting considerable attention, and AIGC methods have been
developed for various data modalities, such as image, video, text, 3D shape (as
voxels, point clouds, meshes, and neural implicit fields), 3D scene, 3D human
avatar (body and head), 3D motion, and audio -- each presenting different
characteristics and challenges. Furthermore, there have also been many
significant developments in cross-modality AIGC methods, where generative
methods can receive conditioning input in one modality and produce outputs in
another. Examples include going from various modalities to image, video, 3D
shape, 3D scene, 3D avatar (body and head), 3D motion (skeleton and avatar),
and audio modalities. In this paper, we provide a comprehensive review of AIGC
methods across different data modalities, including both single-modality and
cross-modality methods, highlighting the various challenges, representative
works, and recent technical directions in each setting. We also survey the
representative datasets throughout the modalities, and present comparative
results for various modalities. Moreover, we discuss the challenges and
potential future research directions.
POCO: 3D Pose and Shape Estimation with Confidence
The regression of 3D Human Pose and Shape (HPS) from an image is becoming
increasingly accurate. This makes the results useful for downstream tasks like
human action recognition or 3D graphics. Yet, no regressor is perfect, and
accuracy can be affected by ambiguous image evidence or by poses and appearance
that are unseen during training. Most current HPS regressors, however, do not
report the confidence of their outputs, meaning that downstream tasks cannot
differentiate accurate estimates from inaccurate ones. To address this, we
develop POCO, a novel framework for training HPS regressors to estimate not
only a 3D human body but also the confidence of that estimate, in a single feed-forward pass.
Specifically, POCO estimates both the 3D body pose and a per-sample variance.
The key idea is to introduce a Dual Conditioning Strategy (DCS) for regressing
uncertainty that is highly correlated to pose reconstruction quality. The POCO
framework can be applied to any HPS regressor and here we evaluate it by
modifying HMR, PARE, and CLIFF. In all cases, training the network to reason
about uncertainty helps it learn to more accurately estimate 3D pose. While
this was not our goal, the improvement is modest but consistent. Our main
motivation is to provide uncertainty estimates for downstream tasks; we
demonstrate this in two ways: (1) We use the confidence estimates to bootstrap
HPS training. Given unlabelled image data, we take the confident estimates of a
POCO-trained regressor as pseudo ground truth. Retraining with this
automatically-curated data improves accuracy. (2) We exploit uncertainty in
video pose estimation by automatically identifying uncertain frames (e.g. due
to occlusion) and inpainting these from confident frames. Code and models will
be available for research at https://poco.is.tue.mpg.de
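The bootstrapping use case in (1) amounts to filtering predictions by their predicted variance before retraining; a minimal sketch, assuming a hypothetical variance threshold:

```python
def select_pseudo_labels(predictions, var_threshold=0.1):
    """Keep only confident predictions (low predicted variance) as pseudo
    ground truth for retraining. The threshold is a hypothetical
    hyperparameter, not a value from the paper."""
    return [pose for pose, variance in predictions if variance < var_threshold]

# Toy predictions as (pose, predicted variance) pairs.
preds = [("pose_a", 0.02), ("pose_b", 0.5), ("pose_c", 0.08)]
confident = select_pseudo_labels(preds)
```

The same thresholding idea drives the video use case in (2): uncertain frames are discarded and inpainted from their confident neighbors.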