Body Knowledge and Uncertainty Modeling for Monocular 3D Human Body Reconstruction
While 3D body reconstruction methods have made remarkable progress recently,
it remains difficult to acquire enough accurate 3D supervision for training.
In this paper, we propose KNOWN, a framework that effectively utilizes body
KNOWledge and uNcertainty modeling to compensate for insufficient 3D
supervision.
KNOWN exploits a comprehensive set of generic body constraints derived from
well-established body knowledge. These generic constraints precisely and
explicitly characterize the plausibility of a reconstruction and enable 3D
reconstruction models to be trained without any 3D data. Moreover, existing
methods typically use images from multiple datasets during training, which can
result in data noise (e.g., inconsistent joint annotation) and data
imbalance (e.g., minority images representing unusual poses or
captured from challenging camera views). KNOWN addresses these problems through a
novel probabilistic framework that models both aleatoric and epistemic
uncertainty. Aleatoric uncertainty is encoded in a robust Negative
Log-Likelihood (NLL) training loss, while epistemic uncertainty is used to
guide model refinement. Experiments demonstrate that KNOWN's body
reconstruction outperforms prior weakly-supervised approaches, particularly on
the challenging minority images. Comment: ICCV 202
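The abstract states that aleatoric uncertainty is encoded in a robust NLL training loss but does not give its form. A minimal sketch, assuming an isotropic Gaussian per joint with a network-predicted log-variance (a heavier-tailed distribution such as a Laplacian is an equally plausible reading of "robust"), could look like this; `gaussian_nll` is a hypothetical name, not KNOWN's code.

```python
import numpy as np

def gaussian_nll(pred, target, log_var):
    """Gaussian negative log-likelihood with the constant term dropped.

    pred, target: arrays of shape (N, J, D), e.g. N images, J joints, D coords.
    log_var: array of shape (N, J), one predicted log-variance per joint
    (an assumption; the abstract does not specify the parameterization).
    """
    sq_err = np.sum((pred - target) ** 2, axis=-1)  # (N, J) squared residuals
    d = pred.shape[-1]
    # 0.5 * (||e||^2 / sigma^2 + D * log sigma^2): a large predicted variance
    # down-weights the residual but is penalized by the log-variance term.
    nll = 0.5 * (sq_err * np.exp(-log_var) + d * log_var)
    return float(nll.mean())
```

A noisy annotation can be absorbed by a larger predicted variance, while the log-variance term keeps the model from labeling every sample as uncertain.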
End-to-end Weakly-supervised Multiple 3D Hand Mesh Reconstruction from Single Image
In this paper, we consider the challenging task of simultaneously locating
and recovering multiple hands from a single 2D image. Previous studies either
focus on single-hand reconstruction or solve this problem in a multi-stage way.
In particular, the conventional two-stage pipeline first detects hand areas and
then estimates 3D hand pose from each cropped patch. To reduce the
computational redundancy in preprocessing and feature extraction, we propose a
concise but efficient single-stage pipeline. Specifically, we design a
multi-head auto-encoder structure for multi-hand reconstruction, where each
head network shares the same feature map and outputs the hand center, pose and
texture, respectively. In addition, we adopt a weakly-supervised scheme to
alleviate the burden of expensive real-world 3D data annotation. To this end,
we propose a series of losses optimized by a stage-wise training scheme, where
a multi-hand dataset with 2D annotations is generated from publicly
available single-hand datasets. To further improve the accuracy of the
weakly-supervised model, we adopt several feature-consistency constraints in
both single- and multi-hand settings. Specifically, the keypoints of each
hand estimated from local features should be consistent with the re-projected
points predicted from global features. Extensive experiments on public
benchmarks including FreiHAND, HO3D, InterHand2.6M and RHD demonstrate that our
method outperforms state-of-the-art model-based methods in both
weakly-supervised and fully-supervised settings.
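The consistency constraint between keypoints estimated from local features and points re-projected from global predictions can be sketched as follows. This assumes a weak-perspective camera (a scale plus a 2D translation) and a mean Euclidean distance; both choices and the function names are illustrative assumptions rather than the paper's exact losses.

```python
import numpy as np

def weak_perspective_project(joints_3d, scale, trans):
    """Project (J, 3) joints to (J, 2) image points as s * [x, y] + t."""
    return scale * joints_3d[:, :2] + trans

def consistency_loss(kp_local, joints_3d_global, scale, trans):
    """Mean Euclidean distance between 2D keypoints from the local branch
    and 3D joints from the global branch re-projected into the image."""
    reproj = weak_perspective_project(joints_3d_global, scale, trans)
    return float(np.linalg.norm(kp_local - reproj, axis=-1).mean())
```

Because the loss only compares the two branches with each other, it needs no 3D ground truth, which fits the weakly-supervised setting described above.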
PHRIT: Parametric Hand Representation with Implicit Template
We propose PHRIT, a novel approach for parametric hand mesh modeling with an
implicit template that combines the advantages of both parametric meshes and
implicit representations. Our method represents deformable hand shapes using
signed distance fields (SDFs) with part-based shape priors, and it uses a
deformation field to perform the deformation. The model offers efficient
high-fidelity hand reconstruction by deforming the canonical template at
infinite resolution. Additionally, it is fully differentiable and can easily be
used in hand modeling, since it can be driven by skeleton and shape latent
codes. We evaluate PHRIT on multiple downstream tasks, including
skeleton-driven hand reconstruction, shape reconstruction from point clouds,
and single-view 3D reconstruction, demonstrating that our approach achieves
realistic and immersive hand modeling with state-of-the-art performance. Comment: Accepted by ICCV202
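The abstract does not detail PHRIT's part-based priors or its learned deformation field. As a generic illustration of part-based SDF composition only (not PHRIT's formulation), each finger segment below is approximated by a capsule, and the parts are combined with a pointwise minimum, the standard SDF union. The capsule primitive and all names are assumptions.

```python
import numpy as np

def capsule_sdf(points, a, b, radius):
    """Signed distance from (N, 3) points to a capsule whose axis runs a -> b."""
    ab = b - a
    # Parameter of the closest point on the segment, clamped to [0, 1].
    t = np.clip(((points - a) @ ab) / (ab @ ab), 0.0, 1.0)  # (N,)
    closest = a + t[:, None] * ab
    return np.linalg.norm(points - closest, axis=-1) - radius

def union_sdf(points, parts):
    """Union of parts given as (a, b, radius) tuples via a pointwise minimum.

    Exact outside the union; inside overlapping parts it is only a bound.
    """
    return np.min(np.stack([capsule_sdf(points, a, b, r) for a, b, r in parts]), axis=0)
```

Negative values mark points inside the hand surface; a learned model would replace the fixed capsules with part-wise neural SDFs.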
HaMuCo: Hand Pose Estimation via Multiview Collaborative Self-Supervised Learning
Recent advancements in 3D hand pose estimation have shown promising results,
but their effectiveness has relied primarily on the availability of large-scale
annotated datasets, the creation of which is a laborious and costly process. To
alleviate the label-hungry limitation, we propose a self-supervised learning
framework, HaMuCo, that learns a single-view hand pose estimator from
multi-view pseudo 2D labels. However, two main challenges of
self-supervised learning are the presence of noisy labels and the "groupthink"
effect from multiple views. To overcome these issues, we introduce a cross-view
interaction network that distills the single-view estimator by utilizing the
cross-view correlated features and enforcing multi-view consistency to achieve
collaborative learning. Both the single-view estimator and the cross-view
interaction network are trained jointly in an end-to-end manner. Extensive
experiments show that our method can achieve state-of-the-art performance on
multi-view self-supervised hand pose estimation. Furthermore, the proposed
cross-view interaction network can also be applied to hand pose estimation from
multi-view input and outperforms previous methods under the same settings. Comment: Accepted to ICCV 2023. Won first place in the HANDS22 Challenge Task 2. Project page: https://zxz267.github.io/HaMuC
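HaMuCo's cross-view interaction network is learned, and the abstract does not spell out its consistency term. A simple hand-written stand-in for "multi-view consistency", assuming calibrated camera-to-world rotations and root-relative 3D joint predictions for each view, measures how far each view's prediction lies from the cross-view consensus. The function names are hypothetical.

```python
import numpy as np

def to_world(joints_cam, rotation):
    """Rotate (J, 3) root-relative camera-frame joints into the world frame.

    rotation: (3, 3) camera-to-world rotation, assumed known from calibration.
    Row-vector form of x_world = R @ x_cam.
    """
    return joints_cam @ rotation.T

def multiview_consistency(preds_cam, rotations):
    """Mean distance of each view's world-frame prediction from the
    cross-view mean; zero when all views agree."""
    world = np.stack([to_world(p, r) for p, r in zip(preds_cam, rotations)])  # (V, J, 3)
    consensus = world.mean(axis=0, keepdims=True)
    return float(np.linalg.norm(world - consensus, axis=-1).mean())
```

Averaging toward a consensus is exactly where a "groupthink" failure can arise: if most views share the same error, the consensus inherits it, which is the issue the paper's cross-view network is said to address.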
ShapeGraFormer: GraFormer-Based Network for Hand-Object Reconstruction from a Single Depth Map
3D reconstruction of hand-object manipulations is important for emulating
human actions. Most methods dealing with challenging object manipulation
scenarios focus on hand reconstruction in isolation, ignoring the physical and
kinematic constraints imposed by object contact. Some approaches produce more
realistic results by jointly reconstructing 3D hand-object interactions.
However, they focus on coarse pose estimation or rely on known hand and
object shapes. We propose the first approach for realistic 3D hand-object shape
and pose reconstruction from a single depth map. Unlike previous work, our
voxel-based reconstruction network regresses the vertex coordinates of a hand
and an object and reconstructs more realistic interactions. Our pipeline
additionally predicts voxelized hand-object shapes, which have a one-to-one
mapping to the input voxelized depth. Thereafter, we exploit the graph nature of
the hand and object shapes by utilizing the recent GraFormer network with
positional embedding to reconstruct shapes from template meshes. In addition,
we show the impact of adding another GraFormer component that refines the
reconstructed shapes based on the hand-object interactions and its ability to
reconstruct more accurate object shapes. We perform an extensive evaluation on
the HO-3D and DexYCB datasets and show that our method outperforms existing
approaches in hand reconstruction and produces plausible reconstructions for
the objects.
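The pipeline consumes a voxelized depth map, but the abstract gives no voxelization details. A minimal sketch, assuming pinhole intrinsics `fx, fy, cx, cy` and a fixed-size cube centred on the hand (all hypothetical parameters), back-projects valid depth pixels to 3D points and marks the occupied cells of a binary grid.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map (0 = no reading) to (N, 3) camera-frame points."""
    v, u = np.nonzero(depth > 0)  # row (v) and column (u) indices of valid pixels
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)

def voxelize(points, center, size, res):
    """Binary occupancy grid of shape (res, res, res) for a cube of side
    `size` centred at `center`; points outside the cube are dropped."""
    idx = np.floor((points - (center - size / 2)) / size * res).astype(int)
    keep = np.all((idx >= 0) & (idx < res), axis=1)
    grid = np.zeros((res, res, res), dtype=bool)
    grid[tuple(idx[keep].T)] = True
    return grid
```

Each occupied input cell then corresponds to one cell of the predicted hand-object voxel grid, which is what makes the one-to-one mapping mentioned above possible.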
Identity-Aware Hand Mesh Estimation and Personalization from RGB Images
Reconstructing 3D hand meshes from monocular RGB images has attracted an
increasing amount of attention due to its enormous potential applications in
the field of AR/VR. Most state-of-the-art methods attempt to tackle this task
in an anonymous manner. Specifically, the identity of the subject is ignored
even though it is practically available in real applications where the user is
unchanged in a continuous recording session. In this paper, we propose an
identity-aware hand mesh estimation model, which can incorporate the identity
information represented by the intrinsic shape parameters of the subject. We
demonstrate the importance of the identity information by comparing the
proposed identity-aware model to a baseline that treats subjects anonymously.
Furthermore, to handle the use case where the test subject is unseen, we
propose a novel personalization pipeline to calibrate the intrinsic shape
parameters using only a few unlabeled RGB images of the subject. Experiments on
two large-scale public datasets validate the state-of-the-art performance of
our proposed method. Comment: ECCV 2022. GitHub:
https://github.com/deyingk/PersonalizedHandMeshEstimatio
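The paper's personalization pipeline calibrates the intrinsic shape parameters from a few unlabeled images, but its actual procedure is not described in the abstract. The core idea that one shape vector is shared by every frame of the same subject can be illustrated with a deliberately simple stand-in, a confidence-weighted average of per-frame shape estimates (`calibrate_shape` and the confidence weights are hypothetical).

```python
import numpy as np

def calibrate_shape(per_frame_betas, confidences=None):
    """Fuse per-frame shape estimates of shape (F, K) into one identity vector (K,).

    confidences: optional (F,) non-negative weights; uniform if omitted.
    """
    betas = np.asarray(per_frame_betas, dtype=float)
    if confidences is None:
        confidences = np.ones(len(betas))
    w = np.asarray(confidences, dtype=float)
    return (w[:, None] * betas).sum(axis=0) / w.sum()
```

Once calibrated, the fused vector would be held fixed for the rest of the session, so per-frame estimation only has to recover pose.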
- …