Representation learning of vertex heatmaps for 3D human mesh reconstruction from multi-view images
This study addresses the problem of 3D human mesh reconstruction from
multi-view images. Recently, approaches that directly estimate the skinned
multi-person linear model (SMPL)-based human mesh vertices based on volumetric
heatmap representation from input images have shown good performance. We show
that representation learning of vertex heatmaps using an autoencoder helps
improve the performance of such approaches. Vertex heatmap autoencoder (VHA)
learns the manifold of plausible human meshes in the form of latent codes using
AMASS, which is a large-scale motion capture dataset. Body code predictor (BCP)
utilizes the learned body prior from VHA for human mesh reconstruction from
multi-view images through latent code-based supervision and transfer of
pretrained weights. According to experiments on Human3.6M and LightStage
datasets, the proposed method outperforms previous methods and achieves
state-of-the-art human mesh reconstruction performance. Comment: ICIP 202
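The abstract's central representation is a volumetric vertex heatmap: each mesh vertex is encoded as a soft Gaussian blob over a voxel grid, which an autoencoder can then compress into a latent body code. The paper's actual VHA architecture is not given here, so the following is only a minimal sketch of the encode/decode step for a single vertex; the function names `vertex_heatmap` and `decode_vertex`, the grid size, and the soft-argmax decoding are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def vertex_heatmap(vertex, grid_size=16, sigma=1.0):
    """Encode one 3D vertex (in voxel coordinates, each component in
    [0, grid_size)) as a normalized volumetric Gaussian heatmap."""
    axes = np.arange(grid_size)
    zz, yy, xx = np.meshgrid(axes, axes, axes, indexing="ij")
    d2 = (xx - vertex[0]) ** 2 + (yy - vertex[1]) ** 2 + (zz - vertex[2]) ** 2
    heatmap = np.exp(-d2 / (2 * sigma ** 2))
    return heatmap / heatmap.sum()  # normalize to a probability distribution

def decode_vertex(heatmap):
    """Soft-argmax: the expected voxel coordinate under the heatmap,
    i.e. a differentiable inverse of the encoding above."""
    grid_size = heatmap.shape[0]
    axes = np.arange(grid_size)
    zz, yy, xx = np.meshgrid(axes, axes, axes, indexing="ij")
    return np.array([(heatmap * g).sum() for g in (xx, yy, zz)])
```

In a full pipeline, heatmaps like these (one channel per SMPL vertex or per downsampled vertex set) would form the autoencoder's input and reconstruction target, with the latent code serving as the learned body prior.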
Human Pose Estimation from Ambiguous Pressure Recordings with Spatio-temporal Masked Transformers
Despite the impressive performance of vision-based pose estimators, they
generally fail to perform well under adverse vision conditions and often don't
satisfy the privacy demands of customers. As a result, researchers have begun
to study tactile sensing systems as an alternative. However, these systems
suffer from noisy and ambiguous recordings. To tackle this problem, we propose
a novel solution for pose estimation from ambiguous pressure data. Our method
comprises a spatio-temporal vision transformer with an encoder-decoder
architecture. Detailed experiments on two popular public datasets reveal that
our model outperforms existing solutions in the area. Moreover, we observe that
increasing the number of temporal crops in the early stages of the network
positively impacts the performance while pre-training the network in a
self-supervised setting using a masked auto-encoder approach also further
improves the results.
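The self-supervised pretraining mentioned above follows the masked-autoencoder recipe: a large fraction of spatio-temporal tokens is hidden and the network is trained to reconstruct them. As a hedged sketch of just the masking step (the paper's exact tokenization and mask ratio are not stated here, and `mask_patches` is a hypothetical helper):

```python
import numpy as np

def mask_patches(tokens, mask_ratio=0.75, rng=None):
    """MAE-style random masking of flattened spatio-temporal tokens.

    tokens: array of shape (num_patches, dim).
    Returns the visible tokens, their indices, and the boolean mask
    (True = masked / to be reconstructed by the decoder).
    """
    rng = rng or np.random.default_rng(0)
    n = tokens.shape[0]
    num_masked = int(n * mask_ratio)
    perm = rng.permutation(n)
    mask = np.zeros(n, dtype=bool)
    mask[perm[:num_masked]] = True  # hide a random subset of patches
    visible_idx = np.where(~mask)[0]
    return tokens[visible_idx], visible_idx, mask
```

The encoder would see only the visible tokens, and a lightweight decoder would reconstruct the masked pressure patches; the pretrained encoder weights are then reused for pose regression.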
UCLID-Net: Single View Reconstruction in Object Space
Most state-of-the-art deep geometric learning single-view reconstruction
approaches rely on encoder-decoder architectures that output either shape
parametrizations or implicit representations. However, these representations
rarely preserve the Euclidean structure of the 3D space objects exist in. In
this paper, we show that building a geometry preserving 3-dimensional latent
space helps the network concurrently learn global shape regularities and local
reasoning in the object coordinate space and, as a result, boosts performance.
We demonstrate, on both ShapeNet synthetic images, which are commonly used for
benchmarking, and on real-world images, that our approach outperforms
state-of-the-art methods. Furthermore, the single-view pipeline naturally extends
to multi-view reconstruction, which we also show. Comment: Added supplementary material
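The key idea in this abstract is a latent space that preserves the Euclidean structure of object space, so that local reasoning can happen in 3D coordinates rather than in an unstructured code vector. The paper's actual lifting mechanism is not described here; the sketch below shows only the simplest possible stand-in, an orthographic lift that replicates a 2D feature map along a depth axis to form an object-space feature grid (the function name `lift_to_voxels` and the orthographic assumption are mine, not the paper's).

```python
import numpy as np

def lift_to_voxels(feat2d, depth=8):
    """Lift a 2D feature map (H, W, C) into an object-space voxel grid
    (D, H, W, C) by copying features along an orthographic depth axis.
    3D convolutions over this grid can then reason locally in object
    coordinates, which is the property the abstract argues matters."""
    return np.broadcast_to(feat2d, (depth,) + feat2d.shape).copy()
```

A camera-aware version would unproject each feature along its actual viewing ray instead of a straight depth column, but the resulting latent grid is geometry-preserving in the same sense: neighboring voxels correspond to neighboring 3D locations.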