3D Pose Estimation and 3D Model Retrieval for Objects in the Wild
We propose a scalable, efficient and accurate approach to retrieve 3D models
for objects in the wild. Our contribution is twofold. We first present a 3D
pose estimation approach for object categories which significantly outperforms
the state-of-the-art on Pascal3D+. Second, we use the estimated pose as a prior
to retrieve 3D models which accurately represent the geometry of objects in RGB
images. For this purpose, we render depth images from 3D models under our
predicted pose and match learned image descriptors of RGB images against those
of rendered depth images using a CNN-based multi-view metric learning approach.
In this way, we are the first to report quantitative results for 3D model
retrieval on Pascal3D+, where our method chooses the same models as human
annotators for 50% of the validation images on average. In addition, we show
that our method, which was trained purely on Pascal3D+, retrieves rich and
accurate 3D models from ShapeNet given RGB images of objects in the wild.
Comment: Accepted to Conference on Computer Vision and Pattern Recognition
(CVPR) 201
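The retrieval step described in the abstract above (matching a descriptor of the RGB image against descriptors of depth images rendered under the predicted pose) can be illustrated with a minimal nearest-neighbour sketch. The descriptors here are hand-written toy vectors and the names `retrieve_model`, `chair_a`, etc. are hypothetical; in the paper the descriptors come from a learned CNN, which is not reproduced here.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve_model(query_descriptor, rendered_descriptors):
    """Return the id of the 3D model whose rendered-depth descriptor is
    closest to the RGB query descriptor.

    rendered_descriptors: dict mapping model id -> descriptor (assumed to be
    computed from a depth rendering under the predicted pose).
    """
    return min(
        rendered_descriptors,
        key=lambda model_id: l2_distance(query_descriptor,
                                         rendered_descriptors[model_id]),
    )

# Toy descriptors (assumption: 3-D vectors stand in for learned embeddings).
query = [0.9, 0.1, 0.0]
candidates = {
    "chair_a": [1.0, 0.0, 0.0],
    "chair_b": [0.0, 1.0, 0.0],
    "sofa_c": [0.0, 0.0, 1.0],
}
best = retrieve_model(query, candidates)
```

Because the depth images are rendered under the predicted pose, only one descriptor per candidate model needs to be compared, rather than one per viewpoint.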
Improving 2D-3D Dense Correspondences with Diffusion Models for 6D Object Pose Estimation
Estimating 2D-3D correspondences between RGB images and 3D space is a
fundamental problem in 6D object pose estimation. Recent pose estimators use
dense correspondence maps and Point-to-Point algorithms to estimate object
poses. The accuracy of pose estimation depends heavily on the quality of the
dense correspondence maps and their ability to withstand occlusion, clutter,
and challenging material properties. Currently, dense correspondence maps are
estimated using image-to-image translation models based on GANs, Autoencoders,
or direct regression models. However, recent advancements in image-to-image
translation have led to diffusion models being the superior choice when
evaluated on benchmarking datasets. In this study, we compare image-to-image
translation networks based on GANs and diffusion models for the downstream task
of 6D object pose estimation. Our results demonstrate that the diffusion-based
image-to-image translation model outperforms the GAN, revealing potential for
further improvements in 6D object pose estimation models.
Comment: Submitted to the First Austrian Symposium on AI, Robotics, and Vision
202
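To make the role of 2D-3D correspondences in the abstract above concrete, the following sketch scores a candidate 6D pose by its mean reprojection error over a set of correspondences, which is the quantity that correspondence-based pose solvers typically minimise. It assumes a simple pinhole camera; the function names, intrinsics, and points are illustrative and are not taken from the paper.

```python
import math

def project(point, R, t, fx, fy, cx, cy):
    """Project a 3D object point into the image with a pinhole camera.

    R is a 3x3 rotation (list of rows), t a 3-vector translation.
    """
    cam = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    return (fx * cam[0] / cam[2] + cx, fy * cam[1] / cam[2] + cy)

def mean_reprojection_error(correspondences, R, t, fx, fy, cx, cy):
    """Average pixel distance between observed 2D points and projected 3D points.

    correspondences: list of ((u, v), (X, Y, Z)) pairs.
    """
    total = 0.0
    for (u, v), point in correspondences:
        pu, pv = project(point, R, t, fx, fy, cx, cy)
        total += math.hypot(pu - u, pv - v)
    return total / len(correspondences)

# Hypothetical camera and ground-truth pose (identity rotation, 2 units away).
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
intrinsics = dict(fx=100.0, fy=100.0, cx=50.0, cy=50.0)
points_3d = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
true_t = [0.0, 0.0, 2.0]
correspondences = [
    (project(p, identity, true_t, **intrinsics), p) for p in points_3d
]

error_true = mean_reprojection_error(correspondences, identity, true_t, **intrinsics)
error_shifted = mean_reprojection_error(
    correspondences, identity, [0.1, 0.0, 2.0], **intrinsics
)
```

The correct pose reproduces the observed pixels, while the shifted pose does not. Noisy or wrong entries in a correspondence map raise this error even at the true pose, which is why map quality bounds pose accuracy.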
VIBE: Video Inference for Human Body Pose and Shape Estimation
Human motion is fundamental to understanding behavior. Despite progress on
single-image 3D pose and shape estimation, existing video-based
state-of-the-art methods fail to produce accurate and natural motion sequences
due to a lack of ground-truth 3D motion data for training. To address this
problem, we propose Video Inference for Body Pose and Shape Estimation (VIBE),
which makes use of an existing large-scale motion capture dataset (AMASS)
together with unpaired, in-the-wild, 2D keypoint annotations. Our key novelty
is an adversarial learning framework that leverages AMASS to discriminate
between real human motions and those produced by our temporal pose and shape
regression networks. We define a temporal network architecture and show that
adversarial training, at the sequence level, produces kinematically plausible
motion sequences without in-the-wild ground-truth 3D labels. We perform
extensive experimentation to analyze the importance of motion and demonstrate
the effectiveness of VIBE on challenging 3D pose estimation datasets, achieving
state-of-the-art performance. Code and pretrained models are available at
https://github.com/mkocabas/VIBE.
Comment: CVPR-2020 camera ready. Code is available at
https://github.com/mkocabas/VIB
A Novel Self-Intersection Penalty Term for Statistical Body Shape Models and Its Applications in 3D Pose Estimation
Statistical body shape models are widely used in 3D pose estimation due to
their low-dimensional parameter representation. However, it is difficult to
avoid self-intersection between body parts accurately. Motivated by this fact,
we propose a novel self-intersection penalty term for statistical body shape
models applied in 3D pose estimation. To avoid the trouble of computing
self-intersection for complex surfaces like the body meshes, the gradient of
our proposed self-intersection penalty term is manually derived from the
perspective of geometry. First, the self-intersection penalty term is defined
as the volume of the self-intersection region. To calculate the partial
derivatives with respect to the coordinates of the vertices, we employed
detection rays to divide vertices of statistical body shape models into
different groups depending on whether the vertex is in the region of
self-intersection. Second, the partial derivatives could be easily derived by
the normal vectors of neighboring triangles of the vertices. Finally, this
penalty term could be applied in gradient-based optimization algorithms to
remove the self-intersection of triangular meshes without using any
approximation. Qualitative and quantitative evaluations were conducted to
demonstrate the effectiveness and generality of our proposed method compared
with previous approaches. The experimental results show that our proposed
penalty term can avoid self-intersection, excluding unreasonable predictions
and indirectly improving the accuracy of 3D pose estimation. Furthermore, the
proposed method could be employed universally in triangular-mesh-based 3D
reconstruction.
DiffPose: Toward More Reliable 3D Pose Estimation
Monocular 3D human pose estimation is quite challenging due to the inherent
ambiguity and occlusion, which often lead to high uncertainty and
indeterminacy. On the other hand, diffusion models have recently emerged as an
effective tool for generating high-quality images from noise. Inspired by their
capability, we explore a novel pose estimation framework (DiffPose) that
formulates 3D pose estimation as a reverse diffusion process. We incorporate
novel designs into our DiffPose to facilitate the diffusion process for 3D pose
estimation: a pose-specific initialization of pose uncertainty distributions, a
Gaussian Mixture Model-based forward diffusion process, and a
context-conditioned reverse diffusion process. Our proposed DiffPose
significantly outperforms existing methods on the widely used pose estimation
benchmarks Human3.6M and MPI-INF-3DHP. Project page:
https://gongjia0208.github.io/Diffpose/.
Comment: Accepted to CVPR 202
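As a rough illustration of the forward diffusion idea mentioned in the abstract above, the sketch below noises a 3D pose using the standard single-Gaussian forward step, x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps. This is a simplified stand-in: DiffPose itself uses a Gaussian Mixture Model-based forward process, which is not reproduced here, and the pose values and function name are hypothetical.

```python
import math
import random

def forward_diffuse(pose, alpha_bar, rng):
    """Sample a noised pose from a clean pose.

    pose: flat list of joint coordinates.
    alpha_bar: cumulative signal-retention factor in [0, 1];
               1.0 keeps the pose unchanged, 0.0 yields pure noise.
    rng: a random.Random instance, passed in for reproducibility.
    """
    signal = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in pose]

rng = random.Random(0)
pose = [0.0, 0.5, 1.0, 0.2, -0.3, 0.9]  # two hypothetical joints, (x, y, z) each
clean = forward_diffuse(pose, 1.0, rng)  # alpha_bar = 1: no noise is added
noisy = forward_diffuse(pose, 0.5, rng)  # equal mix of signal and noise
```

A reverse process would learn to undo such steps, moving from an uncertain pose distribution toward a confident estimate conditioned on image context.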