22,088 research outputs found
3D-Aware Neural Body Fitting for Occlusion Robust 3D Human Pose Estimation
Regression-based methods for 3D human pose estimation directly predict the 3D
pose parameters from a 2D image using deep networks. While achieving
state-of-the-art performance on standard benchmarks, their performance degrades
under occlusion. In contrast, optimization-based methods fit a parametric body
model to 2D features in an iterative manner. The localized reconstruction loss
can potentially make them robust to occlusion, but they suffer from the 2D-3D
ambiguity.
Motivated by the recent success of generative models in rigid object pose
estimation, we propose 3D-aware Neural Body Fitting (3DNBF), an approximate
analysis-by-synthesis approach to 3D human pose estimation with state-of-the-art
performance and occlusion robustness. In particular, we propose a generative
model of deep features based on a volumetric human representation with Gaussian
ellipsoidal kernels emitting 3D pose-dependent feature vectors. The neural
features are trained with contrastive learning to become 3D-aware and hence to
overcome the 2D-3D ambiguity.
Experiments show that 3DNBF outperforms other approaches on both occluded and
standard benchmarks. Code is available at https://github.com/edz-o/3DNBF
Comment: ICCV 2023, project page: https://3dnbf.github.io
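The volumetric representation described above — Gaussian ellipsoidal kernels emitting feature vectors — can be pictured with a minimal sketch. This is a hypothetical orthographic simplification (no perspective projection, occlusion reasoning, or pose-dependent features); `render_feature_map` and all parameter names are illustrative, not the released 3DNBF code:

```python
import numpy as np

def render_feature_map(centers, scales, features, H=8, W=8):
    """Blend per-kernel feature vectors into a 2D feature map.

    Each Gaussian ellipsoid (center, per-axis scale) emits one feature
    vector; every pixel receives a normalized Gaussian-weighted mixture.
    Orthographic toy version: only the x, y components are used.
    """
    ys, xs = np.mgrid[0:H, 0:W]
    pix = np.stack([xs, ys], axis=-1).astype(float)  # (H, W, 2) pixel coords
    fmap = np.zeros((H, W, features.shape[1]))
    wsum = np.zeros((H, W, 1))
    for c, s, f in zip(centers, scales, features):
        # squared Mahalanobis distance to the kernel center (image plane only)
        d2 = (((pix - c[:2]) / s[:2]) ** 2).sum(axis=-1)
        w = np.exp(-0.5 * d2)[..., None]             # Gaussian weight per pixel
        fmap += w * f
        wsum += w
    return fmap / np.maximum(wsum, 1e-8)             # normalized feature blend
```

In the paper the emitted features are additionally trained with a contrastive objective so that different 3D poses map to distinguishable feature maps; the sketch only shows the rendering side.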
Optical Non-Line-of-Sight Physics-based 3D Human Pose Estimation
We describe a method for 3D human pose estimation from transient images
(i.e., a 3D spatio-temporal histogram of photons) acquired by an optical
non-line-of-sight (NLOS) imaging system. Our method can perceive 3D human pose
by "looking around corners" through the use of light indirectly reflected by
the environment. We bring together a diverse set of technologies from NLOS
imaging, human pose estimation and deep reinforcement learning to construct an
end-to-end data processing pipeline that converts a raw stream of photon
measurements into a full 3D human pose sequence estimate. Our contributions are
the design of a data representation process, which includes (1) a learnable
inverse point spread function (PSF) to convert raw transient images into a deep
feature vector; (2) a neural humanoid control policy conditioned on the
transient image feature and learned from interactions with a physics simulator;
and (3) a data synthesis and augmentation strategy based on depth data that can
be transferred to a real-world NLOS imaging system. Our preliminary experiments
suggest that our method is able to generalize to real-world NLOS measurements to
estimate physically valid 3D human poses.
Comment: CVPR 2020. Video: https://youtu.be/4HFulrdmLE8. Project page:
https://marikoisogawa.github.io/project/nlos_pos
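The learnable inverse PSF in contribution (1) has a classical fixed-filter analogue: Wiener deconvolution of the photon histogram. The sketch below shows that analogue in 1D — it is not the paper's trained network, and the function name and `snr` parameter are assumptions for illustration:

```python
import numpy as np

def wiener_deconv(measured, psf, snr=100.0):
    """Wiener deconvolution of a 1D transient photon histogram.

    A fixed inverse filter G = H* / (|H|^2 + 1/SNR) applied in the
    frequency domain; the paper replaces this hand-designed inverse PSF
    with a learnable one trained end-to-end.
    """
    n = len(measured)
    H = np.fft.fft(psf, n)                      # PSF frequency response
    G = np.conj(H) / (np.abs(H) ** 2 + 1.0 / snr)  # regularized inverse filter
    return np.real(np.fft.ifft(np.fft.fft(measured) * G))
```

The regularization term 1/SNR keeps the filter stable where the PSF suppresses frequencies, which is exactly the regime where a learned inverse can outperform the fixed formula.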
Gaze Estimation Based on Multi-view Geometric Neural Networks
Gaze and head pose estimation can play essential roles in various applications, such as human attention recognition and behavior analysis. Most deep neural network-based gaze estimation techniques use supervised regression, where features are extracted from eye images by neural networks and regressed to 3D gaze vectors. We instead apply the geometric features of the eyes to determine the gaze vectors of observers, relying on the concepts of 3D multiple view geometry. We develop an end-to-end CNN framework for gaze estimation using 3D geometric constraints under semi-supervised and unsupervised settings and compare the results. We explore the mathematics behind Homography and Structure-from-Motion and extend these concepts to the gaze estimation problem using eye region landmarks. We demonstrate the necessity of 3D eye region landmarks for implementing 3D geometry-based algorithms and address the problem of missing depth parameters in gaze estimation datasets. We further use Convolutional Neural Networks (CNNs) to develop an end-to-end learning-based framework that takes in sequential eye images to estimate the relative gaze changes of observers. We use a depth network to perform monocular depth estimation of the eye region landmarks, which are further utilized by the pose network to estimate the relative gaze change using view synthesis constraints on the iris regions. We further explore CNN frameworks that estimate the relative changes in homography matrices between sequential eye images, based on the eye region landmarks, to estimate the pose of the iris and hence the relative change in the observer's gaze. We compare and analyze the results obtained from mathematical calculations and deep neural network-based methods, and compare the performance of the proposed CNN scheme with state-of-the-art regression-based methods for gaze estimation.
Future work involves extending the end-to-end pipeline into an unsupervised framework for gaze estimation in the wild.
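The homography step in the abstract rests on the standard direct linear transform (DLT): given matched eye-region landmarks in two frames, solve for the 3x3 homography up to scale. A minimal sketch of that building block, assuming four or more landmark correspondences (illustrative code, not the thesis implementation):

```python
import numpy as np

def estimate_homography(src, dst):
    """DLT estimate of the homography mapping src -> dst.

    src, dst: (N, 2) arrays of matched landmarks, N >= 4.
    Each correspondence contributes two linear equations in the nine
    entries of H; the solution is the SVD null vector of the stacked system.
    """
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    H = Vt[-1].reshape(3, 3)      # null vector = flattened homography
    return H / H[2, 2]            # fix the projective scale
```

Tracking how this matrix changes between sequential eye images is what lets the relative iris pose, and hence the relative gaze change, be recovered without absolute gaze labels.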