Deep Autoencoder for Combined Human Pose Estimation and Body Model Upscaling
We present a method for simultaneously estimating 3D human pose and body
shape from a sparse set of wide-baseline camera views. We train a symmetric
convolutional autoencoder with a dual loss that enforces learning of a latent
representation that encodes skeletal joint positions, and at the same time
learns a deep representation of volumetric body shape. We harness the latter to
up-scale input volumetric data by a factor of , whilst recovering a
3D estimate of joint positions with equal or greater accuracy than the state of
the art. Inference runs in real time (25 fps), making the method suitable for passive
human behaviour monitoring applications that require high-fidelity
estimation of human body shape and pose.
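The dual-loss idea above can be sketched as follows. This is a hypothetical, framework-free toy (the shapes, weighting, and `dual_loss` name are illustrative assumptions, not the authors' implementation): one latent code is penalized both for volumetric reconstruction error and for skeletal joint error, and the two terms are summed.

```python
import numpy as np

def dual_loss(recon, target_vol, joints_pred, joints_gt, w_pose=1.0):
    # Hypothetical combined objective (not the paper's exact formulation):
    # volumetric reconstruction term (mean squared voxel error) ...
    l_vol = np.mean((recon - target_vol) ** 2)
    # ... plus a skeletal term (mean Euclidean error over 3D joints).
    l_pose = np.mean(np.linalg.norm(joints_pred - joints_gt, axis=-1))
    return l_vol + w_pose * l_pose

# Toy data: a 16^3 volume and 15 3D joints.
vol = np.zeros((16, 16, 16))
joints = np.zeros((15, 3))
loss = dual_loss(vol, vol, joints, joints)  # perfect prediction -> 0.0
```

In a trained network both terms would backpropagate into the same latent representation, which is what forces it to encode pose and volumetric shape simultaneously.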
Disentangling Human Dynamics for Pedestrian Locomotion Forecasting with Noisy Supervision
We tackle the problem of Human Locomotion Forecasting, a task for jointly
predicting the spatial positions of several keypoints on the human body in the
near future under an egocentric setting. In contrast to the previous work that
aims to solve either the task of pose prediction or trajectory forecasting in
isolation, we propose a framework to unify the two problems and address the
practically useful task of pedestrian locomotion prediction in the wild. Among
the major challenges in solving this task is the scarcity of annotated
egocentric video datasets with dense annotations for pose, depth, or egomotion.
To surmount this difficulty, we use state-of-the-art models to generate (noisy)
annotations and propose robust forecasting models that can learn from this
noisy supervision. We present a method to disentangle the overall pedestrian
motion into easier-to-learn subparts by utilizing a pose completion module and a
decomposition module. The completion module fills in missing keypoint
annotations, and the decomposition module separates the cleaned locomotion into
global (trajectory) and local (pose keypoint movement) components. Further, with a
Quasi-RNN as our backbone, we propose a novel hierarchical trajectory forecasting
network that utilizes low-level vision domain specific signals like egomotion
and depth to predict the global trajectory. Our method leads to
state-of-the-art results for the prediction of human locomotion in the
egocentric view. Project page: https://karttikeya.github.io/publication/plf/
Comment: Accepted to WACV 2020 (Oral)
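The global/local decomposition described above can be illustrated with a minimal sketch (an assumption for illustration, not the paper's module): take the per-frame centroid of the keypoints as the global trajectory, and each keypoint's offset from that centroid as the local pose.

```python
import numpy as np

def decompose(keypoints):
    """keypoints: (T, K, 2) array of K 2D keypoints over T frames."""
    global_traj = keypoints.mean(axis=1)              # (T, 2) centroid path
    local_pose = keypoints - global_traj[:, None, :]  # (T, K, 2) offsets
    return global_traj, local_pose

def recompose(global_traj, local_pose):
    # Adding the centroid path back recovers the original keypoints.
    return local_pose + global_traj[:, None, :]

T, K = 5, 17
kps = np.random.rand(T, K, 2)
g, l = decompose(kps)
assert np.allclose(recompose(g, l), kps)  # decomposition is lossless
```

The appeal of such a split is that the two subproblems have different statistics: the global trajectory correlates with egomotion and depth, while the local offsets capture gait, so separate predictors can specialize.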
Learning to Reconstruct People in Clothing from a Single RGB Camera
We present a learning-based model to infer the personalized 3D shape of people from a few frames (1-8) of a monocular video in which the person is moving, in less than 10 seconds and with a reconstruction accuracy of 5mm. Our model learns to predict the parameters of a statistical body model and instance displacements that add clothing and hair to the shape. The model achieves fast and accurate predictions based on two key design choices. First, by predicting shape in a canonical T-pose space, the network learns to encode images of the person into pose-invariant latent codes, where the information is fused. Second, based on the observation that feed-forward predictions are fast but do not always align with the input images, we predict using both bottom-up and top-down streams (one per view), allowing information to flow in both directions. Learning relies only on synthetic 3D data. Once learned, the model can take a variable number of frames as input, and can reconstruct shapes even from a single image with an accuracy of 6mm. Results on 3 different datasets demonstrate the efficacy and accuracy of our approach.
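One property worth noting in the design above is that fusing pose-invariant per-frame codes must work for a variable number of input frames. A hypothetical illustration (simple averaging, chosen here only because it is permutation-invariant and frame-count-agnostic; the paper's fusion is learned):

```python
import numpy as np

def fuse_codes(codes):
    """codes: (N, D) pose-invariant latent codes from N frames, N may vary.

    Averaging gives the same output shape for any N, so the downstream
    shape decoder never needs to know how many frames were observed.
    """
    return codes.mean(axis=0)

one = fuse_codes(np.ones((1, 8)))    # single-image input
many = fuse_codes(np.ones((8, 8)))   # eight-frame input
assert one.shape == many.shape == (8,)
```

This is why the same trained model can fall back to a single image (at 6mm accuracy) instead of the multi-frame setting (5mm): the fused code lives in the same space regardless of frame count.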
Accurate 6D Object Pose Estimation by Pose Conditioned Mesh Reconstruction
Current 6D object pose methods rely on deep CNN models fully optimized for
a single object, while keeping the architecture standardized across objects with
different shapes. In contrast to previous works, we explicitly exploit each
object's distinct topological information, i.e. its dense 3D mesh, in the pose
estimation model, with an automated process and prior to any post-processing
refinement stage. In order to achieve this, we propose a learning framework in
which a Graph Convolutional Neural Network reconstructs a pose conditioned 3D
mesh of the object. A robust estimate of the allocentric orientation is
recovered by computing, in a differentiable manner, the Procrustes alignment
between the canonical and reconstructed dense 3D meshes. The 6D egocentric pose is
then lifted using additional mask and 2D centroid projection estimations. Our
method is capable of self validating its pose estimation by measuring the
quality of the reconstructed mesh, which is invaluable in real life
applications. In our experiments on the LINEMOD, OCCLUSION and YCB-Video
benchmarks, the proposed method outperforms the state of the art.
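The Procrustes alignment mentioned above has a standard closed-form SVD solution, sketched below in plain numpy. This is a minimal sketch under assumptions: corresponding, zero-centered vertex sets and no differentiable-framework machinery (the paper computes the alignment inside a differentiable pipeline, which this version does not attempt).

```python
import numpy as np

def procrustes_rotation(canonical, reconstructed):
    """Return rotation R minimizing ||reconstructed - canonical @ R.T||_F.

    Both inputs: (N, 3) arrays of corresponding, zero-centered vertices.
    Standard Kabsch solution via SVD of the cross-covariance matrix.
    """
    H = canonical.T @ reconstructed           # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
    D = np.diag([1.0, 1.0, d])
    return Vt.T @ D @ U.T

# Sanity check: recover a known rotation about the z-axis.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
Y = X @ R_true.T                              # rotate every vertex
R_est = procrustes_rotation(X, Y)
assert np.allclose(R_est, R_true)
```

Because the residual of this alignment measures how well the reconstructed mesh matches the canonical one, the same computation naturally doubles as the self-validation signal the abstract describes.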