
1,375 research outputs found

It's all Relative: Monocular 3D Human Pose Estimation from Weakly Supervised Data

Authors: Eng Robert; Mac Aodha Oisin; Perona Pietro; Ronchi Matteo Ruggero
Publication date: 17/05/2018

We address the problem of 3D human pose estimation from 2D input images using only weakly supervised training data. Although supervised machine learning has shown considerable success for 2D pose estimation, its application to 3D pose estimation in real-world images is currently hampered by the lack of varied training images with corresponding 3D poses. Most existing 3D pose estimation algorithms train on data that has either been collected in carefully controlled studio settings or has been generated synthetically. Instead, we take a different approach, and propose a 3D human pose estimation algorithm that only requires relative estimates of depth at training time. Such a training signal, although noisy, can be easily collected from crowd annotators, and is of sufficient quality for enabling successful training and evaluation of 3D pose algorithms. Our results are competitive with fully supervised regression-based approaches on the Human3.6M dataset, despite using significantly weaker training data. Our proposed algorithm opens the door to using existing widespread 2D datasets for 3D pose estimation by allowing fine-tuning with noisy relative constraints, resulting in more accurate 3D poses. Comment: BMVC 2018. Project page available at http://www.vision.caltech.edu/~mronchi/projects/RelativePos

arXiv.org e-Print Archive

Caltech Authors
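
The abstract of "It's all Relative" describes training from relative depth estimates between joints. As a rough illustration of how such a signal can drive learning, the sketch below uses a generic pairwise ranking penalty. It is not the authors' loss: the function names, the label convention (`r = +1` meaning joint `i` is closer than joint `j`) and the softplus form are all assumptions made for this example.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def relative_depth_loss(depths, pairs):
    """Average pairwise ranking penalty over annotated joint pairs.

    depths: predicted depth per joint (larger means farther from the camera).
    pairs:  (i, j, r) tuples; r = +1 if an annotator judged joint i closer
            than joint j, r = -1 if farther (hypothetical label convention).
    """
    if not pairs:
        raise ValueError("need at least one annotated pair")
    total = 0.0
    for i, j, r in pairs:
        # r * (z_i - z_j) is negative when the prediction agrees with the
        # label, so agreeing pairs are penalised less than log(2).
        total += softplus(r * (depths[i] - depths[j]))
    return total / len(pairs)
```

A prediction that orders a pair correctly scores below `log(2)`, a tie scores exactly `log(2)`, and a reversed ordering scores above it, so minimising the penalty pushes predicted depths toward the annotated ordering without ever requiring metric depth labels.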

In the Wild Human Pose Estimation Using Explicit 2D Features and Intermediate 3D Representations

Authors: Habibie Ikhsanul; Mehta Dushyant; Pons-Moll Gerard; Theobalt Christian; Xu Weipeng
Publication date: 01/01/2019

Convolutional Neural Network based approaches for monocular 3D human pose estimation usually require a large amount of training images with 3D pose annotations. While it is feasible to provide 2D joint annotations for large corpora of in-the-wild images with humans, providing accurate 3D annotations to such in-the-wild corpora is hardly feasible in practice. Most existing 3D labelled data sets are either synthetically created or feature in-studio images. 3D pose estimation algorithms trained on such data often have limited ability to generalize to real world scene diversity. We therefore propose a new deep learning based method for monocular 3D human pose estimation that shows high accuracy and generalizes better to in-the-wild scenes. It has a network architecture that comprises a new disentangled hidden space encoding of explicit 2D and 3D features, and uses supervision by a new learned projection model from predicted 3D pose. Our algorithm can be jointly trained on image data with 3D labels and image data with only 2D labels. It achieves state-of-the-art accuracy on challenging in-the-wild data. Comment: Accepted to CVPR 201

arXiv.org e-Print Archive

Adversarial 3D Human Pose Estimation via Multimodal Depth Supervision

Authors: Cai Jinmiao; Han Xiaoguang; Jia Kui; Jiang Nianjuan; Li Yao; Lu Jiangbo; Shi Yulong; Zhou Kun
Publication date: 20/09/2018

In this paper, a novel deep-learning based framework is proposed to infer 3D human poses from a single image. Specifically, a two-phase approach is developed. We first use a generator with two branches to extract explicit and implicit depth information, respectively. During training, an adversarial scheme is also employed to further improve performance. In the second step, the implicit and explicit depth information, together with the 2D joints estimated by a widely used estimator, is fed into a deep 3D pose regressor for the final pose generation. Our method achieves an MPJPE of 58.68 mm on the ECCV2018 3D Human Pose Estimation Challenge.

arXiv.org e-Print Archive
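
The abstract above reports its result as MPJPE (mean per-joint position error), the average Euclidean distance between predicted and ground-truth joint positions. The sketch below shows the basic computation for a single pose. The function name and the toy coordinates are made up for illustration, and it applies no alignment or root-centering step, since the abstract does not state the challenge's exact evaluation protocol.

```python
import math


def mpjpe(predicted, ground_truth):
    """Mean Euclidean distance between corresponding 3D joints.

    Both arguments are equal-length sequences of (x, y, z) coordinates in
    the same unit (for example millimetres); the result is in that unit.
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("poses must be non-empty and have the same number of joints")
    distances = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(distances) / len(distances)


# Toy two-joint pose: the joints are off by 3 and 4 units respectively.
example_error = mpjpe([(0, 0, 0), (1, 0, 0)], [(0, 0, 3), (1, 4, 0)])
```

Here `example_error` is the mean of the two per-joint distances, (3 + 4) / 2 = 3.5. Benchmark scores are typically this quantity averaged over every pose in the test set.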

DenseBody: Directly Regressing Dense 3D Human Pose and Shape From a Single Color Image

Authors: Fang Zheng; Feng Yao; Li Jiwei; Wu Fan; Yao Pengfei
Publication date: 28/03/2019

Recovering 3D human body shape and pose from 2D images is a challenging task due to the high complexity and flexibility of the human body and the relative scarcity of 3D labeled data. Previous methods addressing these issues typically rely on predicting intermediate results such as body part segmentation, 2D/3D joints, and silhouette masks to decompose the problem into multiple sub-tasks in order to utilize more 2D labels. Most previous works incorporate a parametric body shape model and predict parameters in a low-dimensional space to represent the human body. In this paper, we propose to directly regress the 3D human mesh from a single color image using a Convolutional Neural Network (CNN). We use an efficient representation of 3D human shape and pose which can be predicted through an encoder-decoder neural network. The proposed method achieves state-of-the-art performance on several 3D human body datasets including Human3.6M, SURREAL and UP-3D with even faster running speed. Comment: 10 pages, 6 figures

arXiv.org e-Print Archive

Not All Parts Are Created Equal: 3D Pose Estimation by Modelling Bi-directional Dependencies of Body Parts

Authors: Huang Shaoli; Tao Dacheng; Wang Jue; Wang Xinchao
Publication date: 20/05/2019

Not all human body parts have the same degree of freedom (DOF), owing to the physiological structure of the body. For example, the limbs may move more flexibly and freely than the torso does. Most of the existing 3D pose estimation methods, despite the very promising results achieved, treat the body joints equally and consequently often lead to larger reconstruction errors on the limbs. In this paper, we propose a progressive approach that explicitly accounts for the distinct DOFs among the body parts. We model parts with higher DOFs, like the elbows, as dependent components of the corresponding parts with lower DOFs, like the torso, whose 3D locations can be more reliably estimated. Meanwhile, the high-DOF parts may, in turn, impose a constraint on where the low-DOF ones lie. As a result, parts with different DOFs supervise one another, yielding physically constrained and plausible pose-estimation results. To further facilitate the prediction of the high-DOF parts, we introduce a pose-attribute estimation, where the relative location of a limb joint with respect to the torso, which has the least DOF of a human body, is explicitly estimated and further fed to the joint-estimation module. The proposed approach achieves very promising results, outperforming the state of the art on several benchmarks.

arXiv.org e-Print Archive

DF-Net: Unsupervised Joint Learning of Depth and Flow using Cross-Task Consistency

Authors: Huang Jia-Bin; Luo Zelun; Zou Yuliang
Publication date: 05/09/2018

We present an unsupervised learning framework for simultaneously training single-view depth prediction and optical flow estimation models using unlabeled video sequences. Existing unsupervised methods often exploit brightness constancy and spatial smoothness priors to train depth or flow models. In this paper, we propose to leverage geometric consistency as additional supervisory signals. Our core idea is that for rigid regions we can use the predicted scene depth and camera motion to synthesize 2D optical flow by backprojecting the induced 3D scene flow. The discrepancy between the rigid flow (from depth prediction and camera motion) and the estimated flow (from optical flow model) allows us to impose a cross-task consistency loss. While all the networks are jointly optimized during training, they can be applied independently at test time. Extensive experiments demonstrate that our depth and flow models compare favorably with state-of-the-art unsupervised methods. Comment: ECCV 2018. Project website: http://yuliang.vision/DF-Net/ Code: https://github.com/vt-vl-lab/DF-Ne

arXiv.org e-Print Archive
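
The DF-Net abstract describes synthesizing a rigid flow from predicted depth and camera motion and comparing it with the flow network's estimate. The sketch below illustrates that idea for a single pixel under a pinhole camera model. It is not taken from the DF-Net code: the function names, the `(fx, fy, cx, cy)` intrinsics tuple and the L1 form of the discrepancy are assumptions chosen for illustration.

```python
def rigid_flow(pixel, depth, intrinsics, rotation, translation):
    """2D flow induced at one pixel by camera motion over a static scene point.

    intrinsics: (fx, fy, cx, cy) of an assumed pinhole camera.
    rotation, translation: 3x3 nested list and 3-vector that map points from
    the first camera frame into the second.
    """
    fx, fy, cx, cy = intrinsics
    u, v = pixel
    # Back-project the pixel to a 3D point using the predicted depth.
    point = ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)
    # Express the point in the second camera frame.
    moved = [
        sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
        for r in range(3)
    ]
    if moved[2] <= 0:
        raise ValueError("point falls behind the second camera")
    # Re-project and subtract the original pixel position.
    u2 = fx * moved[0] / moved[2] + cx
    v2 = fy * moved[1] / moved[2] + cy
    return (u2 - u, v2 - v)


def cross_task_consistency(rigid, estimated):
    """L1 discrepancy between the rigid flow and a flow model's estimate."""
    return abs(rigid[0] - estimated[0]) + abs(rigid[1] - estimated[1])
```

With an identity rotation and zero translation the rigid flow is zero everywhere, which is a quick sanity check. In a full training setup the discrepancy would be computed per pixel over regions assumed to be rigid and averaged.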

Taskonomy: Disentangling Task Transfer Learning

Authors: Guibas Leonidas; Malik Jitendra; Savarese Silvio; Sax Alexander; Shen William; Zamir Amir
Publication date: 23/04/2018

Do visual tasks have a relationship, or are they unrelated? For instance, could having surface normals simplify estimating the depth of an image? Intuition answers these questions positively, implying the existence of a structure among visual tasks. Knowing this structure has notable value; it is the concept underlying transfer learning and provides a principled way for identifying redundancies across tasks, e.g., to seamlessly reuse supervision among related tasks or solve many tasks in one system without piling up the complexity. We propose a fully computational approach for modeling the structure of the space of visual tasks. This is done by finding (first and higher-order) transfer learning dependencies across a dictionary of twenty-six 2D, 2.5D, 3D, and semantic tasks in a latent space. The product is a computational taxonomic map for task transfer learning. We study the consequences of this structure, e.g., nontrivial emergent relationships, and exploit them to reduce the demand for labeled data. For example, we show that the total number of labeled datapoints needed for solving a set of 10 tasks can be reduced by roughly 2/3 (compared to training independently) while keeping the performance nearly the same. We provide a set of tools for computing and probing this taxonomical structure, including a solver that users can employ to devise efficient supervision policies for their use cases. Comment: CVPR 2018 (Oral). See project website and live demos at http://taskonomy.vision

arXiv.org e-Print Archive

Patch-based 3D Human Pose Refinement

Authors: Qiu Weichao; Wan Qingfu; Yuille Alan L.
Publication date: 20/05/2019

State-of-the-art 3D human pose estimation approaches typically estimate pose from the entire RGB image in a single forward pass. In this paper, we develop a post-processing step to refine 3D human pose estimation from body part patches. Using local patches as input has two advantages. First, the fine details around body parts are zoomed in to high resolution for more precise 3D pose prediction. Second, it enables the part appearance to be shared between poses to benefit rare poses. In order to acquire an informative representation of patches, we explore different input modalities and validate the superiority of fusing predicted segmentation with RGB. We show that our method consistently boosts the accuracy of state-of-the-art 3D human pose methods. Comment: Accepted by CVPR 2019 Augmented Human: Human-centric Understanding and 2D/3D Synthesis, and the third Look Into Person (LIP) Challenge Workshop

arXiv.org e-Print Archive

Out of the Box: A combined approach for handling occlusion in Human Pose Estimation

Author: Jena Rohit
Publication date: 25/04/2019

Human pose estimation is a challenging problem, especially 3D pose estimation from 2D images, because of factors such as occlusion, depth ambiguities, intertwining of people, and crowds in general. 2D multi-person human pose estimation in the wild suffers from the same problems: occlusion, ambiguities, and disentanglement of people's body parts. As a fundamental problem with many applications, including but not limited to surveillance, economical motion capture for video games and movies, and physiotherapy, it is interesting from both a practical and an intellectual perspective. Although there are cases where no pose estimator can ever predict with 100% accuracy (cases where even humans would fail), several algorithms have brought new state-of-the-art performance in human pose estimation in the wild. We look at a few algorithms with different approaches and also formulate our own approach to tackle a persistent problem, namely occlusions. Comment: 11 pages, 12 figures

arXiv.org e-Print Archive