Cascaded Pyramid Network for 3D Human Pose Estimation Challenge
Over the past decade, there has been a growing interest in human pose
estimation. Although much work has been done on 2D pose estimation, 3D pose
estimation has received comparatively little attention. In this paper, we propose a
top-down, two-stage 3D pose estimation framework. GlobalNet and RefineNet in
our 2D pose estimation process enable us to find occluded or invisible 2D
joints, while a 2D-to-3D pose estimator composed of residual blocks is used to
lift 2D joints to 3D joints effectively. The proposed method achieves promising
results, with a mean per joint position error of 42.39 on the validation dataset of
the '3D Human Pose Estimation within the ECCV 2018 PoseTrack Challenge'.
Comment: Accepted to ECCV Workshop 2018
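To make the 2D-to-3D lifting stage described above concrete, a minimal PyTorch sketch of a fully-connected lifter built from residual blocks follows. The joint count, hidden width and number of blocks are assumptions for illustration, not the paper's actual configuration:

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Two fully connected layers with batch norm, ReLU, dropout and a skip connection."""

    def __init__(self, dim=1024, dropout=0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim), nn.BatchNorm1d(dim), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(dim, dim), nn.BatchNorm1d(dim), nn.ReLU(), nn.Dropout(dropout),
        )

    def forward(self, x):
        return x + self.net(x)


class Lifter2Dto3D(nn.Module):
    """Lift 2D joint coordinates (J x 2) to 3D joint coordinates (J x 3)."""

    def __init__(self, num_joints=17, dim=1024, num_blocks=2):  # sizes are illustrative assumptions
        super().__init__()
        self.inp = nn.Linear(num_joints * 2, dim)
        self.blocks = nn.Sequential(*[ResidualBlock(dim) for _ in range(num_blocks)])
        self.out = nn.Linear(dim, num_joints * 3)

    def forward(self, joints_2d):                      # joints_2d: (B, J, 2)
        b = joints_2d.shape[0]
        x = self.inp(joints_2d.reshape(b, -1))
        x = self.blocks(x)
        return self.out(x).reshape(b, -1, 3)           # (B, J, 3)


# Example: lift a batch of 8 poses with 17 detected 2D joints each.
lifter = Lifter2Dto3D()
pose_3d = lifter(torch.randn(8, 17, 2))                # -> torch.Size([8, 17, 3])
```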
Weakly-Supervised Discovery of Geometry-Aware Representation for 3D Human Pose Estimation
Recent studies have shown remarkable advances in 3D human pose estimation
from monocular images, with the help of large-scale indoor 3D datasets and
sophisticated network architectures. However, the generalizability to different
environments remains an elusive goal. In this work, we propose a geometry-aware
3D representation for the human pose to address this limitation by using
multiple views in a simple auto-encoder model at the training stage and only 2D
keypoint information as supervision. A view synthesis framework is proposed to
learn the shared 3D representation across viewpoints by synthesizing the
human pose seen from one viewpoint in the other. Instead of performing a direct
transfer at the raw image level, we propose a skeleton-based encoder-decoder
mechanism to distill only the pose-related representation in the latent space. A
learning-based representation consistency constraint is further introduced to
improve the robustness of the latent 3D representation. Since the learnt
representation encodes 3D geometry information, mapping it to 3D pose will be
much easier than conventional frameworks that use an image or 2D coordinates as
the input to the 3D pose estimator. We demonstrate our approach on the task of 3D
human pose estimation. Comprehensive experiments on three popular benchmarks
show that our model can significantly improve the performance of
state-of-the-art methods by simply injecting the representation as a robust
3D prior.
Comment: Accepted as a CVPR 2019 oral paper. Project page:
https://kwanyeelin.github.io
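A minimal PyTorch sketch of the skeleton-based encoder-decoder view synthesis idea is given below. The layer sizes, the latent dimension, and conditioning the decoder on a relative camera rotation are assumptions for illustration and may differ from the paper's actual design:

```python
import torch
import torch.nn as nn


class SkeletonEncoder(nn.Module):
    """Map 2D keypoints seen from a source view to a latent, geometry-aware code."""

    def __init__(self, num_joints=17, latent_dim=128):  # sizes are illustrative assumptions
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_joints * 2, 512), nn.ReLU(),
            nn.Linear(512, latent_dim),
        )

    def forward(self, keypoints_2d):                    # (B, J, 2)
        return self.net(keypoints_2d.flatten(1))


class SkeletonDecoder(nn.Module):
    """Decode the latent code, conditioned on the relative rotation between the
    source and target cameras, into 2D keypoints in the target view."""

    def __init__(self, num_joints=17, latent_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + 9, 512), nn.ReLU(),  # 9 = flattened 3x3 rotation
            nn.Linear(512, num_joints * 2),
        )

    def forward(self, z, rel_rot):                      # z: (B, D), rel_rot: (B, 3, 3)
        x = torch.cat([z, rel_rot.flatten(1)], dim=1)
        return self.net(x).reshape(z.shape[0], -1, 2)


# Training uses only multi-view 2D keypoints as supervision: encode view A,
# decode towards view B, and penalize the difference to the keypoints observed in view B.
def view_synthesis_loss(encoder, decoder, kp_view_a, kp_view_b, rot_a_to_b):
    return (decoder(encoder(kp_view_a), rot_a_to_b) - kp_view_b).abs().mean()
```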
Interaction Relational Network for Mutual Action Recognition
Person-person mutual action recognition (also referred to as interaction
recognition) is an important research branch of human activity analysis.
Current solutions in the field -- mainly dominated by CNNs, GCNs and LSTMs --
often consist of complicated architectures and mechanisms to embed the
relationships between the two persons in the architecture itself, to ensure that the
interaction patterns can be properly learned. Our main contribution in this
work is a simpler yet very powerful architecture, the Interaction Relational
Network (IRN), which utilizes minimal prior knowledge about the structure of the
human body. We drive the network to identify by itself how to relate the body
parts of the interacting individuals. In order to better
represent the interaction, we define two different relationships, leading to
specialized architectures and models for each. These relationship
models are then fused into a single, unified architecture in order to
leverage both streams of information for further enhancing the relational
reasoning capability. Furthermore, we define important structured pair-wise
operations to extract meaningful extra information from each pair of joints --
distance and motion. Ultimately, coupled with an LSTM, our IRN is
capable of advanced sequential relational reasoning. These
extensions to our network can also be valuable to other problems that
require sophisticated relational reasoning. Our solution is able to achieve
state-of-the-art performance on the traditional interaction recognition
datasets SBU and UT, and also on the mutual actions from the large-scale
dataset NTU RGB+D. Furthermore, it obtains competitive performance on the
interactions subset of the NTU RGB+D 120 dataset.
Comment: 12 pages, 6 figures, to be published in IEEE TM
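The structured pair-wise operations over joints (distance and motion) can be sketched as a small helper; the exact feature definitions used by IRN are specified in the paper, so the tensor layout and the way distance and motion are attached to each joint pair below are assumptions for illustration:

```python
import torch


def pairwise_joint_features(person_a, person_b):
    """Per-frame pair-wise features between all joints of two interacting persons.

    person_a, person_b: (T, J, C) joint sequences (T frames, J joints, C coordinates).
    Returns: (T - 1, J * J, 2 * C + 2) features holding, for every (joint of A,
    joint of B) pair, the two joint coordinates, their distance, and the motion
    of the pair between consecutive frames."""
    T, J, C = person_a.shape
    pairs = torch.cat([
        person_a[:, :, None, :].expand(T, J, J, C),
        person_b[:, None, :, :].expand(T, J, J, C),
    ], dim=-1)                                          # (T, J, J, 2C)
    # distance between the two joints of each pair
    dist = torch.linalg.norm(pairs[..., :C] - pairs[..., C:], dim=-1, keepdim=True)
    # motion: change of the pair descriptor between consecutive frames
    motion = torch.linalg.norm(pairs[1:] - pairs[:-1], dim=-1, keepdim=True)
    feats = torch.cat([pairs[1:], dist[1:], motion], dim=-1)
    return feats.reshape(T - 1, J * J, -1)


# Example: 30 frames, 25 joints, 3D coordinates -> (29, 625, 8) pair features,
# which a relational module followed by an LSTM could then consume.
feats = pairwise_joint_features(torch.randn(30, 25, 3), torch.randn(30, 25, 3))
```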
PoseNet3D: Learning Temporally Consistent 3D Human Pose via Knowledge Distillation
Recovering 3D human pose from 2D joints is a highly unconstrained problem. We
propose a novel neural network framework, PoseNet3D, that takes 2D joints as
input and outputs 3D skeletons and SMPL body model parameters. By casting our
learning approach in a student-teacher framework, we avoid using any 3D data
such as paired/unpaired 3D data, motion capture sequences, depth images or
multi-view images during training. We first train a teacher network that
outputs 3D skeletons, using only 2D poses for training. The teacher network
distills its knowledge to a student network that predicts 3D pose in SMPL
representation. Finally, both the teacher and the student networks are jointly
fine-tuned in an end-to-end manner using temporal, self-consistency and
adversarial losses, improving the accuracy of each individual network. Results
on the Human3.6M dataset for 3D human pose estimation demonstrate that our approach
reduces the 3D joint prediction error by 18% compared to previous unsupervised
methods. Qualitative results on in-the-wild datasets show that the recovered 3D
poses and meshes are natural, realistic, and flow smoothly over consecutive
frames.
Comment: Accepted as an oral presentation at 3DV 2020; supplementary material included; added
results on the 3DPW dataset in revision
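The temporal, self-consistency and distillation terms used in the joint fine-tuning can be sketched as simple losses. The exact formulations, the loss weighting, and the camera projection (passed in as a hypothetical `project` callable) are assumptions for illustration; the adversarial term is omitted:

```python
import torch


def temporal_loss(pose3d_seq):
    """Penalize frame-to-frame jitter in the predicted 3D joints. pose3d_seq: (T, J, 3)."""
    return (pose3d_seq[1:] - pose3d_seq[:-1]).pow(2).mean()


def self_consistency_loss(pose3d, joints2d, project):
    """Reproject the predicted 3D joints with a camera projection function and
    compare against the 2D joints that were given as input."""
    return (project(pose3d) - joints2d).abs().mean()


def distillation_loss(student_joints3d, teacher_joints3d):
    """Make the student's SMPL-derived 3D joints match the teacher's 3D skeleton;
    the teacher output is treated as a fixed target via detach()."""
    return (student_joints3d - teacher_joints3d.detach()).pow(2).mean()
```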