239,400 research outputs found
Recurrent Human Pose Estimation
We propose a novel ConvNet model for predicting 2D human body poses in an
image. The model regresses a heatmap representation for each body keypoint, and
is able to learn and represent both the part appearances and the context of the
part configuration. We make the following three contributions: (i) an
architecture combining a feed forward module with a recurrent module, where the
recurrent module can be run iteratively to improve the performance, (ii) the
model can be trained end-to-end and from scratch, with auxiliary losses
incorporated to improve performance, (iii) we investigate whether keypoint
visibility can also be predicted. The model is evaluated on two benchmark
datasets. The result is a simple architecture that achieves performance on par
with the state of the art, but without the complexity of a graphical model
stage (or layers).Comment: FG 2017, More Info and Demo:
http://www.robots.ox.ac.uk/~vgg/software/keypoint_detection
Learning to Refine Human Pose Estimation
Multi-person pose estimation in images and videos is an important yet
challenging task with many applications. Despite the large improvements in
human pose estimation enabled by the development of convolutional neural
networks, there still exist a lot of difficult cases where even the
state-of-the-art models fail to correctly localize all body joints. This
motivates the need for an additional refinement step that addresses these
challenging cases and can be easily applied on top of any existing method. In
this work, we introduce a pose refinement network (PoseRefiner) which takes as
input both the image and a given pose estimate and learns to directly predict a
refined pose by jointly reasoning about the input-output space. In order for
the network to learn to refine incorrect body joint predictions, we employ a
novel data augmentation scheme for training, where we model "hard" human pose
cases. We evaluate our approach on four popular large-scale pose estimation
benchmarks such as MPII Single- and Multi-Person Pose Estimation, PoseTrack
Pose Estimation, and PoseTrack Pose Tracking, and report systematic improvement
over the state of the art.Comment: To appear in CVPRW (2018). Workshop: Visual Understanding of Humans
in Crowd Scene and the 2nd Look Into Person Challenge (VUHCS-LIP
It's all Relative: Monocular 3D Human Pose Estimation from Weakly Supervised Data
We address the problem of 3D human pose estimation from 2D input images using
only weakly supervised training data. Despite showing considerable success for
2D pose estimation, the application of supervised machine learning to 3D pose
estimation in real world images is currently hampered by the lack of varied
training images with corresponding 3D poses. Most existing 3D pose estimation
algorithms train on data that has either been collected in carefully controlled
studio settings or has been generated synthetically. Instead, we take a
different approach, and propose a 3D human pose estimation algorithm that only
requires relative estimates of depth at training time. Such training signal,
although noisy, can be easily collected from crowd annotators, and is of
sufficient quality for enabling successful training and evaluation of 3D pose
algorithms. Our results are competitive with fully supervised regression based
approaches on the Human3.6M dataset, despite using significantly weaker
training data. Our proposed algorithm opens the door to using existing
widespread 2D datasets for 3D pose estimation by allowing fine-tuning with
noisy relative constraints, resulting in more accurate 3D poses.Comment: BMVC 2018. Project page available at
http://www.vision.caltech.edu/~mronchi/projects/RelativePos
Cascaded 3D Full-body Pose Regression from Single Depth Image at 100 FPS
There are increasing real-time live applications in virtual reality, where it
plays an important role in capturing and retargetting 3D human pose. But it is
still challenging to estimate accurate 3D pose from consumer imaging devices
such as depth camera. This paper presents a novel cascaded 3D full-body pose
regression method to estimate accurate pose from a single depth image at 100
fps. The key idea is to train cascaded regressors based on Gradient Boosting
algorithm from pre-recorded human motion capture database. By incorporating
hierarchical kinematics model of human pose into the learning procedure, we can
directly estimate accurate 3D joint angles instead of joint positions. The
biggest advantage of this model is that the bone length can be preserved during
the whole 3D pose estimation procedure, which leads to more effective features
and higher pose estimation accuracy. Our method can be used as an
initialization procedure when combining with tracking methods. We demonstrate
the power of our method on a wide range of synthesized human motion data from
CMU mocap database, Human3.6M dataset and real human movements data captured in
real time. In our comparison against previous 3D pose estimation methods and
commercial system such as Kinect 2017, we achieve the state-of-the-art
accuracy
Robust Estimation of 3D Human Poses from a Single Image
Human pose estimation is a key step to action recognition. We propose a
method of estimating 3D human poses from a single image, which works in
conjunction with an existing 2D pose/joint detector. 3D pose estimation is
challenging because multiple 3D poses may correspond to the same 2D pose after
projection due to the lack of depth information. Moreover, current 2D pose
estimators are usually inaccurate which may cause errors in the 3D estimation.
We address the challenges in three ways: (i) We represent a 3D pose as a linear
combination of a sparse set of bases learned from 3D human skeletons. (ii) We
enforce limb length constraints to eliminate anthropomorphically implausible
skeletons. (iii) We estimate a 3D pose by minimizing the -norm error
between the projection of the 3D pose and the corresponding 2D detection. The
-norm loss term is robust to inaccurate 2D joint estimations. We use the
alternating direction method (ADM) to solve the optimization problem
efficiently. Our approach outperforms the state-of-the-arts on three benchmark
datasets
- …