35,595 research outputs found
Multi-Person Pose Estimation with Local Joint-to-Person Associations
Despite of the recent success of neural networks for human pose estimation,
current approaches are limited to pose estimation of a single person and cannot
handle humans in groups or crowds. In this work, we propose a method that
estimates the poses of multiple persons in an image in which a person can be
occluded by another person or might be truncated. To this end, we consider
multi-person pose estimation as a joint-to-person association problem. We
construct a fully connected graph from a set of detected joint candidates in an
image and resolve the joint-to-person association and outlier detection using
integer linear programming. Since solving joint-to-person association jointly
for all persons in an image is an NP-hard problem and even approximations are
expensive, we solve the problem locally for each person. On the challenging
MPII Human Pose Dataset for multiple persons, our approach achieves the
accuracy of a state-of-the-art method, but it is 6,000 to 19,000 times faster.Comment: Accepted to European Conference on Computer Vision (ECCV) Workshops,
Crowd Understanding, 201
XNect: Real-time Multi-Person 3D Motion Capture with a Single RGB Camera
We present a real-time approach for multi-person 3D motion capture at over 30
fps using a single RGB camera. It operates successfully in generic scenes which
may contain occlusions by objects and by other people. Our method operates in
subsequent stages. The first stage is a convolutional neural network (CNN) that
estimates 2D and 3D pose features along with identity assignments for all
visible joints of all individuals.We contribute a new architecture for this
CNN, called SelecSLS Net, that uses novel selective long and short range skip
connections to improve the information flow allowing for a drastically faster
network without compromising accuracy. In the second stage, a fully connected
neural network turns the possibly partial (on account of occlusion) 2Dpose and
3Dpose features for each subject into a complete 3Dpose estimate per
individual. The third stage applies space-time skeletal model fitting to the
predicted 2D and 3D pose per subject to further reconcile the 2D and 3D pose,
and enforce temporal coherence. Our method returns the full skeletal pose in
joint angles for each subject. This is a further key distinction from previous
work that do not produce joint angle results of a coherent skeleton in real
time for multi-person scenes. The proposed system runs on consumer hardware at
a previously unseen speed of more than 30 fps given 512x320 images as input
while achieving state-of-the-art accuracy, which we will demonstrate on a range
of challenging real-world scenes.Comment: To appear in ACM Transactions on Graphics (SIGGRAPH) 202
XNect: Real-time Multi-person 3D Human Pose Estimation with a Single RGB Camera
We present a real-time approach for multi-person 3D motion capture at over 30 fps using a single RGB camera. It operates in generic scenes and is robust to difficult occlusions both by other people and objects. Our method operates in subsequent stages. The first stage is a convolutional neural network (CNN) that estimates 2D and 3D pose features along with identity assignments for all visible joints of all individuals. We contribute a new architecture for this CNN, called SelecSLS Net, that uses novel selective long and short range skip connections to improve the information flow allowing for a drastically faster network without compromising accuracy. In the second stage, a fully-connected neural network turns the possibly partial (on account of occlusion) 2D pose and 3D pose features for each subject into a complete 3D pose estimate per individual. The third stage applies space-time skeletal model fitting to the predicted 2D and 3D pose per subject to further reconcile the 2D and 3D pose, and enforce temporal coherence. Our method returns the full skeletal pose in joint angles for each subject. This is a further key distinction from previous work that neither extracted global body positions nor joint angle results of a coherent skeleton in real time for multi-person scenes. The proposed system runs on consumer hardware at a previously unseen speed of more than 30 fps given 512x320 images as input while achieving state-of-the-art accuracy, which we will demonstrate on a range of challenging real-world scenes
Towards Accurate Multi-person Pose Estimation in the Wild
We propose a method for multi-person detection and 2-D pose estimation that
achieves state-of-art results on the challenging COCO keypoints task. It is a
simple, yet powerful, top-down approach consisting of two stages.
In the first stage, we predict the location and scale of boxes which are
likely to contain people; for this we use the Faster RCNN detector. In the
second stage, we estimate the keypoints of the person potentially contained in
each proposed bounding box. For each keypoint type we predict dense heatmaps
and offsets using a fully convolutional ResNet. To combine these outputs we
introduce a novel aggregation procedure to obtain highly localized keypoint
predictions. We also use a novel form of keypoint-based Non-Maximum-Suppression
(NMS), instead of the cruder box-level NMS, and a novel form of keypoint-based
confidence score estimation, instead of box-level scoring.
Trained on COCO data alone, our final system achieves average precision of
0.649 on the COCO test-dev set and the 0.643 test-standard sets, outperforming
the winner of the 2016 COCO keypoints challenge and other recent state-of-art.
Further, by using additional in-house labeled data we obtain an even higher
average precision of 0.685 on the test-dev set and 0.673 on the test-standard
set, more than 5% absolute improvement compared to the previous best performing
method on the same dataset.Comment: Paper describing an improved version of the G-RMI entry to the 2016
COCO keypoints challenge (http://image-net.org/challenges/ilsvrc+coco2016).
Camera ready version to appear in the Proceedings of CVPR 201
Single-Shot Multi-Person 3D Pose Estimation From Monocular RGB
We propose a new single-shot method for multi-person 3D pose estimation in
general scenes from a monocular RGB camera. Our approach uses novel
occlusion-robust pose-maps (ORPM) which enable full body pose inference even
under strong partial occlusions by other people and objects in the scene. ORPM
outputs a fixed number of maps which encode the 3D joint locations of all
people in the scene. Body part associations allow us to infer 3D pose for an
arbitrary number of people without explicit bounding box prediction. To train
our approach we introduce MuCo-3DHP, the first large scale training data set
showing real images of sophisticated multi-person interactions and occlusions.
We synthesize a large corpus of multi-person images by compositing images of
individual people (with ground truth from mutli-view performance capture). We
evaluate our method on our new challenging 3D annotated multi-person test set
MuPoTs-3D where we achieve state-of-the-art performance. To further stimulate
research in multi-person 3D pose estimation, we will make our new datasets, and
associated code publicly available for research purposes.Comment: International Conference on 3D Vision (3DV), 201
PoseTrack: A Benchmark for Human Pose Estimation and Tracking
Human poses and motions are important cues for analysis of videos with people
and there is strong evidence that representations based on body pose are highly
effective for a variety of tasks such as activity recognition, content
retrieval and social signal processing. In this work, we aim to further advance
the state of the art by establishing "PoseTrack", a new large-scale benchmark
for video-based human pose estimation and articulated tracking, and bringing
together the community of researchers working on visual human analysis. The
benchmark encompasses three competition tracks focusing on i) single-frame
multi-person pose estimation, ii) multi-person pose estimation in videos, and
iii) multi-person articulated tracking. To facilitate the benchmark and
challenge we collect, annotate and release a new %large-scale benchmark dataset
that features videos with multiple people labeled with person tracks and
articulated pose. A centralized evaluation server is provided to allow
participants to evaluate on a held-out test set. We envision that the proposed
benchmark will stimulate productive research both by providing a large and
representative training dataset as well as providing a platform to objectively
evaluate and compare the proposed methods. The benchmark is freely accessible
at https://posetrack.net.Comment: www.posetrack.ne
- …