Multi-Person Pose Estimation with Local Joint-to-Person Associations
Despite the recent success of neural networks for human pose estimation,
current approaches are limited to pose estimation of a single person and cannot
handle humans in groups or crowds. In this work, we propose a method that
estimates the poses of multiple persons in an image, where a person may be
occluded by another person or truncated. To this end, we consider
multi-person pose estimation as a joint-to-person association problem. We
construct a fully connected graph from a set of detected joint candidates in an
image and resolve the joint-to-person association and outlier detection using
integer linear programming. Since solving joint-to-person association jointly
for all persons in an image is an NP-hard problem and even approximations are
expensive, we solve the problem locally for each person. On the challenging
MPII Human Pose Dataset for multiple persons, our approach matches the
accuracy of a state-of-the-art method while being 6,000 to 19,000 times faster.
Comment: Accepted to European Conference on Computer Vision (ECCV) Workshops, Crowd Understanding, 2016
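To make the association step concrete, here is a minimal sketch of the local, per-person variant with unary confidence terms only (the paper additionally uses pairwise terms); the function name, PuLP as the ILP solver, and the score-matrix layout are illustrative assumptions, not the authors' code.

```python
import numpy as np
import pulp

def associate_joints(scores):
    """Assign joint candidates to the body joints of a single person.

    scores: (D, J) array; scores[d, j] is the confidence that detected
    candidate d is body joint j of this person. Candidates left unassigned
    are treated as outliers (e.g. joints of another, occluding person).
    """
    D, J = scores.shape
    prob = pulp.LpProblem("joint_to_person", pulp.LpMaximize)
    x = pulp.LpVariable.dicts("x", (range(D), range(J)), cat="Binary")
    # Maximize the total confidence of the chosen assignment.
    prob += pulp.lpSum(scores[d, j] * x[d][j] for d in range(D) for j in range(J))
    # Each body joint is explained by at most one candidate (it may stay
    # empty under occlusion or truncation) ...
    for j in range(J):
        prob += pulp.lpSum(x[d][j] for d in range(D)) <= 1
    # ... and each candidate accounts for at most one body joint.
    for d in range(D):
        prob += pulp.lpSum(x[d][j] for j in range(J)) <= 1
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return {j: d for d in range(D) for j in range(J) if x[d][j].value() > 0.5}
```

Solving this small problem once per person detection, rather than one global ILP over all persons, is what keeps inference tractable.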
Self Adversarial Training for Human Pose Estimation
This paper presents a deep learning based approach to the problem of human
pose estimation. We employ generative adversarial networks as our learning
paradigm in which we set up two stacked hourglass networks with the same
architecture, one as the generator and the other as the discriminator. The
generator is used as a human pose estimator after the training is done. The
discriminator distinguishes ground-truth heatmaps from generated ones, and
back-propagates the adversarial loss to the generator. This process enables the
generator to learn plausible human body configurations and is shown to be
useful for improving the prediction accuracy.
Comment: CVPR 2017 Workshop on Visual Understanding of Humans in Crowd Scene and the 1st Look Into Person (LIP) Challenge
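A minimal sketch of this adversarial setup, assuming PyTorch, is shown below; StackedHourglass is a stand-in single-layer module, and the optimizer settings and adversarial weight lambda_adv are assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StackedHourglass(nn.Module):
    """Placeholder: a real stacked hourglass has repeated encoder-decoder
    modules with intermediate supervision; one conv keeps the sketch runnable."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Conv2d(in_ch, out_ch, 3, padding=1)
    def forward(self, x):
        return self.net(x)

G = StackedHourglass(3, 16)   # generator: image -> 16 joint heatmaps
D = StackedHourglass(16, 1)   # discriminator: heatmaps -> per-pixel real/fake score
opt_g = torch.optim.Adam(G.parameters(), lr=2.5e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2.5e-4)
lambda_adv = 0.01             # assumed weight of the adversarial term

def train_step(img, gt_heatmaps):
    # Discriminator step: ground-truth heatmaps are "real", generated are "fake".
    fake = G(img).detach()
    d_real, d_fake = D(gt_heatmaps), D(fake)
    d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator step: heatmap regression plus the adversarial loss that is
    # back-propagated from the discriminator, pushing predictions toward
    # plausible body configurations.
    pred = G(img)
    d_pred = D(pred)
    g_loss = (F.mse_loss(pred, gt_heatmaps)
              + lambda_adv * F.binary_cross_entropy_with_logits(d_pred, torch.ones_like(d_pred)))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

After training, only G is kept and used as the pose estimator.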
Holistic, Instance-Level Human Parsing
Object parsing -- the task of decomposing an object into its semantic parts
-- has traditionally been formulated as a category-level segmentation problem.
Consequently, when there are multiple objects in an image, current methods
cannot count the number of objects in the scene, nor can they determine which
part belongs to which object. We address this problem by segmenting the parts
of objects at an instance-level, such that each pixel in the image is assigned
a part label, as well as the identity of the object it belongs to. Moreover, we
show how this approach also yields segmentations at coarser granularities. Our
proposed network is trained end-to-end given
detections, and begins with a category-level segmentation module. Thereafter, a
differentiable Conditional Random Field, defined over a variable number of
instances for every input image, reasons about the identity of each part by
associating it with a human detection. In contrast to other approaches, our
method can handle the varying number of people in each image and our holistic
network produces state-of-the-art results in instance-level part and human
segmentation, together with competitive results in category-level part
segmentation, all achieved by a single forward pass through our neural network.
Comment: Poster at BMVC 2017
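The instance-association idea can be sketched as follows, with a plain per-pixel argmax over detection unaries standing in for the paper's differentiable CRF inference; the function name and the box-based unary term are illustrative assumptions.

```python
import numpy as np

def assign_instances(part_labels, boxes, det_scores):
    """part_labels: (H, W) ints, 0 = background, >0 = part category.
    boxes: (N, 4) person detections (x1, y1, x2, y2); det_scores: (N,)."""
    H, W = part_labels.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Unary term: a detection can only explain pixels inside its box,
    # weighted by its detection score.
    inside = ((xs[None] >= boxes[:, 0, None, None]) &
              (ys[None] >= boxes[:, 1, None, None]) &
              (xs[None] <= boxes[:, 2, None, None]) &
              (ys[None] <= boxes[:, 3, None, None]))
    unary = inside * det_scores[:, None, None]
    instance = unary.argmax(axis=0) + 1      # 1-based person identity per pixel
    instance[~inside.any(axis=0)] = 0        # pixels covered by no detection
    instance[part_labels == 0] = 0           # background keeps no identity
    return instance                          # (H, W) instance map
```

Because the number of detections N varies per image, the output naturally handles a varying number of people, which is the property the CRF formulation preserves end to end.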
DeeperCut: A Deeper, Stronger, and Faster Multi-Person Pose Estimation Model
The goal of this paper is to advance the state-of-the-art of articulated pose
estimation in scenes with multiple people. To that end we contribute on three
fronts. We propose (1) improved body part detectors that generate effective
bottom-up proposals for body parts; (2) novel image-conditioned pairwise terms
that allow the proposals to be assembled into a variable number of consistent body
part configurations; and (3) an incremental optimization strategy that explores
the search space more efficiently, thus leading both to better performance and
significant speed-up factors. Evaluation is done on two single-person and two
multi-person pose estimation benchmarks. The proposed approach significantly
outperforms best known multi-person pose estimation results while demonstrating
competitive performance on the task of single person pose estimation. Models
and code are available at http://pose.mpi-inf.mpg.de
Comment: ECCV'16. High-res version at https://www.d2.mpi-inf.mpg.de/sites/default/files/insafutdinov16arxiv.pdf
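The incremental optimization strategy (contribution 3) can be outlined roughly as below; solve_subproblem is a placeholder for the paper's graph-partitioning/ILP solver, and the staging into part subsets is an assumed example, not the authors' exact schedule.

```python
def incremental_inference(detections, stages, solve_subproblem):
    """detections: joint candidates, each with a .part_type attribute.
    stages: ordered list of sets of part types, e.g.
        [{"head", "neck"}, {"shoulder", "elbow"}, {"wrist", "hip", "knee", "ankle"}]
    solve_subproblem: placeholder for a partitioning/ILP solver that takes the
        active candidates and the frozen solution from earlier stages."""
    partial_solution = None
    active_parts = set()
    for stage_parts in stages:
        active_parts |= stage_parts
        candidates = [d for d in detections if d.part_type in active_parts]
        # Decisions from earlier stages stay fixed, so each stage only resolves
        # the newly added variables; this shrinking of the per-stage search
        # space is where the reported speed-ups come from.
        partial_solution = solve_subproblem(candidates, fixed=partial_solution)
    return partial_solution
```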
MultiPoseNet: Fast Multi-Person Pose Estimation using Pose Residual Network
In this paper, we present MultiPoseNet, a novel bottom-up multi-person pose
estimation architecture that combines a multi-task model with a novel
assignment method. MultiPoseNet can jointly handle person detection, keypoint
detection, person segmentation and pose estimation problems. The novel
assignment method is implemented by the Pose Residual Network (PRN) which
receives keypoint and person detections, and produces accurate poses by
assigning keypoints to person instances. On the COCO keypoints dataset, our
pose estimation method outperforms all previous bottom-up methods both in
accuracy (+4-point mAP over previous best result) and speed; it also performs
on par with the best top-down methods while being at least 4x faster. Our
method is the fastest real-time system, running at 23 frames/sec. Source code is
available at: https://github.com/mkocabas/pose-residual-network
Comment: to appear in ECCV 2018
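A rough sketch of a PRN-style assignment module, assuming PyTorch, is given below; the crop size, hidden width, and number of joints are assumptions, not necessarily the paper's configuration.

```python
import torch
import torch.nn as nn

class PoseResidualNet(nn.Module):
    """Refines keypoint heatmaps cropped to one person's bounding box so that
    only the keypoints belonging to that person survive."""
    def __init__(self, num_joints=17, height=36, width=56, hidden=1024):
        super().__init__()
        n = num_joints * height * width
        self.mlp = nn.Sequential(nn.Linear(n, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n))
    def forward(self, cropped):              # (B, J, H, W) cropped heatmaps
        B, J, H, W = cropped.shape
        x = cropped.flatten(1)
        out = x + self.mlp(x)                # residual refinement
        # Per-joint spatial softmax: one dominant peak per joint inside the box.
        return out.view(B, J, H * W).softmax(-1).view(B, J, H, W)
```

Running this tiny network once per detected person is cheap, which is consistent with the speed advantage over methods that run a full pose network per detection.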
JPPF: Multi-task Fusion for Consistent Panoptic-Part Segmentation
Part-aware panoptic segmentation is a problem of computer vision that aims to
provide a semantic understanding of the scene at multiple levels of
granularity. More precisely, semantic areas, object instances, and semantic
parts are predicted simultaneously. In this paper, we present our Joint
Panoptic Part Fusion (JPPF) that combines the three individual segmentations
effectively to obtain a panoptic-part segmentation. Two aspects are of utmost
importance for this: First, a unified model for the three problems is desired,
allowing for mutually improved and consistent representation learning.
Second, the combination must be balanced so that equal importance is given to
all individual results during fusion. Our proposed JPPF is parameter-free and
dynamically balances its input. The method is evaluated and compared on the
Cityscapes Panoptic Parts (CPP) and Pascal Panoptic Parts (PPP) datasets in
terms of PartPQ and Part-Whole Quality (PWQ). In extensive experiments, we
verify the importance of our fair fusion, highlight its most significant impact
for areas that can be further segmented into parts, and demonstrate the
generalization capabilities of our design, without any fine-tuning, on 5
additional datasets.
Comment: Accepted for Springer Nature Computer Science. arXiv admin note: substantial text overlap with arXiv:2212.0767
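To illustrate the flavor of a parameter-free, equally weighted fusion, here is a hedged sketch; it assumes the three branches score a shared label space, which simplifies away the semantics/instance/part label mapping the actual method must handle.

```python
import numpy as np

def fuse(p_semantic, p_instance, p_part):
    """Each input: (C, H, W) per-pixel scores over a shared label space (an
    assumption of this sketch). The fusion itself has no learned parameters."""
    def normalize(p):
        # Shift to non-negative and rescale to a per-pixel distribution so
        # that no branch dominates purely through its output magnitude.
        p = p - p.min(axis=0, keepdims=True)
        return p / (p.sum(axis=0, keepdims=True) + 1e-8)
    fused = (normalize(p_semantic) + normalize(p_instance) + normalize(p_part)) / 3.0
    return fused.argmax(axis=0)  # (H, W) fused panoptic-part labels
```

The per-pixel normalization is what "dynamically balances" the inputs: each branch contributes a distribution rather than raw scores, so the three predictions enter the average with equal weight.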