294 research outputs found
Parsing Occluded People by Flexible Compositions
This paper presents an approach to parsing humans when there is significant
occlusion. We model humans using a graphical model which has a tree structure
building on recent work [32, 6] and exploit the connectivity prior that, even
in presence of occlusion, the visible nodes form a connected subtree of the
graphical model. We call each connected subtree a flexible composition of
object parts. This involves a novel method for learning occlusion cues. During
inference we need to search over a mixture of different flexible models. By
exploiting part sharing, we show that this inference can be done extremely
efficiently requiring only twice as many computations as searching for the
entire object (i.e., not modeling occlusion). We evaluate our model on the
standard benchmarked "We Are Family" Stickmen dataset and obtain significant
performance improvements over the best alternative algorithms.Comment: CVPR 15 Camera Read
Occlusion Coherence: Detecting and Localizing Occluded Faces
The presence of occluders significantly impacts object recognition accuracy.
However, occlusion is typically treated as an unstructured source of noise and
explicit models for occluders have lagged behind those for object appearance
and shape. In this paper we describe a hierarchical deformable part model for
face detection and landmark localization that explicitly models part occlusion.
The proposed model structure makes it possible to augment positive training
data with large numbers of synthetically occluded instances. This allows us to
easily incorporate the statistics of occlusion patterns in a discriminatively
trained model. We test the model on several benchmarks for landmark
localization and detection including challenging new data sets featuring
significant occlusion. We find that the addition of an explicit occlusion model
yields a detection system that outperforms existing approaches for occluded
instances while maintaining competitive accuracy in detection and landmark
localization for unoccluded instances
Multi-Person Pose Estimation with Local Joint-to-Person Associations
Despite of the recent success of neural networks for human pose estimation,
current approaches are limited to pose estimation of a single person and cannot
handle humans in groups or crowds. In this work, we propose a method that
estimates the poses of multiple persons in an image in which a person can be
occluded by another person or might be truncated. To this end, we consider
multi-person pose estimation as a joint-to-person association problem. We
construct a fully connected graph from a set of detected joint candidates in an
image and resolve the joint-to-person association and outlier detection using
integer linear programming. Since solving joint-to-person association jointly
for all persons in an image is an NP-hard problem and even approximations are
expensive, we solve the problem locally for each person. On the challenging
MPII Human Pose Dataset for multiple persons, our approach achieves the
accuracy of a state-of-the-art method, but it is 6,000 to 19,000 times faster.Comment: Accepted to European Conference on Computer Vision (ECCV) Workshops,
Crowd Understanding, 201
Joint Multi-Person Pose Estimation and Semantic Part Segmentation
Human pose estimation and semantic part segmentation are two complementary
tasks in computer vision. In this paper, we propose to solve the two tasks
jointly for natural multi-person images, in which the estimated pose provides
object-level shape prior to regularize part segments while the part-level
segments constrain the variation of pose locations. Specifically, we first
train two fully convolutional neural networks (FCNs), namely Pose FCN and Part
FCN, to provide initial estimation of pose joint potential and semantic part
potential. Then, to refine pose joint location, the two types of potentials are
fused with a fully-connected conditional random field (FCRF), where a novel
segment-joint smoothness term is used to encourage semantic and spatial
consistency between parts and joints. To refine part segments, the refined pose
and the original part potential are integrated through a Part FCN, where the
skeleton feature from pose serves as additional regularization cues for part
segments. Finally, to reduce the complexity of the FCRF, we induce human
detection boxes and infer the graph inside each box, making the inference forty
times faster.
Since there's no dataset that contains both part segments and pose labels, we
extend the PASCAL VOC part dataset with human pose joints and perform extensive
experiments to compare our method against several most recent strategies. We
show that on this dataset our algorithm surpasses competing methods by a large
margin in both tasks.Comment: This paper has been accepted by CVPR 201
DeeperCut: A Deeper, Stronger, and Faster Multi-Person Pose Estimation Model
The goal of this paper is to advance the state-of-the-art of articulated pose
estimation in scenes with multiple people. To that end we contribute on three
fronts. We propose (1) improved body part detectors that generate effective
bottom-up proposals for body parts; (2) novel image-conditioned pairwise terms
that allow to assemble the proposals into a variable number of consistent body
part configurations; and (3) an incremental optimization strategy that explores
the search space more efficiently thus leading both to better performance and
significant speed-up factors. Evaluation is done on two single-person and two
multi-person pose estimation benchmarks. The proposed approach significantly
outperforms best known multi-person pose estimation results while demonstrating
competitive performance on the task of single person pose estimation. Models
and code available at http://pose.mpi-inf.mpg.deComment: ECCV'16. High-res version at
https://www.d2.mpi-inf.mpg.de/sites/default/files/insafutdinov16arxiv.pd
DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation
This paper considers the task of articulated human pose estimation of
multiple people in real world images. We propose an approach that jointly
solves the tasks of detection and pose estimation: it infers the number of
persons in a scene, identifies occluded body parts, and disambiguates body
parts between people in close proximity of each other. This joint formulation
is in contrast to previous strategies, that address the problem by first
detecting people and subsequently estimating their body pose. We propose a
partitioning and labeling formulation of a set of body-part hypotheses
generated with CNN-based part detectors. Our formulation, an instance of an
integer linear program, implicitly performs non-maximum suppression on the set
of part candidates and groups them to form configurations of body parts
respecting geometric and appearance constraints. Experiments on four different
datasets demonstrate state-of-the-art results for both single person and multi
person pose estimation. Models and code available at
http://pose.mpi-inf.mpg.de.Comment: Accepted at IEEE Conference on Computer Vision and Pattern
Recognition (CVPR 2016
Detecting Semantic Parts on Partially Occluded Objects
In this paper, we address the task of detecting semantic parts on partially
occluded objects. We consider a scenario where the model is trained using
non-occluded images but tested on occluded images. The motivation is that there
are infinite number of occlusion patterns in real world, which cannot be fully
covered in the training data. So the models should be inherently robust and
adaptive to occlusions instead of fitting / learning the occlusion patterns in
the training data. Our approach detects semantic parts by accumulating the
confidence of local visual cues. Specifically, the method uses a simple voting
method, based on log-likelihood ratio tests and spatial constraints, to combine
the evidence of local cues. These cues are called visual concepts, which are
derived by clustering the internal states of deep networks. We evaluate our
voting scheme on the VehicleSemanticPart dataset with dense part annotations.
We randomly place two, three or four irrelevant objects onto the target object
to generate testing images with various occlusions. Experiments show that our
algorithm outperforms several competitors in semantic part detection when
occlusions are present.Comment: Accepted to BMVC 2017 (13 pages, 3 figures
- …