Multi-Context Attention for Human Pose Estimation
In this paper, we propose to incorporate convolutional neural networks with a
multi-context attention mechanism into an end-to-end framework for human pose
estimation. We adopt stacked hourglass networks to generate attention maps from
features at multiple resolutions with various semantics. The Conditional Random
Field (CRF) is utilized to model the correlations among neighboring regions in
the attention map. We further combine the holistic attention model, which
focuses on the global consistency of the full human body, and the body part
attention model, which focuses on the detailed description for different body
parts. Hence, our model can focus on different granularities, from local salient
regions to semantically consistent global spaces. Additionally, we
design novel Hourglass Residual Units (HRUs) to increase the receptive field of
the network. These units extend residual units with a side branch
incorporating filters with larger receptive fields, so that features at various
scales are learned and combined within the HRUs. The effectiveness of the
proposed multi-context attention mechanism and the hourglass residual units is
evaluated on two widely used human pose estimation benchmarks. Our approach
outperforms all existing methods on both benchmarks over all the body parts.
Comment: The first two authors contributed equally to this work.
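A minimal sketch of the attention step described in this abstract. It assumes that an attention map is obtained by normalizing an H x W score map with a spatial softmax and is then multiplied element-wise into every feature channel. The function names, the softmax normalization, and the example scores are illustrative assumptions. The paper's CRF refinement of the attention map and the HRU side branch are not modeled here.

```python
import math
from typing import List

Grid = List[List[float]]  # an H x W map stored row-major


def spatial_softmax(scores: Grid) -> Grid:
    """Turn an H x W score map into an attention map whose entries sum to 1."""
    peak = max(max(row) for row in scores)  # subtract the max for numerical stability
    exps = [[math.exp(v - peak) for v in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[v / total for v in row] for row in exps]


def apply_attention(features: List[Grid], attention: Grid) -> List[Grid]:
    """Multiply every channel of a C x H x W feature map by the same H x W attention map."""
    return [
        [[f * a for f, a in zip(f_row, a_row)] for f_row, a_row in zip(channel, attention)]
        for channel in features
    ]


# Holistic-style attention in this sketch: one map shared by all channels.
scores = [[0.0, 0.0], [0.0, 2.0]]  # hypothetical scores, e.g. from a 1x1 convolution
attention = spatial_softmax(scores)
refined = apply_attention([[[1.0, 1.0], [1.0, 1.0]]], attention)
```

In this sketch, a body part attention model would compute one such map per part and apply each map to its own feature branch, instead of sharing a single map across all channels.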
PifPaf: Composite Fields for Human Pose Estimation
We propose a new bottom-up method for multi-person 2D human pose estimation
that is particularly well suited for urban mobility such as self-driving cars
and delivery robots. The new method, PifPaf, uses a Part Intensity Field (PIF)
to localize body parts and a Part Association Field (PAF) to associate body
parts with each other to form full human poses. Our method outperforms previous
methods at low resolution and in crowded, cluttered and occluded scenes thanks
to (i) our new composite field PAF encoding fine-grained information and (ii)
the choice of a Laplace loss for regression, which incorporates a notion of
uncertainty. Our architecture is based on a fully convolutional, single-shot,
box-free design. We perform on par with the existing state-of-the-art bottom-up
method on the standard COCO keypoint task and produce state-of-the-art results
on a modified COCO keypoint task for the transportation domain.
Comment: CVPR 201
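The abstract credits a Laplace loss for regression with adding a notion of uncertainty. Below is a sketch of the generic one-dimensional Laplace negative log-likelihood. It assumes the network outputs a predicted location and a log-spread for each regressed coordinate. PifPaf's exact parametrization may differ, for example in how it handles 2-D vectors or which constants it drops. `laplace_nll` is a hypothetical name.

```python
import math


def laplace_nll(target: float, pred: float, log_b: float) -> float:
    """Negative log-likelihood of `target` under a Laplace(pred, b) distribution.

    The spread b is predicted in log space (b = exp(log_b)) so it is always positive.
    A large b down-weights the absolute error |target - pred|, while the log(2b)
    term penalizes claiming high uncertainty everywhere.
    """
    b = math.exp(log_b)
    return abs(target - pred) / b + math.log(2.0 * b)


# For a fixed error, the loss is lowest when the predicted spread equals |error|.
for log_b in (math.log(0.5), 0.0, math.log(2.0)):
    print(f"b={math.exp(log_b):.1f} loss={laplace_nll(0.0, 1.0, log_b):.3f}")
```

Setting the derivative -|e|/b^2 + 1/b to zero gives b = |e|. With an error of 1.0, the loop therefore reports the lowest loss at b = 1.0. The network thus learns to predict a larger spread where its regressions are unreliable.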
Recurrent Human Pose Estimation
We propose a novel ConvNet model for predicting 2D human body poses in an
image. The model regresses a heatmap representation for each body keypoint, and
is able to learn and represent both the part appearances and the context of the
part configuration. We make the following three contributions: (i) an
architecture combining a feed forward module with a recurrent module, where the
recurrent module can be run iteratively to improve the performance, (ii) the
model can be trained end-to-end and from scratch, with auxiliary losses
incorporated to improve performance, (iii) we investigate whether keypoint
visibility can also be predicted. The model is evaluated on two benchmark
datasets. The result is a simple architecture that achieves performance on par
with the state of the art, but without the complexity of a graphical model
stage (or layers).
Comment: FG 2017, More Info and Demo:
http://www.robots.ox.ac.uk/~vgg/software/keypoint_detection
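According to the abstract, the model regresses one heatmap per keypoint and adds auxiliary losses while the recurrent module is run iteratively. The sketch below illustrates that training signal. It assumes Gaussian target heatmaps, a mean-squared-error loss applied to every iteration's output, and argmax decoding. None of these choices is confirmed by the abstract, and all names are hypothetical.

```python
import math
from typing import List, Tuple

Heatmap = List[List[float]]  # H x W scores for one keypoint


def gaussian_heatmap(height: int, width: int, cx: int, cy: int, sigma: float = 1.0) -> Heatmap:
    """Target heatmap for one keypoint: a 2-D Gaussian with peak 1.0 at (cx, cy)."""
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2)) for x in range(width)]
        for y in range(height)
    ]


def decode(heatmap: Heatmap) -> Tuple[int, int]:
    """Return the (x, y) cell with the highest response."""
    _, x, y = max((v, x, y) for y, row in enumerate(heatmap) for x, v in enumerate(row))
    return x, y


def mse(pred: Heatmap, target: Heatmap) -> float:
    """Mean squared error between two heatmaps of the same shape."""
    diffs = [(p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    return sum(diffs) / len(diffs)


def total_loss(iteration_outputs: List[Heatmap], target: Heatmap) -> float:
    """Supervise the output of every recurrent iteration, not only the last one.

    The terms for intermediate iterations play the role of auxiliary losses.
    """
    return sum(mse(out, target) for out in iteration_outputs)
```

Because every iteration is supervised, an early, coarse heatmap is still pushed toward the target. Later iterations of the recurrent module then only need to refine it.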
Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation
This paper proposes a new hybrid architecture that consists of a deep
Convolutional Network and a Markov Random Field. We show how this architecture
is successfully applied to the challenging problem of articulated human pose
estimation in monocular images. The architecture can exploit structural domain
constraints such as geometric relationships between body joint locations. We
show that joint training of these two model paradigms improves performance and
allows us to significantly outperform existing state-of-the-art techniques
- …
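This sketch illustrates how a graphical model can exploit geometric relationships between joint locations. It passes a single sum-product-style message on a hypothetical two-joint model: joint B's heatmap is shifted by a pairwise offset prior and multiplied into joint A's heatmap. This is a simplified stand-in that assumes a hand-set prior, one message, and a normalized product. It is not the paper's spatial model, and all names are illustrative.

```python
from typing import Dict, List, Tuple

Heatmap = List[List[float]]
Prior = Dict[Tuple[int, int], float]  # offset (dx, dy) = pos_A - pos_B -> weight


def message(neighbor: Heatmap, prior: Prior) -> Heatmap:
    """Spread joint B's heatmap to the positions where joint A is likely to be.

    A at (x, y) collects neighbor[y - dy][x - dx] * prior[(dx, dy)].
    """
    h, w = len(neighbor), len(neighbor[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for (dx, dy), weight in prior.items():
                sx, sy = x - dx, y - dy
                if 0 <= sx < w and 0 <= sy < h:
                    out[y][x] += weight * neighbor[sy][sx]
    return out


def refine(unary: Heatmap, neighbor: Heatmap, prior: Prior) -> Heatmap:
    """Multiply A's own (unary) heatmap by the message from B and renormalize."""
    msg = message(neighbor, prior)
    combined = [[u * m for u, m in zip(ru, rm)] for ru, rm in zip(unary, msg)]
    total = sum(map(sum, combined))
    return [[v / total for v in row] for row in combined] if total > 0 else combined


# A's detector is ambiguous (two equal peaks); B is confident, and A sits one cell right of B.
a = [[0.5, 0.0, 0.0, 0.5]]
b = [[0.0, 0.0, 1.0, 0.0]]
refined = refine(a, b, {(1, 0): 1.0})
```

Here the geometric constraint resolves A's ambiguity in favor of the peak that is consistent with B. In joint training, the prior weights would be learned together with the ConvNet instead of being set by hand.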