We present a method for estimating articulated human pose from a single
static image based on a graphical model with novel pairwise relations that make
adaptive use of local image measurements. More precisely, we specify a
graphical model for human pose which exploits the fact the local image
measurements can be used both to detect parts (or joints) and also to predict
the spatial relationships between them (Image Dependent Pairwise Relations).
These spatial relationships are represented by a mixture model. We use Deep
Convolutional Neural Networks (DCNNs) to learn conditional probabilities for
the presence of parts and their spatial relationships within image patches.
Hence our model combines the representational flexibility of graphical models
with the efficiency and statistical power of DCNNs. Our method significantly
outperforms the state of the art methods on the LSP and FLIC datasets and also
performs very well on the Buffy dataset without any training.Comment: NIPS 2014 Camera Read