In this paper we consider the problem of human pose estimation from a single
still image. We propose a novel approach where each location in the image votes
for the position of each keypoint using a convolutional neural net. The voting
scheme allows us to utilize information from the whole image, rather than rely
on a sparse set of keypoint locations. Using dense, multi-target votes not
only produces good keypoint predictions but also enables us to compute
image-dependent joint keypoint probabilities by looking at consensus voting.
This differs from most previous methods where joint probabilities are learned
from relative keypoint locations and are independent of the image. Finally, we
combine the keypoint votes and joint probabilities to identify the optimal pose
configuration. We demonstrate competitive performance on the MPII Human Pose
and Leeds Sports Pose datasets.
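The voting idea can be illustrated with a minimal sketch. It assumes a hypothetical network output format in which, for every keypoint, each pixel predicts an integer displacement (dy, dx) to that keypoint together with a non-negative vote weight. The pairwise "consensus" score is likewise an illustrative assumption: the weighted count of pixels whose votes land on both candidate locations at once. Function names such as `accumulate_votes` and `consensus` are hypothetical, and this is not the paper's exact formulation.

```python
import numpy as np


def accumulate_votes(offsets, weights):
    """Sum dense per-pixel votes into one vote map per keypoint.

    offsets: (K, H, W, 2) integer (dy, dx) displacement predicted at each pixel
             for each keypoint (hypothetical output format).
    weights: (K, H, W) non-negative weight of each vote.
    Returns a (K, H, W) array of accumulated votes.
    """
    K, H, W = weights.shape
    ys, xs = np.mgrid[0:H, 0:W]
    maps = np.zeros((K, H, W))
    for k in range(K):
        ty = ys + offsets[k, ..., 0]  # voted row for keypoint k
        tx = xs + offsets[k, ..., 1]  # voted column for keypoint k
        valid = (ty >= 0) & (ty < H) & (tx >= 0) & (tx < W)  # drop votes outside the image
        # np.add.at accumulates correctly even when several pixels vote for the same cell
        np.add.at(maps[k], (ty[valid], tx[valid]), weights[k][valid])
    return maps


def decode_peaks(maps):
    """Return the highest-voted (row, col) location for each keypoint."""
    return [tuple(int(v) for v in np.unravel_index(np.argmax(m), m.shape)) for m in maps]


def consensus(offsets, weights, i, a, j, b):
    """Image-dependent pairwise score (assumed form): the summed product of
    weights over pixels that vote for keypoint i at location a AND for
    keypoint j at location b."""
    K, H, W = weights.shape
    ys, xs = np.mgrid[0:H, 0:W]
    hits_i = (ys + offsets[i, ..., 0] == a[0]) & (xs + offsets[i, ..., 1] == a[1])
    hits_j = (ys + offsets[j, ..., 0] == b[0]) & (xs + offsets[j, ..., 1] == b[1])
    both = hits_i & hits_j
    return float(np.sum(weights[i][both] * weights[j][both]))


if __name__ == "__main__":
    # Toy 5x5 image: every pixel votes keypoint 0 at (2, 2) and keypoint 1 at (1, 3).
    H = W = 5
    ys, xs = np.mgrid[0:H, 0:W]
    offsets = np.zeros((2, H, W, 2), dtype=int)
    offsets[0, ..., 0] = 2 - ys
    offsets[0, ..., 1] = 2 - xs
    offsets[1, ..., 0] = 1 - ys
    offsets[1, ..., 1] = 3 - xs
    weights = np.ones((2, H, W))

    maps = accumulate_votes(offsets, weights)
    print(decode_peaks(maps))  # prints [(2, 2), (1, 3)]
    print(consensus(offsets, weights, 0, (2, 2), 1, (1, 3)))  # prints 25.0
```

Because every vote is kept, each keypoint map aggregates evidence from the whole image rather than from a few detected locations. The consensus score depends on the image through the predicted votes, in contrast to a pairwise prior learned only from relative keypoint positions. Combining unary vote maps with these pairwise scores to select a full pose is left out of this sketch.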