Detect-and-Track: Efficient Pose Estimation in Videos
This paper addresses the problem of estimating and tracking human body
keypoints in complex, multi-person video. We propose an extremely lightweight
yet highly effective approach that builds upon the latest advancements in human
detection and video understanding. Our method operates in two stages: keypoint
estimation in frames or short clips, followed by lightweight tracking to
generate keypoint predictions linked over the entire video. For frame-level
pose estimation we experiment with Mask R-CNN, as well as our own proposed 3D
extension of this model, which leverages temporal information over small clips
to generate more robust frame predictions. We conduct extensive ablative
experiments on the newly released multi-person video pose estimation benchmark,
PoseTrack, to validate various design choices of our model. Our approach
achieves an accuracy of 55.2% on the validation and 51.8% on the test set using
the Multi-Object Tracking Accuracy (MOTA) metric, and achieves state-of-the-art
performance on the ICCV 2017 PoseTrack keypoint tracking challenge.
Comment: In CVPR 2018. Ranked first in ICCV 2017 PoseTrack challenge (keypoint
tracking in videos). Code: https://github.com/facebookresearch/DetectAndTrack
and webpage: https://rohitgirdhar.github.io/DetectAndTrack
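The second stage described above, linking per-frame detections into tracks, can be illustrated with a minimal sketch. It assumes each frame yields one bounding box per detected person and links boxes across consecutive frames by greedy IoU matching. The function names, the `min_iou` threshold, and the greedy strategy are illustrative assumptions, not the released implementation.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_tracks(frames: List[List[Box]], min_iou: float = 0.3) -> List[List[int]]:
    """Assign a track ID to every detection, frame by frame.

    Hypothetical greedy linker: candidate pairs between the previous and
    the current frame are visited in order of decreasing IoU, and a pair
    is linked if both boxes are still free and the IoU reaches min_iou.
    Detections left unmatched start a new track.
    """
    next_id = 0
    all_ids: List[List[int]] = []
    prev_boxes: List[Box] = []
    prev_ids: List[int] = []
    for boxes in frames:
        ids = [-1] * len(boxes)
        pairs = sorted(
            ((iou(p, c), pi, ci)
             for pi, p in enumerate(prev_boxes)
             for ci, c in enumerate(boxes)),
            reverse=True,
        )
        used_prev = set()
        for score, pi, ci in pairs:
            if score < min_iou:
                break  # remaining pairs overlap even less
            if pi in used_prev or ids[ci] != -1:
                continue
            ids[ci] = prev_ids[pi]
            used_prev.add(pi)
        for ci in range(len(boxes)):
            if ids[ci] == -1:  # no match: open a new track
                ids[ci] = next_id
                next_id += 1
        all_ids.append(ids)
        prev_boxes, prev_ids = boxes, ids
    return all_ids
```

In this sketch a track that goes unmatched in one frame is not resumed later, and keypoints would simply inherit the track ID of the box they belong to.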
Interspecies Knowledge Transfer for Facial Keypoint Detection
We present a method for localizing facial keypoints on animals by
transferring knowledge gained from human faces. Instead of directly finetuning
a network trained to detect keypoints on human faces to animal faces (which is
sub-optimal since human and animal faces can look quite different), we propose
to first adapt the animal images to the pre-trained human detection network by
correcting for the differences in animal and human face shape. We first find
the nearest human neighbors for each animal image using an unsupervised shape
matching method. We use these matches to train a thin plate spline warping
network to warp each animal face to look more human-like. The warping network
is then jointly finetuned with a pre-trained human facial keypoint detection
network using an animal dataset. We demonstrate state-of-the-art results on
both horse and sheep facial keypoint detection, and significant improvement
over simple finetuning, especially when training data is scarce. Additionally,
we present a new dataset of 3717 images with horse face and facial keypoint
annotations.
Comment: CVPR 2017 Camera Read
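The nearest-neighbor step in this abstract, finding human faces whose shape resembles a given animal face, could look roughly like the sketch below. It assumes every face is given as the same ordered list of 2D keypoints and compares shapes after removing translation and scale. The abstract does not specify the matching criterion, so the distance used here and all function names are assumptions for illustration only.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def normalize(shape: Sequence[Point]) -> List[Point]:
    """Center keypoints at the origin and scale them to unit RMS radius."""
    n = len(shape)
    cx = sum(x for x, _ in shape) / n
    cy = sum(y for _, y in shape) / n
    centered = [(x - cx, y - cy) for x, y in shape]
    scale = math.sqrt(sum(x * x + y * y for x, y in centered) / n)
    if scale == 0:
        return centered  # degenerate shape: all keypoints coincide
    return [(x / scale, y / scale) for x, y in centered]


def shape_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Sum of squared distances between corresponding normalized keypoints.

    Assumed criterion: invariant to translation and scale, not to rotation.
    """
    na, nb = normalize(a), normalize(b)
    return sum((xa - xb) ** 2 + (ya - yb) ** 2
               for (xa, ya), (xb, yb) in zip(na, nb))


def nearest_neighbors(query: Sequence[Point],
                      candidates: Sequence[Sequence[Point]],
                      k: int = 1) -> List[int]:
    """Indices of the k candidate shapes closest to the query shape."""
    order = sorted(range(len(candidates)),
                   key=lambda i: shape_distance(query, candidates[i]))
    return order[:k]
```

Matched animal and human pairs of this kind would then serve as training targets for the warping network. The thin plate spline warp itself is not sketched here.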