62,080 research outputs found
Detect-and-Track: Efficient Pose Estimation in Videos
This paper addresses the problem of estimating and tracking human body
keypoints in complex, multi-person video. We propose an extremely lightweight
yet highly effective approach that builds upon the latest advancements in human
detection and video understanding. Our method operates in two-stages: keypoint
estimation in frames or short clips, followed by lightweight tracking to
generate keypoint predictions linked over the entire video. For frame-level
pose estimation we experiment with Mask R-CNN, as well as our own proposed 3D
extension of this model, which leverages temporal information over small clips
to generate more robust frame predictions. We conduct extensive ablative
experiments on the newly released multi-person video pose estimation benchmark,
PoseTrack, to validate various design choices of our model. Our approach
achieves an accuracy of 55.2% on the validation and 51.8% on the test set using
the Multi-Object Tracking Accuracy (MOTA) metric, and achieves state of the art
performance on the ICCV 2017 PoseTrack keypoint tracking challenge.Comment: In CVPR 2018. Ranked first in ICCV 2017 PoseTrack challenge (keypoint
tracking in videos). Code: https://github.com/facebookresearch/DetectAndTrack
and webpage: https://rohitgirdhar.github.io/DetectAndTrack
Learning to track for spatio-temporal action localization
We propose an effective approach for spatio-temporal action localization in
realistic videos. The approach first detects proposals at the frame-level and
scores them with a combination of static and motion CNN features. It then
tracks high-scoring proposals throughout the video using a
tracking-by-detection approach. Our tracker relies simultaneously on
instance-level and class-level detectors. The tracks are scored using a
spatio-temporal motion histogram, a descriptor at the track level, in
combination with the CNN features. Finally, we perform temporal localization of
the action using a sliding-window approach at the track level. We present
experimental results for spatio-temporal localization on the UCF-Sports, J-HMDB
and UCF-101 action localization datasets, where our approach outperforms the
state of the art with a margin of 15%, 7% and 12% respectively in mAP
Vehicle Type Detection by Convolutional Neural Networks
In this work a new vehicle type detection procedure for traffic surveillance videos is proposed. A Convolutional Neural Network is
integrated into a vehicle tracking system in order to accomplish this task.
Solutions for vehicle overlapping, differing vehicle sizes and poor spatial resolution are presented. The system is tested on well known benchmarks, and multiclass recognition performance results are reported. Our proposal is shown to attain good results over a wide range of difficult
situations.Universidad de Málaga. Campus de Excelencia Internacional AndalucĂa Tech
- …