A Novel Efficient Algorithm for Locating and Tracking Object Parts in Low Resolution Videos
This is the published version. Copyright De Gruyter. In this paper, a novel efficient algorithm is presented for locating and tracking object parts in low-resolution videos using Lowe's SIFT keypoints with a nearest-neighbor object detection approach. Our interest lies in using this information as one step in the process of automatically programming service, household, or personal robots to perform the skills that are being taught in easily obtainable instructional videos. In the reported experiments, the system looked for 14 parts of inanimate and animate objects in 40 natural outdoor scenes. The scenes were frames from a low-resolution instructional video on cleaning golf clubs containing 2,405 frames of 180 by 240 pixels. The system was trained using 39 frames that were half-way between the test frames. Despite the low resolution of the instructional video and occluded training samples, the system achieved a recall of 49% with a precision of 71% and an F1 score of 0.58, which is better than that achieved by less demanding applications. To verify that the reported results were not dependent on the specific video, the proposed technique was applied to another video and the results are reported.
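The F1 figure quoted in this abstract follows from the stated precision and recall, since F1 is their harmonic mean. A minimal sketch of that arithmetic, where the function name `f1_score` is only an illustrative choice and not something taken from the paper:

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Both zero: define F1 as 0 to avoid division by zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values quoted in the abstract: precision 71%, recall 49%.
f1 = f1_score(precision=0.71, recall=0.49)
print(round(f1, 2))  # 0.58
```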
Detect-and-Track: Efficient Pose Estimation in Videos
This paper addresses the problem of estimating and tracking human body
keypoints in complex, multi-person video. We propose an extremely lightweight
yet highly effective approach that builds upon the latest advancements in human
detection and video understanding. Our method operates in two stages: keypoint
estimation in frames or short clips, followed by lightweight tracking to
generate keypoint predictions linked over the entire video. For frame-level
pose estimation we experiment with Mask R-CNN, as well as our own proposed 3D
extension of this model, which leverages temporal information over small clips
to generate more robust frame predictions. We conduct extensive ablative
experiments on the newly released multi-person video pose estimation benchmark,
PoseTrack, to validate various design choices of our model. Our approach
achieves an accuracy of 55.2% on the validation and 51.8% on the test set using
the Multi-Object Tracking Accuracy (MOTA) metric, and achieves state-of-the-art
performance on the ICCV 2017 PoseTrack keypoint tracking challenge.
Comment: In CVPR 2018. Ranked first in ICCV 2017 PoseTrack challenge (keypoint
tracking in videos). Code: https://github.com/facebookresearch/DetectAndTrack
and webpage: https://rohitgirdhar.github.io/DetectAndTrack
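The second stage described in this abstract, linking per-frame predictions into tracks, can be illustrated with a small sketch. It is not the authors' implementation: it assumes each detection is reduced to a bounding box, uses greedy matching on box overlap (IoU) as a stand-in for whatever matching the paper uses, and the names `iou`, `link_tracks`, and `min_iou` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_tracks(frames: List[List[Box]], min_iou: float = 0.3) -> List[List[int]]:
    """Give every box a track id by greedily matching it to the previous frame."""
    next_id = 0
    ids_per_frame: List[List[int]] = []
    prev_boxes: List[Box] = []
    prev_ids: List[int] = []
    for boxes in frames:
        # All (previous box, current box) pairs, best overlap first.
        pairs = sorted(
            ((iou(p, c), pi, ci)
             for pi, p in enumerate(prev_boxes)
             for ci, c in enumerate(boxes)),
            reverse=True,
        )
        ids = [-1] * len(boxes)
        used_prev = set()
        for score, pi, ci in pairs:
            if score < min_iou:
                break  # remaining pairs overlap too little to be linked
            if pi in used_prev or ids[ci] != -1:
                continue  # one of the two boxes is already matched
            ids[ci] = prev_ids[pi]
            used_prev.add(pi)
        # Unmatched detections start new tracks.
        for ci in range(len(boxes)):
            if ids[ci] == -1:
                ids[ci] = next_id
                next_id += 1
        ids_per_frame.append(ids)
        prev_boxes, prev_ids = boxes, ids
    return ids_per_frame


# Two people who swap list order between frames keep their track ids.
frames = [
    [(0, 0, 10, 10), (20, 20, 30, 30)],
    [(21, 21, 31, 31), (1, 1, 11, 11)],
]
print(link_tracks(frames))  # [[0, 1], [1, 0]]
```

Greedy matching is chosen here only because it is short. An optimal assignment, for example via the Hungarian algorithm, could replace the inner matching loop without changing the rest of the sketch.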
Extraction and Classification of Diving Clips from Continuous Video Footage
Due to recent advances in technology, the recording and analysis of video
data has become an increasingly common component of athlete training
programmes. Today it is incredibly easy and affordable to set up a fixed camera
and record athletes in a wide range of sports, such as diving, gymnastics,
golf, tennis, etc. However, the manual analysis of the obtained footage is a
time-consuming task which involves isolating actions of interest and
categorizing them using domain-specific knowledge. In order to automate this
kind of task, three challenging sub-problems are often encountered: 1)
temporally cropping events/actions of interest from continuous video; 2)
tracking the object of interest; and 3) classifying the events/actions of
interest.
Most previous work has focused on solving just one of the above sub-problems
in isolation. In contrast, this paper provides a complete solution to the
overall action monitoring task in the context of a challenging real-world
exemplar. Specifically, we address the problem of diving classification. This
is a challenging problem since the person (diver) of interest typically
occupies fewer than 1% of the pixels in each frame. The model is required to
learn the temporal boundaries of a dive, even though other divers and
bystanders may be in view. Finally, the model must be sensitive to subtle
changes in body pose over a large number of frames to determine the
classification code. We provide effective solutions to each of the sub-problems
which combine to provide a highly functional solution to the task as a whole.
The techniques proposed can be easily generalized to video footage recorded
from other sports.
Comment: To appear at CVsports 201