
A Novel Efficient Algorithm for Locating and Tracking Object Parts in Low Resolution Videos

Authors: Agah Arvin
Johnson David O.
Publication venue: Walter de Gruyter GmbH
Publication date: 01/04/2011

This is the published version. Copyright De Gruyter.

In this paper, a novel, efficient algorithm is presented for locating and tracking object parts in low-resolution videos using Lowe's SIFT keypoints with a nearest-neighbor object detection approach. Our interest lies in using this information as one step in automatically programming service, household, or personal robots to perform the skills taught in easily obtainable instructional videos. In the reported experiments, the system looked for 14 parts of inanimate and animate objects in 40 natural outdoor scenes. The scenes were frames from a low-resolution instructional video on cleaning golf clubs, containing 2,405 frames of 180 by 240 pixels. The system was trained using 39 frames that lay halfway between the test frames. Despite the low resolution of the instructional video and the occluded training samples, the system achieved a recall of 49% with a precision of 71% and an F1 score of 0.58, which is better than that achieved by less demanding applications. To verify that the reported results did not depend on the specific video, the proposed technique was applied to another video, and those results are also reported.
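The abstract does not spell out the matching rule, so the following is only a minimal sketch of how nearest-neighbor part detection over SIFT-style descriptors could work, assuming Euclidean matching with Lowe's ratio test and a simple per-part vote count. The descriptors are toy 2-D vectors rather than 128-D SIFT vectors, and the names `match_descriptor`, `detect_parts`, `ratio`, `min_votes`, and the part labels `grip` and `club_head` are hypothetical. The last function checks the reported arithmetic: the harmonic mean of 71% precision and 49% recall rounds to 0.58.

```python
import math
from collections import Counter
from typing import Optional, Sequence, Set

Descriptor = Sequence[float]


def euclidean(a: Descriptor, b: Descriptor) -> float:
    """Euclidean (L2) distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_descriptor(query: Descriptor, train: Sequence[Descriptor],
                     labels: Sequence[str], ratio: float = 0.8) -> Optional[str]:
    """Label a query descriptor with the part of its nearest training descriptor.

    Returns None when the match is ambiguous under Lowe's ratio test, i.e. when
    the nearest neighbour is not clearly closer than the second nearest.
    (Assumption: the paper's exact acceptance rule is not given in the abstract.)
    """
    if len(train) < 2:
        raise ValueError("the ratio test needs at least two training descriptors")
    # (distance, index) pairs sorted from nearest to farthest.
    dists = sorted((euclidean(query, t), i) for i, t in enumerate(train))
    (d1, best), (d2, _) = dists[0], dists[1]
    if d2 == 0 or d1 / d2 >= ratio:
        return None
    return labels[best]


def detect_parts(frame: Sequence[Descriptor], train: Sequence[Descriptor],
                 labels: Sequence[str], ratio: float = 0.8,
                 min_votes: int = 2) -> Set[str]:
    """Report every part that receives at least `min_votes` accepted matches."""
    votes: Counter = Counter()
    for query in frame:
        label = match_descriptor(query, train, labels, ratio)
        if label is not None:
            votes[label] += 1
    return {part for part, n in votes.items() if n >= min_votes}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy 2-D "descriptors" for two hypothetical golf-club parts.
    train = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    labels = ["grip", "grip", "club_head", "club_head"]
    frame = [(0.0, 0.2), (0.1, 0.1), (5.0, 5.0), (10.0, 10.2)]
    print(detect_parts(frame, train, labels))  # {'grip'}: club_head gets only 1 vote
    print(round(f1_score(0.71, 0.49), 2))      # 0.58
```

Lowering `ratio` rejects more ambiguous matches, trading recall for precision; 0.8 is the threshold Lowe proposed for SIFT matching, but the paper's own setting is not stated in the abstract.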

KU ScholarWorks

Directory of Open Access Journals

Detect-and-Track: Efficient Pose Estimation in Videos

Authors: Girdhar Rohit
Gkioxari Georgia
Paluri Manohar
Torresani Lorenzo
Tran Du
Publication date: 02/05/2018

This paper addresses the problem of estimating and tracking human body keypoints in complex, multi-person video. We propose an extremely lightweight yet highly effective approach that builds upon the latest advancements in human detection and video understanding. Our method operates in two stages: keypoint estimation in frames or short clips, followed by lightweight tracking to generate keypoint predictions linked over the entire video. For frame-level pose estimation, we experiment with Mask R-CNN, as well as our own proposed 3D extension of this model, which leverages temporal information over small clips to generate more robust frame predictions. We conduct extensive ablation experiments on the newly released multi-person video pose estimation benchmark, PoseTrack, to validate various design choices of our model. Our approach achieves a Multi-Object Tracking Accuracy (MOTA) of 55.2% on the validation set and 51.8% on the test set, and achieves state-of-the-art performance on the ICCV 2017 PoseTrack keypoint tracking challenge.

Comment: In CVPR 2018. Ranked first in the ICCV 2017 PoseTrack challenge (keypoint tracking in videos). Code: https://github.com/facebookresearch/DetectAndTrack and webpage: https://rohitgirdhar.github.io/DetectAndTrack
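The second stage links per-frame predictions into tracks over the whole video. The abstract does not say how the linking is done, so this sketch substitutes a simple greedy strategy: person bounding boxes in consecutive frames are matched in order of decreasing intersection-over-union (IoU), and unmatched detections start new tracks. The box format, the `threshold` value, and the names `iou` and `link_tracks` are assumptions for illustration, not the authors' implementation.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def link_tracks(frames: List[List[Box]], threshold: float = 0.3) -> List[List[int]]:
    """Assign a track ID to every detection, matching greedily frame to frame."""
    next_id = 0
    all_ids: List[List[int]] = []
    prev_boxes: List[Box] = []
    prev_ids: List[int] = []
    for boxes in frames:
        ids = [-1] * len(boxes)
        # Candidate (IoU, previous index, current index) pairs, best overlap first.
        pairs = sorted(((iou(p, c), i, j)
                        for i, p in enumerate(prev_boxes)
                        for j, c in enumerate(boxes)), reverse=True)
        used_prev = set()
        for score, i, j in pairs:
            if score < threshold:
                break  # all remaining pairs overlap even less
            if i in used_prev or ids[j] != -1:
                continue  # each track and each detection is matched at most once
            ids[j] = prev_ids[i]
            used_prev.add(i)
        # Detections left unmatched start new tracks.
        for j in range(len(boxes)):
            if ids[j] == -1:
                ids[j] = next_id
                next_id += 1
        all_ids.append(ids)
        prev_boxes, prev_ids = boxes, ids
    return all_ids


if __name__ == "__main__":
    frames = [
        [(0, 0, 10, 10), (50, 50, 60, 60)],
        [(52, 51, 62, 61), (1, 0, 11, 10)],  # same two people, listed in swapped order
        [(100, 100, 110, 110)],              # a new person appears
    ]
    print(link_tracks(frames))  # [[0, 1], [1, 0], [2]]
```

Because only adjacent frames are compared, a person who is missed for one frame receives a new ID on reappearing. An optimal assignment (for example, the Hungarian algorithm) could replace the greedy loop without changing the function's interface.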

arXiv.org e-Print Archive

Crossref

Extraction and Classification of Diving Clips from Continuous Video Footage

Authors: Greenwood Daniel
He Zhen
Morgan Stuart
Nibali Aiden
Publication date: 24/05/2017

Due to recent advances in technology, the recording and analysis of video data has become an increasingly common component of athlete training programmes. Today it is easy and affordable to set up a fixed camera and record athletes in a wide range of sports, such as diving, gymnastics, golf, and tennis. However, manual analysis of the resulting footage is time-consuming: it involves isolating actions of interest and categorizing them using domain-specific knowledge. Automating this kind of task typically involves three challenging sub-problems: 1) temporally cropping events/actions of interest from continuous video; 2) tracking the object of interest; and 3) classifying the events/actions of interest. Most previous work has focused on solving just one of these sub-problems in isolation. In contrast, this paper provides a complete solution to the overall action monitoring task in the context of a challenging real-world exemplar: diving classification. The problem is challenging because the diver of interest typically occupies fewer than 1% of the pixels in each frame. The model must learn the temporal boundaries of a dive, even though other divers and bystanders may be in view. Finally, the model must be sensitive to subtle changes in body pose over a large number of frames to determine the classification code. We provide effective solutions to each sub-problem, which combine into a highly functional solution to the task as a whole. The proposed techniques can be easily generalized to video footage recorded in other sports.

Comment: To appear at CVsports 201
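For the first sub-problem, temporal cropping, one common approach is to threshold a per-frame score (for example, a classifier's probability that a dive is in progress) and keep sufficiently long runs. This sketch assumes such scores already exist; the abstract does not describe how the authors localize dives, and the function `extract_segments` and its `threshold`, `min_len`, and `max_gap` parameters are hypothetical.

```python
from typing import List, Optional, Tuple


def extract_segments(scores: List[float], threshold: float = 0.5,
                     min_len: int = 3, max_gap: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, where scores >= threshold.

    Runs separated by at most `max_gap` low-scoring frames are merged, and merged
    segments shorter than `min_len` frames are discarded as noise.
    """
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, s in enumerate(scores):
        if s >= threshold and start is None:
            start = i                      # a run begins at frame i
        elif s < threshold and start is not None:
            runs.append((start, i))        # the run ended just before frame i
            start = None
    if start is not None:
        runs.append((start, len(scores)))  # the run continues to the last frame

    merged: List[Tuple[int, int]] = []
    for begin, end in runs:
        if merged and begin - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], end)  # bridge a short dip in the score
        else:
            merged.append((begin, end))
    return [(b, e) for b, e in merged if e - b >= min_len]


if __name__ == "__main__":
    scores = [0.1, 0.9, 0.8, 0.2, 0.7, 0.9, 0.1, 0.1, 0.6, 0.1]
    print(extract_segments(scores))  # [(1, 6)]
```

The merge step keeps a single low-scoring frame from splitting one dive into two clips, while `min_len` drops isolated spikes such as frame 8 above; each surviving segment could then be handed to the tracking and classification stages.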

arXiv.org e-Print Archive

Crossref