Multi-label Class-imbalanced Action Recognition in Hockey Videos via 3D Convolutional Neural Networks
Automatic analysis of video is one of the most complex problems in the fields
of computer vision and machine learning. A significant part of this research
deals with (human) activity recognition (HAR) since humans, and the activities
that they perform, generate most of the video semantics. Video-based HAR has
applications in various domains, but one of the most important and challenging
is HAR in sports videos. Some of the major issues include high inter- and
intra-class variations, large class imbalance, the presence of both group
actions and single player actions, and recognizing simultaneous actions, i.e.,
the multi-label learning problem. Keeping in mind these challenges and the
recent success of CNNs in solving various computer vision problems, in this
work, we implement a 3D CNN based multi-label deep HAR system for multi-label
class-imbalanced action recognition in hockey videos. We test our system for
two different scenarios: an ensemble of k binary networks vs. a single
k-output network, on a publicly available dataset. We also compare our
results with the system that was originally designed for the chosen dataset.
Experimental results show that the proposed approach performs better than the
existing solution.
Comment: Accepted to IEEE/ACIS SNPD 2018, 6 pages, 3 figures
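The multi-label setting described in this abstract can be illustrated with a minimal sketch. It assumes, hypothetically, that each class receives an independent sigmoid score and that a fixed 0.5 threshold decides which actions are present; the abstract does not state the output activation or the decision rule. The names `predict_labels` and `ensemble_logits` are illustrative only, and the "networks" are stand-in callables rather than 3D CNNs.

```python
import math
from typing import Callable, List, Sequence


def sigmoid(x: float) -> float:
    """Map a raw score (logit) to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Return every class index whose sigmoid score reaches the threshold.

    Unlike single-label classification (argmax), several classes can be
    returned at once, which is what makes the problem multi-label.
    The 0.5 threshold is an assumption, not a value from the abstract.
    """
    return [i for i, z in enumerate(logits) if sigmoid(z) >= threshold]


def ensemble_logits(binary_nets: Sequence[Callable[[object], float]],
                    clip: object) -> List[float]:
    """Scenario 1: one binary network per class, each emitting one logit.

    Scenario 2 (a single k-output network) would instead produce the same
    list of k logits in one forward pass; the decision step is identical.
    """
    return [net(clip) for net in binary_nets]


# Stand-in "networks" that ignore the clip and return fixed logits.
nets = [lambda clip: 2.0, lambda clip: -1.0, lambda clip: 0.3]
print(predict_labels(ensemble_logits(nets, clip=None)))
```

In both scenarios the decision step is the same; what differs is whether the k logits come from k separately trained networks or from one shared network.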
Towards Structured Analysis of Broadcast Badminton Videos
Sports video data is recorded for nearly every major tournament but remains
archived and inaccessible to large scale data mining and analytics. It can only
be viewed sequentially or manually tagged with higher-level labels, which is
time-consuming and prone to errors. In this work, we propose an end-to-end
framework for automatic attribute tagging and analysis of sports videos. We use
commonly available broadcast videos of matches and, unlike previous approaches,
do not rely on special camera setups or additional sensors.
Our focus is on Badminton as the sport of interest. We propose a method to
analyze a large corpus of badminton broadcast videos by segmenting the points
played, tracking and recognizing the players in each point and annotating their
respective badminton strokes. We evaluate the performance on 10 Olympic matches
with 20 players and achieve 95.44% point segmentation accuracy, a 97.38% player
detection score (mAP@0.5), 97.98% player identification accuracy, and a stroke
segmentation edit score of 80.48%. We further show that the automatically
annotated videos alone can enable gameplay analysis and inference by
computing understandable metrics such as a player's reaction time, speed, and
footwork around the court.
Comment: 9 pages
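The abstract reports a stroke segmentation edit score without defining it. The sketch below shows one common definition from the temporal action segmentation literature: collapse frame-wise labels into a sequence of segments, compute the Levenshtein distance between the predicted and ground-truth segment sequences, and normalize by the longer sequence. Whether the paper uses exactly this normalization is an assumption, and the stroke names in the example are hypothetical.

```python
from typing import List, Sequence


def collapse(frames: Sequence[str]) -> List[str]:
    """Merge runs of identical frame labels into one segment label each."""
    segments: List[str] = []
    for label in frames:
        if not segments or segments[-1] != label:
            segments.append(label)
    return segments


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Classic edit distance (insert, delete, substitute; each costs 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


def edit_score(pred_frames: Sequence[str], true_frames: Sequence[str]) -> float:
    """Segmental edit score in percent; 100 means identical segment order."""
    pred, true = collapse(pred_frames), collapse(true_frames)
    longest = max(len(pred), len(true))
    if longest == 0:
        return 100.0
    return 100.0 * (1.0 - levenshtein(pred, true) / longest)


# Segment boundaries differ, but the order of strokes is the same.
pred = ["serve", "serve", "smash", "smash", "lob"]
true = ["serve", "smash", "smash", "smash", "lob", "lob"]
print(edit_score(pred, true))
```

Because the score is computed on collapsed segments, it penalizes missing, extra, or out-of-order strokes but not small shifts in where a stroke starts or ends.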
Extraction and Classification of Diving Clips from Continuous Video Footage
Due to recent advances in technology, the recording and analysis of video
data has become an increasingly common component of athlete training
programmes. Today it is incredibly easy and affordable to set up a fixed camera
and record athletes in a wide range of sports, such as diving, gymnastics,
golf, tennis, etc. However, the manual analysis of the obtained footage is a
time-consuming task which involves isolating actions of interest and
categorizing them using domain-specific knowledge. In order to automate this
kind of task, three challenging sub-problems are often encountered: 1)
temporally cropping events/actions of interest from continuous video; 2)
tracking the object of interest; and 3) classifying the events/actions of
interest.
Most previous work has focused on solving just one of the above sub-problems
in isolation. In contrast, this paper provides a complete solution to the
overall action monitoring task in the context of a challenging real-world
exemplar. Specifically, we address the problem of diving classification. This
is a challenging problem since the person (diver) of interest typically
occupies fewer than 1% of the pixels in each frame. The model is required to
learn the temporal boundaries of a dive, even though other divers and
bystanders may be in view. Finally, the model must be sensitive to subtle
changes in body pose over a large number of frames to determine the
classification code. We provide effective solutions to each of the sub-problems
which combine to provide a highly functional solution to the task as a whole.
The techniques proposed can be easily generalized to video footage recorded
from other sports.
Comment: To appear at CVsports 201