Multimodal framework based on audio‐visual features for summarisation of cricket videos
Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/166171/1/ipr2bf02094.pd
Real-time event classification in field sport videos
This paper presents a novel approach to real-time event detection in sports broadcasts. We show that the same underlying audio-visual feature extraction algorithm, based on new global image descriptors, is robust across a range of different sports, alleviating the need to tailor it to a particular sport. In addition, we propose and evaluate three different classifiers for detecting events from these features: a feed-forward neural network, an Elman neural network and a decision tree. Each is investigated and evaluated in terms of its usefulness for real-time event classification. We also provide a ground-truth dataset, together with an annotation technique for evaluating the performance of each classifier, that will be useful to others interested in this problem.
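As a hedged illustration of the Elman classifier option mentioned in this abstract, the sketch below uses PyTorch's nn.RNN (the classic Elman architecture, with the hidden state fed back as context) over per-frame audio-visual feature vectors. The feature dimension, window length, and two-class output are hypothetical placeholders, not values from the paper.

```python
# Minimal sketch of an Elman-style recurrent event classifier, assuming
# per-frame audio-visual feature vectors; all dimensions are illustrative.
import torch
import torch.nn as nn

class ElmanEventClassifier(nn.Module):
    def __init__(self, feat_dim=64, hidden_dim=32, n_classes=2):
        super().__init__()
        # nn.RNN with the default tanh nonlinearity is the classic Elman
        # network: the hidden state is fed back at the next time step.
        self.rnn = nn.RNN(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_classes)

    def forward(self, x):               # x: (batch, frames, feat_dim)
        out, _ = self.rnn(x)
        return self.head(out[:, -1])    # classify from the final time step

model = ElmanEventClassifier()
frames = torch.randn(1, 30, 64)         # one 30-frame window of features
logits = model(frames)                   # event / no-event scores
```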
Deep-Learning-Based Computer Vision Approach For The Segmentation Of Ball Deliveries And Tracking In Cricket
There has been a significant increase in the adoption of technology in cricket recently. This trend has created the problem of duplicate work across similar computer vision-based research efforts. Our research addresses one of these problems by segmenting ball deliveries in a cricket broadcast using deep learning models, MobileNet and YOLO, enabling researchers to use our work as a dataset for their own research. The output of our research can be used by cricket coaches and players to analyze the ball deliveries played during a match. This paper presents an approach to segmenting and extracting video shots in which only the ball is being delivered. The video shots are series of continuous frames that make up a whole scene of the video. Object detection models are applied to achieve a high level of accuracy in correctly extracting video shots. We propose a proof of concept for building large datasets of video shots of ball deliveries, which paves the way for further processing of those shots to extract semantics. Ball tracking in these video shots is also performed using a separate RetinaNet model, as a sample of the usefulness of the proposed dataset. The position on the cricket pitch where the ball lands is also extracted by tracking the ball along the y-axis. Each video shot is then classified as a full-pitched, good-length or short-pitched delivery.
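For illustration only, the final length-bucketing step might look like the following sketch: the y-coordinate at which the tracked ball bounces is mapped to one of the three delivery classes. The threshold values are invented placeholders; the paper's actual pitch calibration is not reproduced here.

```python
# Hypothetical sketch of the delivery-length bucketing step: the image row
# where the tracked ball bounces (larger y = closer to the batsman) is
# mapped to a length class. Threshold values are illustrative only.
GOOD_LENGTH_Y = 420   # assumed row where the good-length region starts
FULL_LENGTH_Y = 560   # assumed row where the full-pitched region starts

def classify_delivery(bounce_y: int) -> str:
    if bounce_y >= FULL_LENGTH_Y:
        return "full-pitched"
    if bounce_y >= GOOD_LENGTH_Y:
        return "good-length"
    return "short-pitched"

print(classify_delivery(480))  # -> "good-length"
```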
Activity Recognition for Quality Assessment of Batting Shots in Cricket using a Hierarchical Representation
Quality assessment in cricket is a complex task that is performed by understanding the combination of individual activities a player is able to perform and by assessing how well these activities are performed. We present a framework for inexpensive, accessible, automated recognition of cricketing shots. By means of body-worn inertial measurement units, movements of batsmen are recorded and then analysed using a parallelised, hierarchical recognition system that automatically classifies the categories of shots relevant to assessing batting quality. Our system then generates meaningful visualisations of key performance parameters, including feet positions, attack/defence, and distribution of shots around the ground. These visualisations are the basis for objective skill assessment, focusing on specific personal improvement points as identified through our system. We evaluated our framework through a deployment study in which 6 players engaged in batting exercises. Based on the recorded movement data, we could automatically identify 20 classes of unique batting shot components with an average F1-score greater than 88%. This analysis is the basis for our detailed assessment of the study participants' skills. Our system has the potential to rival expensive vision-based systems at a fraction of the cost.
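A minimal sketch of the hierarchical idea, under assumptions not taken from the paper: a coarse classifier first separates attack from defence, then a per-category classifier assigns the specific shot. The feature dimensions, label names, and random-forest choice are illustrative placeholders.

```python
# Illustrative two-stage (hierarchical) shot recognition from windowed IMU
# features; all data, labels, and model choices here are placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

X = np.random.randn(200, 48)                       # 200 windows x 48 features
coarse_y = np.random.choice(["attack", "defence"], 200)
fine_by_cat = {"attack": ["drive", "cut", "pull"], "defence": ["block", "leave"]}
fine_y = np.array([np.random.choice(fine_by_cat[c]) for c in coarse_y])

coarse_clf = RandomForestClassifier().fit(X, coarse_y)      # stage 1
fine_clf = {c: RandomForestClassifier().fit(X[coarse_y == c], fine_y[coarse_y == c])
            for c in ("attack", "defence")}                 # stage 2, per category

def predict_shot(window):
    category = coarse_clf.predict([window])[0]              # attack vs. defence
    return category, fine_clf[category].predict([window])[0]  # specific shot
```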
The THUMOS Challenge on Action Recognition for Videos "in the Wild"
Automatically recognizing and localizing a wide range of human actions is of crucial importance for video understanding. Towards this goal, the THUMOS challenge was introduced in 2013 to serve as a benchmark for action recognition. Until then, video action recognition, including the THUMOS challenge, had focused primarily on the classification of pre-segmented (i.e., trimmed) videos, which is an artificial task. In THUMOS 2014, we elevated action recognition to a more practical level by introducing temporally untrimmed videos. These also include `background videos' which share similar scenes and backgrounds with action videos but are devoid of the specific actions. The three editions of the challenge organized in 2013--2015 have made THUMOS a common benchmark for action classification and detection, and the annual challenge is widely attended by teams from around the world.
In this paper we describe the THUMOS benchmark in detail and give an overview of the data collection and annotation procedures. We present the evaluation protocols used to quantify results in the two THUMOS tasks of action classification and temporal detection. We also present the results of submissions to the THUMOS 2015 challenge and review the participating approaches. Additionally, we include a comprehensive empirical study evaluating the differences in action recognition between trimmed and untrimmed videos, and how well methods trained on trimmed videos generalize to untrimmed videos. We conclude by proposing several directions and improvements for future THUMOS challenges.
Comment: Preprint submitted to Computer Vision and Image Understanding
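As background for the temporal-detection task, the following sketch computes the temporal intersection-over-union (tIoU) criterion commonly used to score THUMOS-style detections: a predicted segment counts as correct when its tIoU with a same-class ground-truth segment exceeds a threshold (e.g. 0.5). The segment values are illustrative, and the challenge's exact matching protocol is not reproduced.

```python
# Temporal IoU between a predicted and a ground-truth action segment,
# each given as (start, end) times in seconds; example values are made up.
def temporal_iou(pred, gt):
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

print(temporal_iou((12.0, 18.0), (13.5, 19.0)))  # ~0.64 -> a hit at tIoU 0.5
```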
Digging Deeper into Egocentric Gaze Prediction
This paper digs deeper into the factors that influence egocentric gaze. Instead of training deep models for this purpose in a blind manner, we propose to inspect the factors that contribute to gaze guidance during daily tasks. Bottom-up saliency and optical flow are assessed against strong spatial-prior baselines. Task-specific cues such as the vanishing point, manipulation point, and hand regions are analyzed as representatives of top-down information. We also look into the contribution of these factors by investigating a simple recurrent neural model for egocentric gaze prediction. First, deep features are extracted for all input video frames. Then, a gated recurrent unit is employed to integrate information over time and to predict the next fixation. We also propose an integrated model that combines the recurrent model with several top-down and bottom-up cues. Extensive experiments over multiple datasets reveal that (1) spatial biases are strong in egocentric videos, (2) bottom-up saliency models perform poorly in predicting gaze and underperform spatial biases, (3) deep features perform better than traditional features, (4) as opposed to hand regions, the manipulation point is a strongly influential cue for gaze prediction, (5) combining the proposed recurrent model with bottom-up cues, vanishing points and, in particular, the manipulation point results in the best gaze prediction accuracy over egocentric videos, (6) knowledge transfer works best for cases where the tasks or sequences are similar, and (7) task and activity recognition can benefit from gaze prediction. Our findings suggest that (1) there should be more emphasis on hand-object interaction and (2) the egocentric vision community should consider larger datasets including diverse stimuli and more subjects.
Comment: presented at WACV 201
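A hedged sketch of the recurrent model this abstract describes: per-frame deep features feed a GRU that regresses the next fixation. The feature and hidden dimensions are assumptions, and the fusion with top-down and bottom-up cues is omitted for brevity.

```python
# Minimal GRU-based next-fixation sketch; dimensions are hypothetical and
# the paper's cue-fusion stage is not reproduced.
import torch
import torch.nn as nn

class GazeGRU(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=128):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.fix = nn.Linear(hidden_dim, 2)        # (x, y) in normalized coords

    def forward(self, feats):                      # feats: (batch, frames, feat_dim)
        h, _ = self.gru(feats)                     # integrate evidence over time
        return torch.sigmoid(self.fix(h[:, -1]))   # predicted next fixation

model = GazeGRU()
clip = torch.randn(1, 16, 512)                     # 16 frames of deep features
print(model(clip))                                 # e.g. tensor([[0.47, 0.52]])
```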