338 research outputs found
SoccerNet: A Scalable Dataset for Action Spotting in Soccer Videos
In this paper, we introduce SoccerNet, a benchmark for action spotting in
soccer videos. The dataset is composed of 500 complete soccer games from six
main European leagues, covering three seasons from 2014 to 2017 and a total
duration of 764 hours. A total of 6,637 temporal annotations are automatically
parsed from online match reports at a one minute resolution for three main
classes of events (Goal, Yellow/Red Card, and Substitution). As such, the
dataset is easily scalable. These annotations are manually refined to a one
second resolution by anchoring them at a single timestamp following
well-defined soccer rules. With an average of one event every 6.9 minutes, this
dataset focuses on the problem of localizing very sparse events within long
videos. We define the task of spotting as finding the anchors of soccer events
in a video. Making use of recent developments in the realm of generic action
recognition and detection in video, we provide strong baselines for detecting
soccer events. We show that our best model for classifying temporal segments of
length one minute reaches a mean Average Precision (mAP) of 67.8%. For the
spotting task, our baseline reaches an Average-mAP of 49.7% for tolerances
ranging from 5 to 60 seconds. Our dataset and models are available at
https://silviogiancola.github.io/SoccerNet.Comment: CVPR Workshop on Computer Vision in Sports 201
Audio-visual football video analysis, from structure detection to attention analysis
Sport video is an important video genre. Content-based sports video analysis attracts great interest from both industry and academic ļ¬elds. A sports video is characterised by repetitive temporal structures, relatively plain contents, and strong spatio-temporal variations, such as quick camera switches and swift local motions. It is necessary to develop speciļ¬c techniques for content-based sports video analysis to utilise these characteristics.
For an efļ¬cient and effective sports video analysis system, there are three fundamental questions: (1) what are key stories for sports videos; (2) what incurs viewerās interest; and (3) how to identify game highlights. This thesis is developed around these questions. We approached these questions from two different perspectives and in turn three research contributions are presented, namely, replay detection, attack temporal structure decomposition, and attention-based highlight identiļ¬cation.
Replay segments convey the most important contents in sports videos. It is an efļ¬cient approach to collect game highlights by detecting replay segments. However, replay is an artefact of editing, which improves with advances in video editing tools. The composition of replay is complex, which includes logo transitions, slow motions, viewpoint switches and normal speed video clips. Since logo transition clips are pervasive in game collections of FIFA World Cup 2002, FIFA World Cup 2006 and UEFA Championship 2006, we take logo transition detection as an effective replacement of replay detection. A two-pass system was developed, including a ļ¬ve-layer adaboost classiļ¬er and a logo template matching throughout an entire video. The ļ¬ve-layer adaboost utilises shot duration, average game pitch ratio, average motion, sequential colour histogram and shot frequency between two neighbouring logo transitions, to ļ¬lter out logo transition candidates. Subsequently, a logo template is constructed and employed to ļ¬nd all transition logo sequences. The precision and recall of this system in replay detection is 100% in a ļ¬ve-game evaluation collection.
An attack structure is a team competition for a score. Hence, this structure is a conceptually fundamental unit of a football video as well as other sports videos. We review the literature of content-based temporal structures, such as play-break structure, and develop a three-step system for automatic attack structure decomposition. Four content-based shot classes, namely, play, focus, replay and break were identiļ¬ed by low level visual features. A four-state hidden Markov model was trained to simulate transition processes among these shot classes. Since attack structures are the longest repetitive temporal unit in a sports video, a sufļ¬x tree is proposed to ļ¬nd the longest repetitive substring in the label sequence of shot class transitions. These occurrences of this substring are regarded as a kernel of an attack hidden Markov process. Therefore, the decomposition of attack structure becomes a boundary likelihood comparison between two Markov chains.
Highlights are what attract notice. Attention is a psychological measurement of ānotice ā. A brief survey of attention psychological background, attention estimation from vision and auditory, and multiple modality attention fusion is presented. We propose two attention models for sports video analysis, namely, the role-based attention model and the multiresolution autoregressive framework. The role-based attention model is based on the perception structure during watching video. This model removes reļ¬ection bias among modality salient signals and combines these signals by reļ¬ectors. The multiresolution autoregressive framework (MAR) treats salient signals as a group of smooth random processes, which follow a similar trend but are ļ¬lled with noise. This framework tries to estimate a noise-less signal from these coarse noisy observations by a multiple resolution analysis. Related algorithms are developed, such as event segmentation on a MAR tree and real time event detection. The experiment shows that these attention-based approach can ļ¬nd goal events at a high precision. Moreover, results of MAR-based highlight detection on the ļ¬nal game of FIFA 2002 and 2006 are highly similar to professionally labelled highlights by BBC and FIFA
Ball Trajectory Inference from Multi-Agent Sports Contexts Using Set Transformer and Hierarchical Bi-LSTM
As artificial intelligence spreads out to numerous fields, the application of
AI to sports analytics is also in the spotlight. However, one of the major
challenges is the difficulty of automated acquisition of continuous movement
data during sports matches. In particular, it is a conundrum to reliably track
a tiny ball on a wide soccer pitch with obstacles such as occlusion and
imitations. Tackling the problem, this paper proposes an inference framework of
ball trajectory from player trajectories as a cost-efficient alternative to
ball tracking. We combine Set Transformers to get permutation-invariant and
equivariant representations of the multi-agent contexts with a hierarchical
architecture that intermediately predicts the player ball possession to support
the final trajectory inference. Also, we introduce the reality loss term and
postprocessing to secure the estimated trajectories to be physically realistic.
The experimental results show that our model provides natural and accurate
trajectories as well as admissible player ball possession at the same time.
Lastly, we suggest several practical applications of our framework including
missing trajectory imputation, semi-automated pass annotation, automated
zoom-in for match broadcasting, and calculating possession-wise running
performance metrics
Audiovisual processing for sports-video summarisation technology
In this thesis a novel audiovisual feature-based scheme is proposed for the automatic summarization of sports-video content The scope of operability of the scheme is designed to encompass the wide variety o f sports genres that come under the description āfield-sportsā. Given the assumption that, in terms of conveying the narrative of a field-sports-video, score-update events constitute the most significant moments, it is proposed that their detection should thus yield a favourable summarisation solution. To this end, a generic methodology is proposed for the automatic identification of score-update events in field-sports-video content. The scheme is based on the development of robust extractors for a set of critical features, which are shown to reliably indicate their locations. The evidence gathered by the feature extractors is combined and analysed using a Support Vector Machine (SVM), which performs the event detection process. An SVM is chosen on the basis that its underlying technology represents an implementation of the latest generation of machine learning algorithms, based on the recent advances in statistical learning. Effectively, an SVM offers a solution to optimising the classification performance of a decision hypothesis, inferred from a given set of training data. Via a learning phase that utilizes a 90-hour field-sports-video trainmg-corpus, the SVM infers a score-update event model by observing patterns in the extracted feature evidence. Using a similar but distinct 90-hour evaluation corpus, the effectiveness of this model is then tested genencally across multiple genres of fieldsports- video including soccer, rugby, field hockey, hurling, and Gaelic football. The results suggest that in terms o f the summarization task, both high event retrieval and content rejection statistics are achievable
Anomaly Detection, Rule Adaptation and Rule Induction Methodologies in the Context of Automated Sports Video Annotation.
Automated video annotation is a topic of considerable interest in computer vision due to its applications in video search, object based video encoding and enhanced broadcast content. The domain of sport broadcasting is, in particular, the subject of current research attention due to its fixed, rule governed, content. This research work aims to develop, analyze and demonstrate novel methodologies that can be useful in the context of adaptive and automated video annotation systems. In this thesis, we present methodologies for addressing the problems of anomaly detection, rule adaptation and rule induction for court based sports such as tennis and badminton. We first introduce an HMM induction strategy for a court-model based method that uses the court structure in the form of a lattice for two related modalities of singles and doubles tennis to tackle the problems of anomaly detection and rectification. We also introduce another anomaly detection methodology that is based on the disparity between the low-level vision based classifiers and the high-level contextual classifier. Another approach to address the problem of rule adaptation is also proposed that employs Convex hulling of the anomalous states. We also investigate a number of novel hierarchical HMM generating methods for stochastic induction of game rules. These methodologies include, Cartesian product Label-based Hierarchical Bottom-up Clustering (CLHBC) that employs prior information within the label structures. A new constrained variant of the classical Chinese Restaurant Process (CRP) is also introduced that is relevant to sports games. We also propose two hybrid methodologies in this context and a comparative analysis is made against the flat Markov model. We also show that these methods are also generalizable to other rule based environments
Towards Efficient Ice Surface Localization From Hockey Broadcast Video
Using computer vision-based technology in ice hockey has recently been embraced as it allows for the automatic collection of analytics. This data would be too expensive and time-consuming to otherwise collect manually. The insights gained from these analytics allow for a more in-depth understanding of the game, which can influence coaching and management decisions. A fundamental component of automatically deriving analytics from hockey broadcast video is ice rink localization. In broadcast video of hockey games, the camera pans, tilts, and zooms to follow the play. To compensate for this motion and get the absolute locations of the players and puck on the ice, an ice rink localization pipeline must find the perspective transform that maps each frame to an overhead view of the rink.
The lack of publicly available datasets makes it difficult to perform research into ice rink localization. A novel annotation tool and dataset are presented, which includes 7,721 frames from National Hockey League game broadcasts.
Since ice rink localization is a component of a full hockey analytics pipeline, it is important that these methods be as efficient as possible to reduce the run time. Small neural networks that reduce inference time while maintaining high accuracy can be used as an intermediate step to perform ice rink localization by segmenting the lines from the playing surface.
Ice rink localization methods tend to infer the camera calibration of each frame in a broadcast sequence individually. This results in perturbations in the output of the pipeline, as there is no consideration of the camera calibrations of the frames before and after in the sequence. One way to reduce the noise in the output is to add a post-processing step after the ice has been localized to smooth the camera parameters and closely simulate the cameraās motion. Several methods for extracting the pan, tilt, and zoom from the perspective transform matrix are explored. The camera parameters obtained from the inferred perspective transform can be smoothed to give a visually coherent video output. Deep neural networks have allowed for the development of architectures that can perform several tasks at once. A basis for networks that can regress the ice rink localization parameters and simultaneously smooth them is presented.
This research provides several approaches for improving ice rink localization methods. Specifically, the analytics pipelines can become faster and provide better results visually. This can allow for improved insight into hockey games, which can increase the performance of the hockey team with reduced cost
- ā¦