338 research outputs found

    SoccerNet: A Scalable Dataset for Action Spotting in Soccer Videos

    Full text link
    In this paper, we introduce SoccerNet, a benchmark for action spotting in soccer videos. The dataset is composed of 500 complete soccer games from six main European leagues, covering three seasons from 2014 to 2017 and a total duration of 764 hours. A total of 6,637 temporal annotations are automatically parsed from online match reports at a one minute resolution for three main classes of events (Goal, Yellow/Red Card, and Substitution). As such, the dataset is easily scalable. These annotations are manually refined to a one second resolution by anchoring them at a single timestamp following well-defined soccer rules. With an average of one event every 6.9 minutes, this dataset focuses on the problem of localizing very sparse events within long videos. We define the task of spotting as finding the anchors of soccer events in a video. Making use of recent developments in the realm of generic action recognition and detection in video, we provide strong baselines for detecting soccer events. We show that our best model for classifying temporal segments of length one minute reaches a mean Average Precision (mAP) of 67.8%. For the spotting task, our baseline reaches an Average-mAP of 49.7% for tolerances Ī“\delta ranging from 5 to 60 seconds. Our dataset and models are available at https://silviogiancola.github.io/SoccerNet.Comment: CVPR Workshop on Computer Vision in Sports 201

    Audio-visual football video analysis, from structure detection to attention analysis

    Get PDF
    Sport video is an important video genre. Content-based sports video analysis attracts great interest from both industry and academic ļ¬elds. A sports video is characterised by repetitive temporal structures, relatively plain contents, and strong spatio-temporal variations, such as quick camera switches and swift local motions. It is necessary to develop speciļ¬c techniques for content-based sports video analysis to utilise these characteristics. For an efļ¬cient and effective sports video analysis system, there are three fundamental questions: (1) what are key stories for sports videos; (2) what incurs viewerā€™s interest; and (3) how to identify game highlights. This thesis is developed around these questions. We approached these questions from two different perspectives and in turn three research contributions are presented, namely, replay detection, attack temporal structure decomposition, and attention-based highlight identiļ¬cation. Replay segments convey the most important contents in sports videos. It is an efļ¬cient approach to collect game highlights by detecting replay segments. However, replay is an artefact of editing, which improves with advances in video editing tools. The composition of replay is complex, which includes logo transitions, slow motions, viewpoint switches and normal speed video clips. Since logo transition clips are pervasive in game collections of FIFA World Cup 2002, FIFA World Cup 2006 and UEFA Championship 2006, we take logo transition detection as an effective replacement of replay detection. A two-pass system was developed, including a ļ¬ve-layer adaboost classiļ¬er and a logo template matching throughout an entire video. The ļ¬ve-layer adaboost utilises shot duration, average game pitch ratio, average motion, sequential colour histogram and shot frequency between two neighbouring logo transitions, to ļ¬lter out logo transition candidates. Subsequently, a logo template is constructed and employed to ļ¬nd all transition logo sequences. The precision and recall of this system in replay detection is 100% in a ļ¬ve-game evaluation collection. An attack structure is a team competition for a score. Hence, this structure is a conceptually fundamental unit of a football video as well as other sports videos. We review the literature of content-based temporal structures, such as play-break structure, and develop a three-step system for automatic attack structure decomposition. Four content-based shot classes, namely, play, focus, replay and break were identiļ¬ed by low level visual features. A four-state hidden Markov model was trained to simulate transition processes among these shot classes. Since attack structures are the longest repetitive temporal unit in a sports video, a sufļ¬x tree is proposed to ļ¬nd the longest repetitive substring in the label sequence of shot class transitions. These occurrences of this substring are regarded as a kernel of an attack hidden Markov process. Therefore, the decomposition of attack structure becomes a boundary likelihood comparison between two Markov chains. Highlights are what attract notice. Attention is a psychological measurement of ā€œnotice ā€. A brief survey of attention psychological background, attention estimation from vision and auditory, and multiple modality attention fusion is presented. We propose two attention models for sports video analysis, namely, the role-based attention model and the multiresolution autoregressive framework. The role-based attention model is based on the perception structure during watching video. This model removes reļ¬‚ection bias among modality salient signals and combines these signals by reļ¬‚ectors. The multiresolution autoregressive framework (MAR) treats salient signals as a group of smooth random processes, which follow a similar trend but are ļ¬lled with noise. This framework tries to estimate a noise-less signal from these coarse noisy observations by a multiple resolution analysis. Related algorithms are developed, such as event segmentation on a MAR tree and real time event detection. The experiment shows that these attention-based approach can ļ¬nd goal events at a high precision. Moreover, results of MAR-based highlight detection on the ļ¬nal game of FIFA 2002 and 2006 are highly similar to professionally labelled highlights by BBC and FIFA

    Occupancy Analysis of the Outdoor Football Fields

    Get PDF

    Ball Trajectory Inference from Multi-Agent Sports Contexts Using Set Transformer and Hierarchical Bi-LSTM

    Full text link
    As artificial intelligence spreads out to numerous fields, the application of AI to sports analytics is also in the spotlight. However, one of the major challenges is the difficulty of automated acquisition of continuous movement data during sports matches. In particular, it is a conundrum to reliably track a tiny ball on a wide soccer pitch with obstacles such as occlusion and imitations. Tackling the problem, this paper proposes an inference framework of ball trajectory from player trajectories as a cost-efficient alternative to ball tracking. We combine Set Transformers to get permutation-invariant and equivariant representations of the multi-agent contexts with a hierarchical architecture that intermediately predicts the player ball possession to support the final trajectory inference. Also, we introduce the reality loss term and postprocessing to secure the estimated trajectories to be physically realistic. The experimental results show that our model provides natural and accurate trajectories as well as admissible player ball possession at the same time. Lastly, we suggest several practical applications of our framework including missing trajectory imputation, semi-automated pass annotation, automated zoom-in for match broadcasting, and calculating possession-wise running performance metrics

    Audiovisual processing for sports-video summarisation technology

    Get PDF
    In this thesis a novel audiovisual feature-based scheme is proposed for the automatic summarization of sports-video content The scope of operability of the scheme is designed to encompass the wide variety o f sports genres that come under the description ā€˜field-sportsā€™. Given the assumption that, in terms of conveying the narrative of a field-sports-video, score-update events constitute the most significant moments, it is proposed that their detection should thus yield a favourable summarisation solution. To this end, a generic methodology is proposed for the automatic identification of score-update events in field-sports-video content. The scheme is based on the development of robust extractors for a set of critical features, which are shown to reliably indicate their locations. The evidence gathered by the feature extractors is combined and analysed using a Support Vector Machine (SVM), which performs the event detection process. An SVM is chosen on the basis that its underlying technology represents an implementation of the latest generation of machine learning algorithms, based on the recent advances in statistical learning. Effectively, an SVM offers a solution to optimising the classification performance of a decision hypothesis, inferred from a given set of training data. Via a learning phase that utilizes a 90-hour field-sports-video trainmg-corpus, the SVM infers a score-update event model by observing patterns in the extracted feature evidence. Using a similar but distinct 90-hour evaluation corpus, the effectiveness of this model is then tested genencally across multiple genres of fieldsports- video including soccer, rugby, field hockey, hurling, and Gaelic football. The results suggest that in terms o f the summarization task, both high event retrieval and content rejection statistics are achievable

    Anomaly Detection, Rule Adaptation and Rule Induction Methodologies in the Context of Automated Sports Video Annotation.

    Get PDF
    Automated video annotation is a topic of considerable interest in computer vision due to its applications in video search, object based video encoding and enhanced broadcast content. The domain of sport broadcasting is, in particular, the subject of current research attention due to its fixed, rule governed, content. This research work aims to develop, analyze and demonstrate novel methodologies that can be useful in the context of adaptive and automated video annotation systems. In this thesis, we present methodologies for addressing the problems of anomaly detection, rule adaptation and rule induction for court based sports such as tennis and badminton. We first introduce an HMM induction strategy for a court-model based method that uses the court structure in the form of a lattice for two related modalities of singles and doubles tennis to tackle the problems of anomaly detection and rectification. We also introduce another anomaly detection methodology that is based on the disparity between the low-level vision based classifiers and the high-level contextual classifier. Another approach to address the problem of rule adaptation is also proposed that employs Convex hulling of the anomalous states. We also investigate a number of novel hierarchical HMM generating methods for stochastic induction of game rules. These methodologies include, Cartesian product Label-based Hierarchical Bottom-up Clustering (CLHBC) that employs prior information within the label structures. A new constrained variant of the classical Chinese Restaurant Process (CRP) is also introduced that is relevant to sports games. We also propose two hybrid methodologies in this context and a comparative analysis is made against the flat Markov model. We also show that these methods are also generalizable to other rule based environments

    Towards Efficient Ice Surface Localization From Hockey Broadcast Video

    Get PDF
    Using computer vision-based technology in ice hockey has recently been embraced as it allows for the automatic collection of analytics. This data would be too expensive and time-consuming to otherwise collect manually. The insights gained from these analytics allow for a more in-depth understanding of the game, which can influence coaching and management decisions. A fundamental component of automatically deriving analytics from hockey broadcast video is ice rink localization. In broadcast video of hockey games, the camera pans, tilts, and zooms to follow the play. To compensate for this motion and get the absolute locations of the players and puck on the ice, an ice rink localization pipeline must find the perspective transform that maps each frame to an overhead view of the rink. The lack of publicly available datasets makes it difficult to perform research into ice rink localization. A novel annotation tool and dataset are presented, which includes 7,721 frames from National Hockey League game broadcasts. Since ice rink localization is a component of a full hockey analytics pipeline, it is important that these methods be as efficient as possible to reduce the run time. Small neural networks that reduce inference time while maintaining high accuracy can be used as an intermediate step to perform ice rink localization by segmenting the lines from the playing surface. Ice rink localization methods tend to infer the camera calibration of each frame in a broadcast sequence individually. This results in perturbations in the output of the pipeline, as there is no consideration of the camera calibrations of the frames before and after in the sequence. One way to reduce the noise in the output is to add a post-processing step after the ice has been localized to smooth the camera parameters and closely simulate the cameraā€™s motion. Several methods for extracting the pan, tilt, and zoom from the perspective transform matrix are explored. The camera parameters obtained from the inferred perspective transform can be smoothed to give a visually coherent video output. Deep neural networks have allowed for the development of architectures that can perform several tasks at once. A basis for networks that can regress the ice rink localization parameters and simultaneously smooth them is presented. This research provides several approaches for improving ice rink localization methods. Specifically, the analytics pipelines can become faster and provide better results visually. This can allow for improved insight into hockey games, which can increase the performance of the hockey team with reduced cost
    • ā€¦
    corecore