Semantic analysis of field sports video using a petri-net of audio-visual concepts
The most common approach to automatic summarisation and highlight detection in sports video is to train an automatic classifier to detect semantic highlights based on occurrences of low-level features such as action replays, excited commentators or changes in a scoreboard. We propose an alternative approach based on the detection of perception concepts (PCs) and the construction of Petri-Nets which can be used for both semantic description and event detection within sports videos. Low-level algorithms for the detection of perception concepts using visual, aural and motion characteristics are proposed, and a series of Petri-Nets composed of perception concepts is formally defined to describe video content. We call this a Perception Concept Network-Petri Net (PCN-PN) model. Using PCN-PNs, personalized high-level semantic descriptions of video highlights can be facilitated and queries on high-level semantics can be achieved. A particular strength of this framework is that we can easily build semantic detectors based on PCN-PNs to search within sports videos and locate interesting events. Experimental results based on recorded sports
video data across three types of sports games (soccer, basketball and rugby), each from multiple broadcasters, are used to illustrate the potential of this framework.
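The event-detection idea above can be illustrated with a minimal Petri-net sketch. This is a hypothetical simplification, not the paper's actual PCN-PN formalization: places hold tokens produced by perception-concept detectors, and a transition fires when all of its input places are marked, signalling a candidate event. The concept names (`excited_audio`, `action_replay`, `goal_event`) are invented for illustration.

```python
# Minimal Petri-net sketch (hypothetical; not the paper's PCN-PN definition).
# Places hold tokens produced by perception-concept detectors; a transition
# fires when all its input places are marked, signalling a candidate event.

class PetriNet:
    def __init__(self):
        self.marking = {}        # place name -> token count
        self.transitions = {}    # name -> (input places, output places)

    def add_transition(self, name, inputs, outputs):
        self.transitions[name] = (inputs, outputs)

    def add_tokens(self, place, n=1):
        self.marking[place] = self.marking.get(place, 0) + n

    def enabled(self, name):
        inputs, _ = self.transitions[name]
        return all(self.marking.get(p, 0) > 0 for p in inputs)

    def fire(self, name):
        if not self.enabled(name):
            return False
        inputs, outputs = self.transitions[name]
        for p in inputs:
            self.marking[p] -= 1
        for p in outputs:
            self.add_tokens(p)
        return True

net = PetriNet()
# Hypothetical perception concepts feeding a "goal" event transition.
net.add_transition("goal_event", ["excited_audio", "action_replay"], ["goal"])
net.add_tokens("excited_audio")
net.add_tokens("action_replay")
if net.fire("goal_event"):
    print("candidate highlight, goal tokens:", net.marking["goal"])
```

In the full framework, each place would be fed by a low-level visual, aural or motion detector, and composing transitions yields detectors for higher-level events.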
Video summarisation: A conceptual framework and survey of the state of the art
This is the post-print (final draft post-refereeing) version of the article. Copyright © 2007 Elsevier Inc. Video summaries provide condensed and succinct representations of the content of a video stream through a combination of still images, video segments, graphical representations and textual descriptors. This paper presents a conceptual framework for video summarisation derived from the research literature and used as a means of surveying that literature. The framework distinguishes between video summarisation techniques (the methods used to process content from a source video stream to achieve a summarisation of that stream) and video summaries (outputs of video summarisation techniques). Video summarisation techniques are considered within three broad categories: internal (analyse information sourced directly from the video stream), external (analyse information not sourced directly from the video stream) and hybrid (analyse a combination of internal and external information). Video summaries are considered as a function of the type of content they are derived from (object, event, perception or feature based) and the functionality offered to the user for their consumption (interactive or static, personalised or generic). It is argued that video summarisation would benefit from greater incorporation of external information, particularly user-based information that is unobtrusively sourced, in order to overcome longstanding challenges such as the semantic gap and to provide video summaries that have greater relevance to individual users.
Content-based Video Retrieval by Integrating Spatio-Temporal and Stochastic Recognition of Events
As amounts of publicly available video data grow, the need to query this data efficiently becomes significant. Consequently, content-based retrieval of video data turns out to be a challenging and important problem. We address the specific aspect of inferring semantics automatically from raw video data. In particular, we introduce a new video data model that supports the integrated use of two different approaches for mapping low-level features to high-level concepts. Firstly, the model is extended with a rule-based approach that supports spatio-temporal formalization of high-level concepts, and then with a stochastic approach. Furthermore, results on real tennis video data are presented, demonstrating the validity of both approaches, as well as the advantages of their integrated use.
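A rule-based spatio-temporal mapping of the kind described can be sketched as follows. This is a hedged illustration only: the feature names, thresholds and the `detect_serve` rule are invented for the example and do not reproduce the paper's formalization.

```python
# Hypothetical sketch of a rule-based mapping from low-level tennis features
# to a high-level concept; the paper's actual formalization differs.

from dataclasses import dataclass

@dataclass
class Frame:
    t: float           # timestamp in seconds
    player_y: float    # normalized court position (0 = baseline, 1 = net)
    ball_speed: float  # normalized ball speed

def detect_serve(frames, speed_thresh=0.8, window=1.0):
    """Flag a 'serve' when the player is near the baseline and ball speed
    spikes above a threshold; detections inside one window are merged."""
    events = []
    for f in frames:
        if f.player_y < 0.1 and f.ball_speed > speed_thresh:
            if not events or f.t - events[-1] > window:
                events.append(f.t)
    return events

frames = [Frame(0.0, 0.05, 0.2), Frame(0.5, 0.05, 0.9),
          Frame(0.6, 0.06, 0.95), Frame(5.0, 0.5, 0.3)]
print(detect_serve(frames))  # one serve event, near t = 0.5
```

A stochastic approach would instead learn the feature-to-concept mapping (e.g. with an HMM over the same frame features), which is why integrating the two is attractive: rules encode domain knowledge explicitly, while the stochastic model covers cases the rules miss.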
Deceptive body movements reverse spatial cueing in soccer
This article has been made available through the Brunel Open Access Publishing Fund. The purpose of the experiments was to analyse the spatial cueing effects of the movements of soccer players executing normal and deceptive (step-over) turns with the ball. Stimuli comprised normal resolution or point-light video clips of soccer players dribbling a football towards the observer then turning right or left with the ball. Clips were curtailed before or on the turn (-160, -80, 0 or +80 ms) to examine the time course of direction prediction and spatial cueing effects. Participants were divided into higher-skilled (HS) and lower-skilled (LS) groups according to soccer experience. In experiment 1, accuracy on full video clips was higher than on point-light but results followed the same overall pattern. Both HS and LS groups correctly identified direction on normal moves at all occlusion levels. For deceptive moves, LS participants were significantly worse than chance and HS participants were somewhat more accurate but nevertheless substantially impaired. In experiment 2, point-light clips were used to cue a lateral target. HS and LS groups showed faster reaction times to targets that were congruent with the direction of normal turns, and to targets incongruent with the direction of deceptive turns. The reversed cueing by deceptive moves coincided with earlier kinematic events than cueing by normal moves. It is concluded that the body kinematics of soccer players generate spatial cueing effects when viewed from an opponent's perspective. This could create a reaction time advantage when anticipating the direction of a normal move. A deceptive move is designed to turn this cueing advantage into a disadvantage. Acting on the basis of advance information, the presence of deceptive moves primes responses in the wrong direction, which may be only partly mitigated by delaying a response until veridical cues emerge.
SoccerNet-Tracking: Multiple Object Tracking Dataset and Benchmark in Soccer Videos
Tracking objects in soccer videos is extremely important for gathering both player and team statistics, whether to estimate the total distance run, ball possession or team formation. Video processing can help automate the extraction of this information without the need for invasive sensors, making it applicable to any team in any stadium. Yet the availability of datasets to train learnable models, and of benchmarks to evaluate methods on a common testbed, is very limited. In this work, we propose a novel dataset for multiple object tracking composed of 200 sequences of 30s each, representative of challenging soccer scenarios, and a complete 45-minute half-time for long-term tracking. The dataset is fully annotated with bounding boxes and tracklet IDs, enabling the training of MOT baselines in the soccer domain and a full benchmarking of those methods on our segregated challenge sets. Our analysis shows that multiple player, referee and ball tracking in soccer videos is far from being solved, with several improvements required in cases of fast motion or severe occlusion. Comment: Paper accepted for the CVsports workshop at CVPR2022. This document contains 8 pages + references.