2,032 research outputs found
Semantic analysis of field sports video using a petri-net of audio-visual concepts
The most common approach to automatic summarisation and highlight detection in sports video is to train an automatic classifier to detect semantic highlights based on occurrences of low-level features such as action replays, excited commentators or changes in a scoreboard. We propose an alternative approach based on the detection of perception concepts (PCs) and the construction of Petri-Nets which can be used for both semantic description and event detection within sports videos. Low-level algorithms for the detection of perception concepts using visual, aural and motion characteristics are proposed, and a series of Petri-Nets composed of perception concepts is formally defined to describe video content. We call this a Perception Concept Network-Petri Net (PCN-PN) model. Using PCN-PNs, personalized high-level semantic descriptions of video highlights can be facilitated and queries on high-level semantics can be achieved. A particular strength of this framework is that we can easily build semantic detectors based on PCN-PNs to search within sports videos and locate interesting events. Experimental results based on recorded sports video data across three types of sports games (soccer, basketball and rugby), each from multiple broadcasters, are used to illustrate the potential of this framework.
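As a rough illustration of how a Petri net over detected concepts could flag an event, the sketch below fires a transition once all of its input concepts have been observed. It is not the PCN-PN model from the paper: the concept names (`goal_area_view`, `excited_commentary`, `action_replay`), the two transitions, and the `detect_event` helper are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass, field


@dataclass
class PetriNet:
    """A tiny place/transition net; the marking maps place name -> token count."""
    marking: dict = field(default_factory=dict)
    transitions: dict = field(default_factory=dict)  # name -> (inputs, outputs)

    def add_transition(self, name, inputs, outputs):
        self.transitions[name] = (list(inputs), list(outputs))

    def enabled(self, name):
        inputs, _ = self.transitions[name]
        # Each input place needs at least as many tokens as arcs leaving it.
        return all(self.marking.get(p, 0) >= inputs.count(p) for p in set(inputs))

    def fire(self, name):
        if not self.enabled(name):
            return False
        inputs, outputs = self.transitions[name]
        for p in inputs:
            self.marking[p] -= 1
        for p in outputs:
            self.marking[p] = self.marking.get(p, 0) + 1
        return True


def detect_event(observed_concepts):
    """Feed concept detections in order; report whether 'highlight' gets marked."""
    net = PetriNet()
    # Hypothetical structure: excitement near the goal yields a candidate,
    # and a following replay confirms it as a highlight.
    net.add_transition("excited_attack",
                       ["goal_area_view", "excited_commentary"],
                       ["candidate_event"])
    net.add_transition("confirmed_by_replay",
                       ["candidate_event", "action_replay"],
                       ["highlight"])
    for concept in observed_concepts:
        net.marking[concept] = net.marking.get(concept, 0) + 1
        # Keep firing until no transition is enabled.
        while any(net.fire(t) for t in net.transitions):
            pass
    return net.marking.get("highlight", 0) > 0
```

Calling `detect_event(["goal_area_view", "excited_commentary", "action_replay"])` marks the `highlight` place, while a sequence that lacks the commentary concept does not.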
Event detection in field sports video using audio-visual features and a support vector machine
In this paper, we propose a novel audio-visual feature-based framework for event detection in broadcast video of multiple different field sports. Features indicating significant events are selected and robust detectors built. These features are rooted in characteristics common to all genres of field sports. The evidence gathered by the feature detectors is combined by means of a support vector machine, which infers the occurrence of an event based on a model generated during a training phase. The system is tested generically across multiple genres of field sports including soccer, rugby, hockey, and Gaelic football, and the results suggest that high event retrieval and content rejection statistics are achievable.
An Overview of Multimodal Techniques for the Characterization of Sport Programmes
The problem of content characterization of sports videos is of great interest because sports video appeals to large audiences and its efficient distribution over various networks should contribute to widespread usage of multimedia services. In this paper we analyze several techniques proposed in the literature for content characterization of sports videos. We focus this analysis on the typology of the signal (audio, video, text captions, ...) from which the low-level features are extracted. First we consider the techniques based on visual information, then the methods based on audio information, and finally the algorithms based on audio-visual cues, used in a multi-modal fashion. This analysis shows that each type of signal carries some peculiar information, and the multi-modal approach can fully exploit the multimedia information associated with the sports video. Moreover, we observe that the characterization is performed either by considering what happens in a specific time segment, thus observing the features in a "static" way, or by trying to capture their "dynamic" evolution in time. The effectiveness of each approach depends mainly on the kind of sport it relates to and on the type of highlights we are focusing on.
Event detection based on generic characteristics of field-sports
In this paper, we propose a generic framework for event detection in broadcast video of multiple different field-sports. Features indicating significant events are selected, and robust detectors built. These features are rooted in generic characteristics common to all genres of field-sports. The evidence gathered by the feature detectors is combined by means of a support vector machine, which infers the occurrence of an event based on a model generated during a training phase. The system is tested across multiple genres of field-sports including soccer, rugby, hockey and Gaelic football, and the results suggest that high event retrieval and content rejection statistics are achievable.
Automated classification of cricket pitch frames in cricket video
The automated detection of the cricket pitch in a video recording of a cricket match is a fundamental step in content-based indexing and summarization of cricket videos. In this paper, we propose visual-content-based algorithms to automate the extraction of video frames with the cricket pitch in focus. As a preprocessing step, we first select a subset of frames with a view of the cricket field, of which the cricket pitch forms a part. This filtering process reduces the search space by eliminating frames that contain a view of the audience, close-up shots of specific players, advertisements, etc. The subset of frames containing the cricket field is then subject to statistical modeling of the grayscale (brightness) histogram (SMoG). Since SMoG does not utilize color or domain-specific information such as the region in the frame where the pitch is expected to be located, we propose an alternative algorithm: component quantization based region of interest extraction (CQRE) for the extraction of pitch frames. Experimental results demonstrate that, regardless of the quality of the input, successive application of the two methods outperforms either one applied exclusively. The SMoG-CQRE combination for pitch frame classification yields an average accuracy of 98.6% in the best case (a high resolution video with good contrast) and an average accuracy of 87.9% in the worst case (a low resolution video with poor contrast). Since the extraction of pitch frames forms the first step in analyzing the important events in a match, we also present a post-processing step, viz., an algorithm to detect players in the extracted pitch frames.
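To make the idea of classifying frames from their brightness histogram concrete, here is a minimal sketch. The abstract does not describe the statistical model used by SMoG, so this example swaps in plain histogram intersection against a reference histogram; the function names, the 16-bin quantization, and the 0.7 threshold are all assumptions made for illustration.

```python
def gray_histogram(pixels, bins=16):
    """Normalized brightness histogram of a frame given as 0-255 intensities."""
    if not pixels:
        raise ValueError("frame has no pixels")
    counts = [0] * bins
    for value in pixels:
        counts[min(value * bins // 256, bins - 1)] += 1
    total = len(pixels)
    return [c / total for c in counts]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means the two normalized histograms match."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def looks_like_pitch(pixels, reference_hist, threshold=0.7):
    """Assumed decision rule: accept frames whose histogram is close to the reference."""
    return histogram_intersection(gray_histogram(pixels), reference_hist) >= threshold


# Hypothetical reference built from a known pitch frame (mostly bright pixels).
reference = gray_histogram([200] * 90 + [120] * 10)
pitch_frame = [205] * 85 + [118] * 15
crowd_frame = [40] * 50 + [90] * 50
```

Under these assumptions `looks_like_pitch(pitch_frame, reference)` accepts the frame and `looks_like_pitch(crowd_frame, reference)` rejects it. A grayscale-only rule like this ignores color and frame position, which is the limitation the abstract gives as motivation for CQRE.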
Leveraging Contextual Cues for Generating Basketball Highlights
The massive growth of sports videos has resulted in a need for automatic generation of sports highlights that are comparable in quality to the hand-edited highlights produced by broadcasters such as ESPN. Unlike previous works that mostly use audio-visual cues derived from the video, we propose an approach that additionally leverages contextual cues derived from the environment that the game is being played in. The contextual cues provide information about the excitement levels in the game, which can be ranked and selected to automatically produce high-quality basketball highlights. We introduce a new dataset of 25 NCAA games along with their play-by-play stats and the ground-truth excitement data for each basket. We explore the informativeness of five different cues derived from the video and from the environment through user studies. Our experiments show that for our study participants, the highlights produced by our system are comparable to the ones produced by ESPN for the same games. Comment: Proceedings of ACM Multimedia 201
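The rank-and-select step mentioned in the abstract can be sketched as follows. The cue names (`audio_loudness`, `score_differential`), the equal-weight sum used as the excitement score, and the `select_highlights` helper are hypothetical; the abstract does not name the five cues or say how they are combined.

```python
def select_highlights(baskets, k):
    """Pick the k most exciting baskets, then return them in game order."""
    # Assumed scoring: unweighted sum of per-cue excitement values.
    ranked = sorted(baskets, key=lambda b: sum(b["cues"].values()), reverse=True)
    return sorted(ranked[:k], key=lambda b: b["time_s"])


# Hypothetical per-basket cue values in [0, 1].
baskets = [
    {"time_s": 120, "cues": {"audio_loudness": 0.2, "score_differential": 0.1}},
    {"time_s": 845, "cues": {"audio_loudness": 0.9, "score_differential": 0.7}},
    {"time_s": 1510, "cues": {"audio_loudness": 0.4, "score_differential": 0.3}},
    {"time_s": 2390, "cues": {"audio_loudness": 0.8, "score_differential": 0.9}},
]
highlight_times = [b["time_s"] for b in select_highlights(baskets, k=2)]
```

Sorting the selected clips back into chronological order keeps the highlight reel consistent with the flow of the game; whether the original system does this is not stated in the abstract.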
Multimodal framework based on audio‐visual features for summarisation of cricket videos
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/166171/1/ipr2bf02094.pd