5,375 research outputs found
Large-Scale Mapping of Human Activity using Geo-Tagged Videos
This paper is the first work to perform spatio-temporal mapping of human
activity using the visual content of geo-tagged videos. We utilize a recent
deep-learning based video analysis framework, termed hidden two-stream
networks, to recognize a range of activities in YouTube videos. This framework
is efficient and can run in real time or faster which is important for
recognizing events as they occur in streaming video or for reducing latency in
analyzing already captured video. This is, in turn, important for using video
in smart-city applications. We perform a series of experiments to show our
approach is able to accurately map activities both spatially and temporally. We
also demonstrate the advantages of using the visual content over the
tags/titles.Comment: Accepted at ACM SIGSPATIAL 201
SoccerNet: A Scalable Dataset for Action Spotting in Soccer Videos
In this paper, we introduce SoccerNet, a benchmark for action spotting in
soccer videos. The dataset is composed of 500 complete soccer games from six
main European leagues, covering three seasons from 2014 to 2017 and a total
duration of 764 hours. A total of 6,637 temporal annotations are automatically
parsed from online match reports at a one minute resolution for three main
classes of events (Goal, Yellow/Red Card, and Substitution). As such, the
dataset is easily scalable. These annotations are manually refined to a one
second resolution by anchoring them at a single timestamp following
well-defined soccer rules. With an average of one event every 6.9 minutes, this
dataset focuses on the problem of localizing very sparse events within long
videos. We define the task of spotting as finding the anchors of soccer events
in a video. Making use of recent developments in the realm of generic action
recognition and detection in video, we provide strong baselines for detecting
soccer events. We show that our best model for classifying temporal segments of
length one minute reaches a mean Average Precision (mAP) of 67.8%. For the
spotting task, our baseline reaches an Average-mAP of 49.7% for tolerances
ranging from 5 to 60 seconds. Our dataset and models are available at
https://silviogiancola.github.io/SoccerNet.Comment: CVPR Workshop on Computer Vision in Sports 201
Semantic analysis of field sports video using a petri-net of audio-visual concepts
The most common approach to automatic summarisation and highlight detection in sports video is to train an automatic classifier to detect semantic highlights based on occurrences of low-level features such as action replays, excited commentators or changes in a scoreboard. We propose an alternative approach based on the detection of perception concepts (PCs) and the construction of Petri-Nets which can be used for both semantic description and event detection within sports videos. Low-level algorithms for the detection of perception concepts using visual, aural and motion characteristics are proposed, and a series of Petri-Nets composed of perception concepts is formally defined to describe video content. We call this a Perception Concept Network-Petri Net (PCN-PN) model. Using PCN-PNs, personalized high-level semantic descriptions of video highlights can be facilitated and queries on high-level semantics can be achieved. A particular strength of this framework is that we can easily build semantic detectors based on PCN-PNs to search within sports videos and locate interesting events. Experimental results based on recorded sports
video data across three types of sports games (soccer, basketball and rugby), and each from multiple broadcasters, are used to illustrate the potential of this framework
Video semantic content analysis based on ontology
The rapid increase in the available amount of video data is creating a growing demand for efficient methods for understanding and managing it at the semantic level. New multimedia standards, such as MPEG-4 and MPEG-7, provide the basic functionalities in order to manipulate and transmit objects and metadata. But importantly, most of the content of video data at a semantic level is out of the scope of the standards. In this paper, a video semantic content analysis framework based on ontology is presented. Domain ontology is used to define high level semantic concepts and their relations in the context of the examined domain. And low-level features (e.g. visual and aural) and video content analysis algorithms are integrated into the ontology to enrich video semantic analysis. OWL is used for the ontology description. Rules in Description Logic are defined to describe how features and algorithms for video analysis should be applied according to different perception content and low-level features. Temporal Description Logic is used to describe the semantic events, and a reasoning algorithm is proposed for events detection. The proposed framework is demonstrated in a soccer video domain and shows promising results
Multi-camera analysis of soccer sequences
The automatic detection of meaningful phases in a soccer game depends on the accurate localization of players and the ball at each moment. However, the automatic analysis of soccer sequences is a challenging task due to the presence of fast moving multiple objects. For this purpose, we present a multi-camera analysis system that yields the position of the ball and players on a common ground plane. The detection in each camera is based on a code-book algorithm and different features are used to classify the detected blobs. The detection results of each camera are transformed using homography to a virtual top-view of the playing field. Within this virtual top-view we merge trajectory information of the different cameras allowing to refine the found positions. In this paper, we evaluate the system on a public SOCCER dataset and end with a discussion of possible improvements of the dataset
A Fuzzy Logic-Based System for Soccer Video Scenes Classification
Massive global video surveillance worldwide captures data but lacks detailed activity information to flag events of interest, while the human burden of monitoring video footage is untenable. Artificial intelligence (AI) can be applied to raw video footage to identify and extract required information and summarize it in linguistic formats. Video summarization automation usually involves text-based data such as subtitles, segmenting text and semantics, with little attention to video summarization in the processing of video footage only. Classification problems in recorded videos are often very complex and uncertain due to the dynamic nature of the video sequence and light conditions, background, camera angle, occlusions, indistinguishable scene features, etc.
Video scene classification forms the basis of linguistic video summarization, an open research problem with major commercial importance. Soccer video scenes present added challenges due to specific objects and events with similar features (e.g. âpeopleâ include audiences, coaches, and players), as well as being constituted from a series of quickly changing and dynamic frames with small inter-frame variations. There is an added difficulty associated with the need to have light weight video classification systems working in real time with massive data sizes.
In this thesis, we introduce a novel system based on Interval Type-2 Fuzzy Logic Classification Systems (IT2FLCS) whose parameters are optimized by the Big BangâBig Crunch (BB-BC) algorithm, which allows for the automatic scenes classification using optimized rules in broadcasted soccer matches video. The type-2 fuzzy logic systems would be unequivocal to present a highly interpretable and transparent model which is very suitable for the handling the encountered uncertainties in video footages and converting the accumulated data to linguistic formats which can be easily stored and analysed. Meanwhile the traditional black box techniques, such as support vector machines (SVMs) and neural networks, do not provide models which could be easily analysed and understood by human users. The BB-BC optimization is a heuristic, population-based evolutionary approach which is characterized by the ease of implementation, fast convergence and low computational cost. We employed the BB-BC to optimize our system parameters of fuzzy logic membership functions and fuzzy rules. Using the BB-BC we are able to balance the system transparency (through generating a small rule set) together with increasing the accuracy of scene classification. Thus, the proposed fuzzy-based system allows achieving relatively high classification accuracy with a small number of rules thus increasing the system interpretability and allowing its real-time processing. The type-2 Fuzzy Logic Classification System (T2FLCS) obtained 87.57% prediction accuracy in the scene classification of our testing group data which is better than the type-1 fuzzy classification system and neural networks counterparts. The BB-BC optimization algorithms decrease the size of rule bases both in T1FLCS and T2FLCS; the T2FLCS finally got 85.716% with reduce rules, outperforming the T1FLCS and neural network counterparts, especially in the âout-of-range dataâ which validates the T2FLCSs capability to handle the high level of faced uncertainties.
We also presented a novel approach based on the scenes classification system combined with the dynamic time warping algorithm to implement the video events detection for real world processing. The proposed system could run on recorded or live video clips and output a label to describe the event in order to provide the high level summarization of the videos to the user
Real-time event detection in field sport videos
This chapter describes a real-time system for event detection in sports broadcasts. The approach presented is applicable to a wide range of field sports. Using two independent event detection approaches that work simultaneously, the system is capable of accurately detecting scores, near misses, and other exciting parts of a game that do not result in a score. The results obtained across a diverse dataset of different field sports are promising, demonstrating over 90% accuracy for a feature-based event detector and 100% accuracy for a scoreboard-based detector detecting only score
- âŠ