551 research outputs found
Semantic analysis of field sports video using a petri-net of audio-visual concepts
The most common approach to automatic summarisation and highlight detection in sports video is to train an automatic classifier to detect semantic highlights based on occurrences of low-level features such as action replays, excited commentators or changes in a scoreboard. We propose an alternative approach based on the detection of perception concepts (PCs) and the construction of Petri-Nets which can be used for both semantic description and event detection within sports videos. Low-level algorithms for the detection of perception concepts using visual, aural and motion characteristics are proposed, and a series of Petri-Nets composed of perception concepts is formally defined to describe video content. We call this a Perception Concept Network-Petri Net (PCN-PN) model. Using PCN-PNs, personalized high-level semantic descriptions of video highlights can be facilitated and queries on high-level semantics can be achieved. A particular strength of this framework is that we can easily build semantic detectors based on PCN-PNs to search within sports videos and locate interesting events. Experimental results based on recorded sports
video data across three types of sports games (soccer, basketball and rugby), and each from multiple broadcasters, are used to illustrate the potential of this framework
Data-Driven Analytics for Decision Making in Game Sports
Performance analysis and good decision making in sports is important to maximize chances of winning. Over the last years the amount and quality of data which is available for the analysis has increased enormously due to technical developments like, e.g., of sensor technologies or computer vision technology. However, the data-driven analysis of athletes and team performances is very demanding. One reason is the so called semantic gap of sports analytics. This means that the concepts of coaches are seldomly represented in the data for the analysis. Furthermore, sports in general and game sports in particular present a huge challenge due to its dynamic characteristics and the multi-factorial influences on an athlete’s performance like, e.g., the numerous interaction processes during a match. This requires different types of analyses like, e.g., qualitative analyses and thus anecdotal descriptions of performances up to quantitative analyses with which performances can be described through statistics and indicators. Additionally, coaches and analysts have to work under an enormous time pressure and decisions have to be made very quickly.
In order to facilitate the demanding task of game sports analysts and coaches we present a generic approach how to conceptualize and design a Data Analytics System (DAS) for an efficient support of the decision making processes in practice. We first introduce a theoretical model and present a way how to bridge the semantic gap of sports analytics. This ensures that DASs will provide relevant information for the decision makers. Moreover, we show that DASs need to combine qualitative and quantitative analyses as well as visualizations. Additionally, we introduce different query types which are required for a holistic retrieval of sports data. We furthermore show a model for the user-centered planning and designing of the User Experience (UX) of a DAS.
Having introduced the theoretical basis we present SportSense, a DAS to support decision making in game sports. Its generic architecture allows a fast adaptation to the individual characteristics and requirements of different game sports. SportSense is novel with respect to the fact that it unites raw data, event data, and video data. Furthermore, it supports different query types including an intuitive sketch-based retrieval and seamlessly combines qualitative and quantitative analyses as well as several data visualization options. Moreover, we present the two applications SportSense Football and SportSense Ice Hockey which contain sport-specific concepts and cover (high-level) tactical analyses
Surveillance centric coding
PhDThe research work presented in this thesis focuses on the development of techniques
specific to surveillance videos for efficient video compression with higher processing
speed. The Scalable Video Coding (SVC) techniques are explored to achieve higher
compression efficiency. The framework of SVC is modified to support Surveillance
Centric Coding (SCC). Motion estimation techniques specific to surveillance videos
are proposed in order to speed up the compression process of the SCC.
The main contributions of the research work presented in this thesis are divided into
two groups (i) Efficient Compression and (ii) Efficient Motion Estimation. The
paradigm of Surveillance Centric Coding (SCC) is introduced, in which coding aims
to achieve bit-rate optimisation and adaptation of surveillance videos for storing and
transmission purposes. In the proposed approach the SCC encoder communicates
with the Video Content Analysis (VCA) module that detects events of interest in
video captured by the CCTV. Bit-rate optimisation and adaptation are achieved by
exploiting the scalability properties of the employed codec. Time segments
containing events relevant to surveillance application are encoded using high spatiotemporal
resolution and quality while the irrelevant portions from the surveillance
standpoint are encoded at low spatio-temporal resolution and / or quality. Thanks to
the scalability of the resulting compressed bit-stream, additional bit-rate adaptation is
possible; for instance for the transmission purposes. Experimental evaluation showed
that significant reduction in bit-rate can be achieved by the proposed approach
without loss of information relevant to surveillance applications.
In addition to more optimal compression strategy, novel approaches to performing
efficient motion estimation specific to surveillance videos are proposed and
implemented with experimental results. A real-time background subtractor is used to
detect the presence of any motion activity in the sequence. Different approaches for
selective motion estimation, GOP based, Frame based and Block based, are
implemented. In the former, motion estimation is performed for the whole group of
pictures (GOP) only when a moving object is detected for any frame of the GOP.
iii
While for the Frame based approach; each frame is tested for the motion activity and
consequently for selective motion estimation. The selective motion estimation
approach is further explored at a lower level as Block based selective motion
estimation. Experimental evaluation showed that significant reduction in
computational complexity can be achieved by applying the proposed strategy. In
addition to selective motion estimation, a tracker based motion estimation and fast
full search using multiple reference frames has been proposed for the surveillance
videos.
Extensive testing on different surveillance videos shows benefits of
application of proposed approaches to achieve the goals of the SCC
Kuaipedia: a Large-scale Multi-modal Short-video Encyclopedia
Online encyclopedias, such as Wikipedia, have been well-developed and
researched in the last two decades. One can find any attributes or other
information of a wiki item on a wiki page edited by a community of volunteers.
However, the traditional text, images and tables can hardly express some
aspects of an wiki item. For example, when we talk about ``Shiba Inu'', one may
care more about ``How to feed it'' or ``How to train it not to protect its
food''. Currently, short-video platforms have become a hallmark in the online
world. Whether you're on TikTok, Instagram, Kuaishou, or YouTube Shorts,
short-video apps have changed how we consume and create content today. Except
for producing short videos for entertainment, we can find more and more authors
sharing insightful knowledge widely across all walks of life. These short
videos, which we call knowledge videos, can easily express any aspects (e.g.
hair or how-to-feed) consumers want to know about an item (e.g. Shiba Inu), and
they can be systematically analyzed and organized like an online encyclopedia.
In this paper, we propose Kuaipedia, a large-scale multi-modal encyclopedia
consisting of items, aspects, and short videos lined to them, which was
extracted from billions of videos of Kuaishou (Kwai), a well-known short-video
platform in China. We first collected items from multiple sources and mined
user-centered aspects from millions of users' queries to build an item-aspect
tree. Then we propose a new task called ``multi-modal item-aspect linking'' as
an expansion of ``entity linking'' to link short videos into item-aspect pairs
and build the whole short-video encyclopedia. Intrinsic evaluations show that
our encyclopedia is of large scale and highly accurate. We also conduct
sufficient extrinsic experiments to show how Kuaipedia can help fundamental
applications such as entity typing and entity linking
High-level feature detection from video in TRECVid: a 5-year retrospective of achievements
Successful and effective content-based access to digital
video requires fast, accurate and scalable methods to determine the video content automatically. A variety of contemporary approaches to this rely on text taken from speech within the video, or on matching one video frame against others using low-level characteristics like
colour, texture, or shapes, or on determining and matching objects appearing within the video. Possibly the most important technique, however, is one which determines the presence or absence of a high-level or semantic feature, within a video clip or shot. By utilizing dozens, hundreds or even thousands of such semantic features we can support many kinds of content-based video navigation. Critically however, this depends on being able to determine whether each feature is or is not present in a video clip.
The last 5 years have seen much progress in the development of techniques to determine the presence of semantic features within video. This progress can be tracked in the annual TRECVid benchmarking activity where dozens of research groups measure the effectiveness of their techniques on common data and using an open, metrics-based approach. In this chapter we summarise the work
done on the TRECVid high-level feature task, showing the
progress made year-on-year. This provides a fairly comprehensive statement on where the state-of-the-art is regarding this important task, not just for one research group or for one approach, but across the spectrum. We then use this past and on-going work as a basis for highlighting the trends that are emerging in this area, and the questions which remain to be addressed before we can
achieve large-scale, fast and reliable high-level feature detection on video
- …