Deep Unsupervised Multi-View Detection of Video Game Stream Highlights
We consider the problem of automatic highlight-detection in video game streams. Currently, the vast majority of highlight-detection systems for games are triggered by the occurrence of hard-coded game events (e.g., score change, end-game), while most advanced tools and techniques are based on detection of highlights via visual analysis of game footage. We argue that in the context of game streaming, events that may constitute highlights are not only dependent on game footage, but also on social signals that are conveyed by the streamer during the play session (e.g., when interacting with viewers, or when commenting and reacting to the game). In this light, we present a multi-view unsupervised deep learning methodology for novelty-based highlight detection. The method jointly analyses both game footage and social signals such as the player's facial expressions and speech, and shows promising results for generating highlights on streams of popular games such as Player Unknown's Battlegrounds.
Towards Structured Analysis of Broadcast Badminton Videos
Sports video data is recorded for nearly every major tournament but remains
archived and inaccessible to large scale data mining and analytics. It can only
be viewed sequentially or manually tagged with higher-level labels which is
time consuming and prone to errors. In this work, we propose an end-to-end
framework for automatic attribute tagging and analysis of sports videos. We use
commonly available broadcast videos of matches and, unlike previous approaches,
do not rely on special camera setups or additional sensors.
Our focus is on Badminton as the sport of interest. We propose a method to
analyze a large corpus of badminton broadcast videos by segmenting the points
played, tracking and recognizing the players in each point and annotating their
respective badminton strokes. We evaluate the performance on 10 Olympic matches
with 20 players and achieve 95.44% point segmentation accuracy, 97.38% player
detection score (mAP@0.5), 97.98% player identification accuracy, and a stroke
segmentation edit score of 80.48%. We further show that the automatically
annotated videos alone can enable gameplay analysis and inference by
computing understandable metrics such as a player's reaction time, speed, and
footwork around the court.
Comment: 9 pages
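The abstract above reports a stroke segmentation edit score but does not define it. As a minimal sketch, assuming the definition commonly used in action segmentation (a normalized Levenshtein distance between the predicted and reference sequences of segment labels, scaled to 0-100), the metric could be computed as follows. The function names and the stroke labels are hypothetical and are not taken from the paper.

```python
from itertools import groupby


def segments(frame_labels):
    """Collapse per-frame labels into the ordered list of segment labels."""
    return [label for label, _ in groupby(frame_labels)]


def levenshtein(a, b):
    """Edit distance between two label sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_score(predicted, reference):
    """Segmental edit score in [0, 100]; assumed definition, see lead-in."""
    p, r = segments(predicted), segments(reference)
    longest = max(len(p), len(r))
    if longest == 0:
        return 100.0
    return 100.0 * (1.0 - levenshtein(p, r) / longest)


# Hypothetical per-frame stroke labels for one point.
reference = ["serve"] * 3 + ["smash"] * 2 + ["lob"] * 2
shifted = ["serve"] * 2 + ["smash"] * 3 + ["lob"] * 2   # same order, shifted boundary
missed = ["serve"] * 3 + ["lob"] * 4                    # "smash" segment missed
```

Under this definition the score depends only on the order of segments, so a prediction with slightly shifted boundaries is not penalized, while a missed or spurious segment is.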
Multi-Modal Machine Learning for Assessing Gaming Skills in Online Streaming: A Case Study with CS:GO
Online streaming is an emerging market that attracts much attention. Assessing
gaming skills from videos is an important task for streaming service providers
to discover talented gamers. Service providers require the information to offer
customized recommendation and service promotion to their customers. Meanwhile,
this is also an important multi-modal machine learning task, since online
streaming combines vision, audio and text modalities. In this study we begin by
identifying flaws in the dataset and proceed to clean it manually. Then we
propose several variants of the latest end-to-end models to learn joint
representations of multiple modalities. Through our extensive experimentation,
we demonstrate the efficacy of our proposals. Moreover, we identify that our
proposed models are prone to identifying users instead of learning meaningful
representations. We propose future work to address the issue at the end.
A New Action Recognition Framework for Video Highlights Summarization in Sporting Events
To date, machine learning for human action recognition in video has been
widely implemented in sports activities. Although some studies have been
successful in the past, precision is still the most significant concern. In
this study, we present a high-accuracy framework to automatically clip the
sports video stream by using a three-level prediction algorithm based on two
classical open-source structures, i.e., YOLO-v3 and OpenPose. It is found that
by using a modest amount of sports video training data, our methodology can
perform sports activity highlights clipping accurately. Compared with
previous systems, our methodology shows some advantages in accuracy. This study
may serve as a new clipping system to extend the potential applications of
video summarization in the sports field, and to facilitate the development of
match analysis systems.
Comment: 18 pages, 3 figures, 4 tables
Temporal Localization of Fine-Grained Actions in Videos by Domain Transfer from Web Images
We address the problem of fine-grained action localization from temporally
untrimmed web videos. We assume that only weak video-level annotations are
available for training. The goal is to use these weak labels to identify
temporal segments corresponding to the actions, and learn models that
generalize to unconstrained web videos. We find that web images queried by
action names serve as well-localized highlights for many actions, but are
noisily labeled. To solve this problem, we propose a simple yet effective
method that takes weak video labels and noisy image labels as input, and
generates localized action frames as output. This is achieved by cross-domain
transfer between video frames and web images, using pre-trained deep
convolutional neural networks. We then use the localized action frames to train
action recognition models with long short-term memory networks. We collect a
fine-grained sports action data set FGA-240 of more than 130,000 YouTube
videos. It has 240 fine-grained actions under 85 sports activities. Convincing
results are shown on the FGA-240 data set, as well as the THUMOS 2014
localization data set with untrimmed training videos.
Comment: Camera ready version for ACM Multimedia 201