Leveraging Contextual Cues for Generating Basketball Highlights
The massive growth of sports videos has resulted in a need for automatic
generation of sports highlights that are comparable in quality to the
hand-edited highlights produced by broadcasters such as ESPN. Unlike previous
works that mostly use audio-visual cues derived from the video, we propose an
approach that additionally leverages contextual cues derived from the
environment that the game is being played in. The contextual cues provide
information about the excitement levels in the game, which can be ranked and
selected to automatically produce high-quality basketball highlights. We
introduce a new dataset of 25 NCAA games along with their play-by-play stats
and the ground-truth excitement data for each basket. We explore the
informativeness of five different cues derived from the video and from the
environment through user studies. Our experiments show that for our study
participants, the highlights produced by our system are comparable to the ones
produced by ESPN for the same games. Comment: Proceedings of ACM Multimedia 201
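The abstract says excitement levels "can be ranked and selected" to produce highlights. A minimal sketch of that ranking-and-selection step is shown below. It assumes that each basket already has a single scalar excitement score and a clip duration, and that the highlight reel has a fixed time budget. The `Basket` record, `select_highlights`, and the greedy budget rule are hypothetical illustrations, not the authors' method.

```python
from dataclasses import dataclass


@dataclass
class Basket:
    # Hypothetical record for one basket: clip boundaries in seconds and an
    # excitement score (assumed to come from contextual cues such as crowd noise).
    start: float
    end: float
    excitement: float


def select_highlights(baskets, budget_seconds):
    """Greedily keep the most exciting baskets that fit in the time budget,
    then return them in game order so the reel plays chronologically."""
    ranked = sorted(baskets, key=lambda b: b.excitement, reverse=True)
    chosen, used = [], 0.0
    for b in ranked:
        duration = b.end - b.start
        if used + duration <= budget_seconds:
            chosen.append(b)
            used += duration
    return sorted(chosen, key=lambda b: b.start)


if __name__ == "__main__":
    game = [
        Basket(10.0, 18.0, 0.2),
        Basket(95.0, 103.0, 0.9),
        Basket(400.0, 410.0, 0.7),
        Basket(1200.0, 1206.0, 0.5),
    ]
    reel = select_highlights(game, budget_seconds=20.0)
    print([b.start for b in reel])
```

Greedy selection is only one simple choice. It skips a lower-ranked clip whenever that clip would overflow the budget, so it can leave some time unused.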
Foul prediction with estimated poses from soccer broadcast video
Recent advances in computer vision have brought significant progress in tracking
and pose estimation of sports players. However, few studies have addressed
behavior prediction using pose estimation in sports. In particular, predicting
soccer fouls is challenging because each player occupies only a small part of
the image and because cues such as the ball and player pose are difficult to
exploit. In our research, we introduce an innovative deep learning approach
for anticipating soccer fouls. To this end, we curate a novel soccer foul
dataset, and our method integrates video data, bounding box positions, image
details, and pose information. Our model uses a combination of convolutional and
recurrent neural networks (CNNs and RNNs) to effectively merge information from
these four modalities. The experimental results show that our full model
outperformed the ablated models, and that the RNN modules, the bounding box
positions and images, and the estimated poses were all useful for foul
prediction. Our findings have important implications for a deeper understanding
of foul play in soccer and provide a valuable reference for future research and
practice in this area.
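The sketch below shows how per-frame features from several modalities can be merged and passed through a recurrent network to produce a foul score. It is a dependency-free illustration, and several parts are assumptions. The CNN features are taken as precomputed. The modality names in `MODALITIES` are hypothetical. Fusion is done by simple concatenation, which the abstract does not specify. The weights are random, so the output only demonstrates the data flow and is not a real prediction.

```python
import math
import random

# Hypothetical modality names; the abstract does not specify feature layouts.
# Features are assumed to be precomputed per frame (e.g. by a CNN for images).
MODALITIES = ("bbox", "image", "pose")


def fuse(frame):
    """Concatenate one frame's per-modality feature vectors in a fixed order."""
    return [value for name in MODALITIES for value in frame[name]]


class TinyRNNClassifier:
    """Elman RNN over fused frame features, ending in a sigmoid foul score.

    Weights are random, so this only illustrates the data flow of a
    CNN+RNN-style fusion model; it is not a trained predictor.
    """

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        self.w_xh = [[rng.uniform(-0.1, 0.1) for _ in range(input_dim)]
                     for _ in range(hidden_dim)]
        self.w_hh = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
                     for _ in range(hidden_dim)]
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]

    def predict(self, sequence):
        """Return a score in (0, 1) for a sequence of fused frame vectors."""
        h = [0.0] * self.hidden_dim
        for x in sequence:
            if len(x) != self.input_dim:
                raise ValueError("feature length does not match input_dim")
            # New hidden state from the current input and the previous state.
            h = [math.tanh(sum(w * xi for w, xi in zip(self.w_xh[j], x))
                           + sum(w * hv for w, hv in zip(self.w_hh[j], h)))
                 for j in range(self.hidden_dim)]
        logit = sum(w * hv for w, hv in zip(self.w_out, h))
        return 1.0 / (1.0 + math.exp(-logit))


frames = [{"bbox": [0.1, 0.2, 0.3, 0.4], "image": [0.5, 0.1],
           "pose": [0.0, 0.3, 0.2]} for _ in range(5)]
model = TinyRNNClassifier(input_dim=9, hidden_dim=4)
p = model.predict([fuse(f) for f in frames])
print(f"foul score: {p:.3f}")
```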
A Survey of Deep Learning in Sports Applications: Perception, Comprehension, and Decision
Deep learning has the potential to revolutionize sports performance, with
applications ranging from perception and comprehension to decision-making. This
paper presents a comprehensive survey of deep learning in sports performance,
focusing on three main aspects: algorithms, datasets and virtual environments,
and challenges. First, we discuss the hierarchical structure of deep learning
algorithms in sports performance, which comprises perception, comprehension, and
decision, and compare their strengths and weaknesses. Second, we list widely
used existing datasets in sports and highlight their characteristics and
limitations. Finally, we summarize current challenges and point out future
trends of deep learning in sports. Our survey provides valuable reference
material for researchers interested in deep learning in sports applications.
SoccerNet-Caption: Dense Video Captioning for Soccer Broadcasts Commentaries
Soccer is more than just a game; it is a passion that transcends borders and
unites people worldwide. From the roar of the crowds to the excitement of the
commentators, every moment of a soccer match is a thrill. Yet, with so many
games happening simultaneously, fans cannot watch them all live. Notifications
for main actions can help, but lack the engagement of live commentary, leaving
fans feeling disconnected. To fulfill this need, in this paper we propose a
novel task of dense video captioning focusing on the generation of textual
commentaries anchored with single timestamps. To support this task, we
additionally present a challenging dataset consisting of almost 37k timestamped
commentaries across 715.9 hours of soccer broadcast videos. We also
propose a first benchmark and baseline for this task, highlighting the
difficulty of temporally anchoring commentaries yet showing the capacity to
generate meaningful commentaries. By providing broadcasters with a tool to
summarize the content of their video with the same level of engagement as a
live game, our method could help satisfy the needs of the numerous fans who
follow their team but cannot necessarily watch the live game. We believe our
method has the potential to enhance the accessibility and understanding of
soccer content for a wider audience, bringing the excitement of the game to
more people.
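The abstract identifies temporally anchoring commentaries as a difficulty but does not describe the evaluation. A common way to score single-timestamp predictions is to count a prediction as correctly anchored when it falls within a tolerance window of a ground-truth timestamp. The sketch below does this with greedy one-to-one matching. The function names, the tolerance parameter, and the matching rule are illustrative assumptions and are not the paper's metric.

```python
def anchored_hits(predicted, ground_truth, tolerance):
    """Count predictions within `tolerance` seconds of a distinct ground-truth
    timestamp, using greedy one-to-one matching (closest pairs first)."""
    pairs = sorted(
        (abs(p - g), i, j)
        for i, p in enumerate(predicted)
        for j, g in enumerate(ground_truth)
        if abs(p - g) <= tolerance
    )
    used_pred, used_gt, hits = set(), set(), 0
    for _, i, j in pairs:
        if i not in used_pred and j not in used_gt:
            used_pred.add(i)
            used_gt.add(j)
            hits += 1
    return hits


def precision_recall(predicted, ground_truth, tolerance):
    """Precision and recall of timestamp anchoring at a given tolerance."""
    hits = anchored_hits(predicted, ground_truth, tolerance)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(ground_truth) if ground_truth else 0.0
    return precision, recall


if __name__ == "__main__":
    preds = [12.0, 61.0, 300.0]  # predicted commentary timestamps (seconds)
    gold = [10.0, 65.0, 120.0]   # hypothetical ground-truth timestamps
    print(precision_recall(preds, gold, tolerance=5.0))
```

Tightening the tolerance makes anchoring stricter. In the example above, a 3-second window would keep only the match between 12.0 and 10.0.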
Detecting complex events in user-generated video using concept classifiers
Automatic detection of complex events in user-generated videos (UGV) is a
challenging task because UGV has characteristics that differ from those of
broadcast video. In this work, we first summarize these characteristics of UGV
and then explore how to use concept classifiers to recognize complex events in
UGV content. The method starts by manually selecting a variety of relevant
concepts and then constructs classifiers for these concepts. Finally, complex
event detectors are learned using the concatenated probabilistic scores of these
concept classifiers as features. We also compare three different fusion
operations on the probabilistic scores, namely Maximum, Average, and Minimum
fusion. Experimental results suggest that our method provides promising results
and that Maximum fusion tends to give better performance for most complex
events.
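The paragraph above describes two steps: concept scores are fused with Maximum, Average, or Minimum fusion, and the resulting scores are concatenated into a feature vector for the event detector. The sketch below illustrates both steps. It assumes, as the abstract does not state, that each concept classifier outputs one probability per keyframe and that fusion runs across the keyframes of a video. `video_feature` and the `FUSIONS` table are hypothetical names.

```python
from statistics import mean

# The three fusion operations compared in the abstract.
FUSIONS = {"max": max, "average": mean, "min": min}


def video_feature(keyframe_scores, fusion="max"):
    """Build a video-level feature vector from keyframe-level concept scores.

    keyframe_scores: one list per keyframe, each holding the probabilities
    [p(concept_1), ..., p(concept_K)] from K concept classifiers.
    Each concept's scores are fused across keyframes, and the K fused values
    form the concatenated feature vector fed to an event detector.
    """
    op = FUSIONS[fusion]
    num_concepts = len(keyframe_scores[0])
    return [op([frame[k] for frame in keyframe_scores])
            for k in range(num_concepts)]


if __name__ == "__main__":
    scores = [[0.9, 0.1, 0.4],  # keyframe 1: three hypothetical concepts
              [0.3, 0.5, 0.2]]  # keyframe 2
    for name in FUSIONS:
        print(name, video_feature(scores, name))
```

Any classifier could then be trained on these vectors to serve as the event detector. The abstract does not say which classifier the authors used.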
SoccerNet-Tracking: Multiple Object Tracking Dataset and Benchmark in Soccer Videos
Tracking objects in soccer videos is extremely important to gather both
player and team statistics, whether to estimate the total distance run,
the ball possession, or the team formation. Video processing can help automate
the extraction of this information without the need for any invasive sensor,
making it applicable to any team in any stadium. Yet, the availability of
datasets to train learnable models and benchmarks to evaluate methods on a
common testbed is very limited. In this work, we propose a novel dataset for
multiple object tracking composed of 200 sequences of 30 s each, representative
of challenging soccer scenarios, and a complete 45-minute half for long-term
tracking. The dataset is fully annotated with bounding boxes and tracklet IDs,
enabling the training of MOT baselines in the soccer domain and a full
benchmarking of those methods on our segregated challenge sets. Our analysis
shows that multiple player, referee, and ball tracking in soccer videos is far
from being solved, with several improvements required in cases of fast motion
or severe occlusion. Comment: Paper accepted for the CVsports workshop at
CVPR 2022. This document contains 8 pages + references
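Because the dataset is annotated with bounding boxes and tracklet IDs, the basic operation a tracking-by-detection baseline performs is to link each frame's detections to existing tracklets. The sketch below shows one generic way to do this with greedy intersection-over-union (IoU) matching. It assumes boxes are given as `(x1, y1, x2, y2)`. The function names and the 0.3 threshold are illustrative choices and are not taken from the paper or its baselines.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, next_id, threshold=0.3):
    """Greedily match detections to tracklets by IoU.

    tracks: dict mapping tracklet ID -> last known box.
    Returns (assignments for this frame, updated next_id). Detections left
    unmatched start new tracklets with fresh IDs.
    """
    candidates = sorted(
        ((iou(box, det), tid, d)
         for tid, box in tracks.items()
         for d, det in enumerate(detections)),
        reverse=True,
    )
    assigned, used_tracks, used_dets = {}, set(), set()
    for score, tid, d in candidates:
        if score < threshold:
            break  # remaining pairs overlap too little
        if tid in used_tracks or d in used_dets:
            continue
        assigned[tid] = detections[d]
        used_tracks.add(tid)
        used_dets.add(d)
    for d, det in enumerate(detections):
        if d not in used_dets:
            assigned[next_id] = det
            next_id += 1
    return assigned, next_id


if __name__ == "__main__":
    tracks = {0: (0, 0, 10, 10), 1: (50, 50, 60, 60)}
    dets = [(51, 51, 61, 61), (1, 1, 11, 11), (100, 100, 110, 110)]
    print(associate(tracks, dets, next_id=2))
```

Pure IoU matching like this tends to break under the fast motion and severe occlusion that the abstract highlights, because boxes from consecutive frames may then barely overlap.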