Real-Time Semantic Background Subtraction
Semantic background subtraction (SBS) has been shown to improve the performance
of most background subtraction algorithms by combining them with semantic
information, derived from a semantic segmentation network. However, SBS
requires high-quality semantic segmentation masks for all frames, which are
slow to compute. In addition, most state-of-the-art background subtraction
algorithms are not real-time, which makes them unsuitable for real-world
applications. In this paper, we present a novel background subtraction
algorithm called Real-Time Semantic Background Subtraction (denoted RT-SBS)
which extends SBS for real-time constrained applications while keeping similar
performance. RT-SBS effectively combines a real-time background subtraction
algorithm with high-quality semantic information which can be provided at a
slower pace, independently for each pixel. We show that RT-SBS coupled with
ViBe sets a new state of the art for real-time background subtraction
algorithms and even competes with the non-real-time state-of-the-art ones. Note
that we provide Python CPU and GPU implementations of RT-SBS at
https://github.com/cioppaanthony/rt-sbs.
Comment: Accepted and Published at ICIP 202
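The RT-SBS abstract describes pairing a fast background subtraction mask with semantic information that arrives at a slower pace, per pixel. The following is a minimal sketch of that general idea only, not the authors' algorithm: the labels `BG`, `FG`, `UNKNOWN` and the functions `combine` and `run` are hypothetical names, and reusing the last semantic decision unconditionally is a simplifying assumption made here for illustration.

```python
from typing import List, Optional

# Hypothetical per-pixel semantic decisions (illustrative names only).
BG, FG, UNKNOWN = 0, 1, 2


def combine(bgs_mask: List[int], semantic: List[int]) -> List[int]:
    """Keep the fast BGS label (0 = background, 1 = foreground) unless
    the semantic decision for that pixel is confident (BG or FG)."""
    return [b if s == UNKNOWN else s for b, s in zip(bgs_mask, semantic)]


def run(
    bgs_masks: List[List[int]],
    semantic_frames: List[Optional[List[int]]],
) -> List[List[int]]:
    """Process a sequence in which semantic decisions exist only for some
    frames (None elsewhere). Assumption: the most recent semantic decision
    is reused as-is until a newer one arrives."""
    last_semantic: Optional[List[int]] = None
    outputs = []
    for bgs_mask, semantic in zip(bgs_masks, semantic_frames):
        if semantic is not None:
            last_semantic = semantic  # fresh semantic information
        if last_semantic is None:
            outputs.append(list(bgs_mask))  # nothing to combine with yet
        else:
            outputs.append(combine(bgs_mask, last_semantic))
    return outputs


# Two frames of three pixels; semantics are available for the first frame only.
masks = run([[1, 0, 0], [1, 1, 0]], [[BG, UNKNOWN, FG], None])
```

In this toy run the first pixel is forced to background and the third to foreground in both frames, while the middle pixel follows the fast mask. A practical system would need a rule for deciding when a stale semantic decision is still trustworthy, which this sketch deliberately omits.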
SoccerNet-Caption: Dense Video Captioning for Soccer Broadcasts Commentaries
Soccer is more than just a game - it is a passion that transcends borders and
unites people worldwide. From the roar of the crowds to the excitement of the
commentators, every moment of a soccer match is a thrill. Yet, with so many
games happening simultaneously, fans cannot watch them all live. Notifications
for main actions can help, but lack the engagement of live commentary, leaving
fans feeling disconnected. To fulfill this need, we propose in this paper a
novel task of dense video captioning focusing on the generation of textual
commentaries anchored with single timestamps. To support this task, we
additionally present a challenging dataset consisting of almost 37k timestamped
commentaries across 715.9 hours of soccer broadcast videos. Additionally, we
propose a first benchmark and baseline for this task, highlighting the
difficulty of temporally anchoring commentaries yet showing the capacity to
generate meaningful commentaries. By providing broadcasters with a tool to
summarize the content of their video with the same level of engagement as a
live game, our method could help satisfy the needs of the numerous fans who
follow their team but cannot necessarily watch the live game. We believe our
method has the potential to enhance the accessibility and understanding of
soccer content for a wider audience, bringing the excitement of the game to
more people
VARS: Video Assistant Referee System for Automated Soccer Decision Making from Multiple Views
The Video Assistant Referee (VAR) has revolutionized association football,
enabling referees to review incidents on the pitch, make informed decisions,
and ensure fairness. However, due to the lack of referees in many countries and
the high cost of the VAR infrastructure, only professional leagues can benefit
from it. In this paper, we propose a Video Assistant Referee System (VARS) that
can automate soccer decision-making. VARS leverages the latest findings in
multi-view video analysis to provide real-time feedback to the referee and
help them make informed decisions that can impact the outcome of a game. To
validate VARS, we introduce SoccerNet-MVFoul, a novel video dataset of soccer
fouls from multiple camera views, annotated with extensive foul descriptions by
a professional soccer referee, and we benchmark our VARS to automatically
recognize the characteristics of these fouls. We believe that VARS has the
potential to revolutionize soccer refereeing and take the game to new heights
of fairness and accuracy across all levels of professional and amateur
federations.
Comment: Accepted at CVSports'2
Semi-Supervised Training to Improve Detection for Satellite Images
We propose a novel semi-supervised learning method that leverages unlabeled data by generating pseudo labels with a teacher-student approach. We also introduce three loss parametrizations that inject doubt into the pseudo labels based on their confidence scores. Finally, we show that our method improves detection performance for satellite images.
Semi-Supervised Training to Improve Player and Ball Detection in Soccer
Peer reviewed.
Accurate player and ball detection has become increasingly important in recent years for sport analytics. As most state-of-the-art methods rely on training deep learning networks in a supervised fashion, they require huge amounts of annotated data, which are rarely available. In this paper, we present a novel generic semi-supervised method to train a network based on a labeled image dataset by leveraging a large unlabeled dataset of soccer broadcast videos. More precisely, we design a teacher-student approach in which the teacher produces surrogate annotations on the unlabeled data, which are later used to train a student with the same architecture as the teacher. Furthermore, we introduce three training loss parametrizations that allow the student to doubt the predictions of the teacher during training depending on the proposal confidence score. We show that including unlabeled data in the training process substantially improves the performance of the detection network compared with training only on the labeled data. Finally, we provide a thorough performance study including different proportions of labeled and unlabeled data, and establish the first benchmark on the new SoccerNet-v3 detection task, with an mAP of 52.3%. Our code is available at https://github.com/rvandeghen/SST
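The abstract above says the student may doubt the teacher's pseudo labels depending on their confidence score, without spelling out the three parametrizations. The following is a minimal sketch of one plausible confidence-based weighting, under stated assumptions: the function names `doubt_weight` and `weighted_loss`, the linear ramp, and the thresholds `low` and `high` are all hypothetical and are not taken from the paper or its repository.

```python
from typing import List


def doubt_weight(confidence: float, low: float = 0.5, high: float = 0.9) -> float:
    """Map a teacher confidence score to a loss weight in [0, 1].

    Assumed shape: ignore proposals at or below `low`, fully trust those
    at or above `high`, and ramp linearly in between.
    """
    if confidence >= high:
        return 1.0
    if confidence <= low:
        return 0.0
    return (confidence - low) / (high - low)


def weighted_loss(per_box_losses: List[float], confidences: List[float]) -> float:
    """Average the student's per-box losses, weighted by how much each
    pseudo label is trusted. Returns 0.0 if every label is ignored."""
    weights = [doubt_weight(c) for c in confidences]
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0
    return sum(w * l for w, l in zip(weights, per_box_losses)) / total_weight


# The low-confidence box (0.3) contributes nothing to the loss.
loss = weighted_loss([2.0, 4.0], [0.95, 0.3])
```

With these assumed thresholds, only the confident pseudo label drives the student's update, so `loss` equals the first box's loss.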
Towards Active Learning for Action Spotting in Association Football Videos
Association football is a complex and dynamic sport, with numerous actions
occurring simultaneously in each game. Analyzing football videos is challenging
and requires identifying subtle and diverse spatio-temporal patterns. Despite
recent advances in computer vision, current algorithms still face significant
challenges when learning from limited annotated data, lowering their
performance in detecting these patterns. In this paper, we propose an active
learning framework that selects the most informative video samples to be
annotated next, thus drastically reducing the annotation effort and
accelerating the training of action spotting models to reach the highest
accuracy at a faster pace. Our approach leverages the notion of uncertainty
sampling to select the most challenging video clips to train on next, hastening
the learning process of the algorithm. We demonstrate that our proposed active
learning framework effectively reduces the required training data for accurate
action spotting in football videos. We achieve similar performance for action
spotting with NetVLAD++ on SoccerNet-v2, using only one-third of the dataset,
indicating significant capabilities for reducing annotation time and improving
data efficiency. We further validate our approach on two new datasets that
focus on temporally localizing actions of headers and passes, proving its
effectiveness across different action semantics in football. We believe our
active learning framework for action spotting would support further
applications of action spotting algorithms and accelerate annotation campaigns
in the sports domain.
Comment: Accepted at CVSports'2
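The active learning abstract above relies on uncertainty sampling to pick the next clips to annotate. The following is a minimal sketch of that selection step, assuming predictive entropy as the uncertainty measure; the abstract does not state which measure the authors use, and the names `entropy`, `select_most_uncertain`, and the clip identifiers are hypothetical.

```python
import math
from typing import Dict, List


def entropy(probs: List[float]) -> float:
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(clip_probs: Dict[str, List[float]], budget: int) -> List[str]:
    """Return the `budget` unlabeled clips whose predictions are most
    uncertain, ranked from highest to lowest entropy."""
    ranked = sorted(clip_probs, key=lambda clip: entropy(clip_probs[clip]), reverse=True)
    return ranked[:budget]


# Hypothetical model outputs for three unlabeled clips (two classes each).
predictions = {
    "clip_a": [0.5, 0.5],    # maximally uncertain
    "clip_b": [0.99, 0.01],  # confident
    "clip_c": [0.7, 0.3],    # moderately uncertain
}
to_annotate = select_most_uncertain(predictions, budget=2)
```

In a full loop, the selected clips would be annotated, added to the labeled set, and the model retrained before the next selection round.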
SoccerNet-Tracking: Multiple Object Tracking Dataset and Benchmark in Soccer Videos
Tracking objects in soccer videos is extremely important to gather both
player and team statistics, whether it is to estimate the total distance run,
the ball possession or the team formation. Video processing can help automate
the extraction of this information without the need for any invasive sensor,
making it applicable to any team in any stadium. Yet, the availability of datasets
to train learnable models and benchmarks to evaluate methods on a common
testbed is very limited. In this work, we propose a novel dataset for multiple
object tracking composed of 200 sequences of 30s each, representative of
challenging soccer scenarios, and a complete 45-minute half for long-term
tracking. The dataset is fully annotated with bounding boxes and tracklet IDs,
enabling the training of MOT baselines in the soccer domain and a full
benchmarking of those methods on our segregated challenge sets. Our analysis
shows that multiple player, referee and ball tracking in soccer videos is far
from being solved, with several improvements required in cases of fast motion or
in scenarios of severe occlusion.
Comment: Paper accepted for the CVsports workshop at CVPR2022. This document
contains 8 pages + reference