Efficient tracking of team sport players with few game-specific annotations
One of the requirements for team sports analysis is to track and recognize players. Many tracking and re-identification methods have been proposed in the context of video surveillance, and they show very convincing results when tested on public datasets such as the MOT challenge. However, the performance of these methods is less satisfactory when applied to player tracking. Indeed, in addition to moving very quickly and often being occluded, the players wear the same jersey, which makes re-identification very complex. Some recent tracking methods have been developed specifically for the team sport context, but due to the lack of public data, they rely on private datasets that make a direct comparison impossible. In this paper, we propose a new generic method to track team sport players during a full game using few human annotations collected via a semi-interactive system. Non-ambiguous tracklets and their appearance features are automatically generated with a detection network and a re-identification network, both pre-trained on public datasets. Then an incremental learning mechanism trains a Transformer to classify identities using few game-specific human annotations. Finally, tracklets are linked by an association algorithm. We demonstrate the efficiency of our approach on a challenging rugby sevens dataset. To overcome the lack of public sports tracking datasets, we publicly release this dataset at https://kalisteo.cea.fr/index.php/free-resources/. We also show that our method is able to track rugby sevens players during a full match, provided they are observable at a minimal resolution, with the annotation of only six tracklets of a few seconds each per player.
Comment: Accepted to the 2022 8th International Workshop on Computer Vision in Sports (CVsports 2022)
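The pipeline above ends with an association step that links non-ambiguous tracklets into full-match tracks. A minimal sketch of such a step, assuming cosine similarity on hypothetical per-tracklet appearance features and a greedy matching rule (the paper's actual association algorithm may differ):

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two appearance feature vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def associate_tracklets(tracklets, threshold=0.7):
    """Greedily link each tracklet to a later, temporally non-overlapping
    tracklet whose appearance feature is most similar (illustrative
    simplification, not the paper's exact algorithm)."""
    links = []
    for i, t1 in enumerate(tracklets):
        best_j, best_sim = None, threshold
        for j, t2 in enumerate(tracklets):
            if j <= i or t1["end"] >= t2["start"]:
                continue  # only link forward, without temporal overlap
            sim = cosine_sim(t1["feat"], t2["feat"])
            if sim > best_sim:
                best_j, best_sim = j, sim
        if best_j is not None:
            links.append((i, best_j))
    return links
```

Real systems typically solve this matching globally (e.g. with the Hungarian algorithm) rather than greedily; the sketch only illustrates the tracklet-linking idea.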
Classifying All Interacting Pairs in a Single Shot
In this paper, we introduce a novel human interaction detection approach,
based on CALIPSO (Classifying ALl Interacting Pairs in a Single shOt), a
classifier of human-object interactions. This new single-shot interaction
classifier estimates interactions simultaneously for all human-object pairs,
regardless of their number and class. State-of-the-art approaches adopt a
multi-shot strategy based on a pairwise estimate of interactions for a set of
human-object candidate pairs, which leads to a complexity depending, at least,
on the number of interactions or, at most, on the number of candidate pairs. In
contrast, the proposed method estimates the interactions on the whole image.
Indeed, it simultaneously estimates all interactions between all human subjects
and object targets by performing a single forward pass throughout the image.
Consequently, it leads to a constant complexity and computation time
independent of the number of subjects, objects or interactions in the image. In
detail, interaction classification is achieved on a dense grid of anchors
thanks to a joint multi-task network that learns three complementary tasks
simultaneously: (i) prediction of the types of interaction, (ii) estimation of
the presence of a target and (iii) learning of an embedding which maps
interacting subjects and targets to the same representation, using a metric
learning strategy. In addition, we introduce an object-centric passive-voice
verb estimation which significantly improves results. Evaluations on the two
well-known Human-Object Interaction image datasets, V-COCO and HICO-DET,
demonstrate the competitiveness of the proposed method (2nd place) compared to
the state-of-the-art while having constant computation time regardless of the
number of objects and interactions in the image.
Comment: WACV 2020 (to appear)
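The constant-complexity claim follows from the fact that a single forward pass produces all three task outputs at every anchor of the dense grid, regardless of how many pairs are present. A toy sketch, with illustrative array shapes and a hypothetical decoding rule that pairs a subject anchor with the nearest target embedding:

```python
import numpy as np

rng = np.random.default_rng(0)

H, W = 8, 8   # dense anchor grid (toy size)
K, D = 5, 16  # number of interaction classes, embedding dimension

# One "forward pass" yields all three task outputs at every anchor,
# so the cost does not depend on the number of subject/object pairs.
interaction_logits = rng.normal(size=(H, W, K))  # task (i): interaction types
target_presence    = rng.normal(size=(H, W))     # task (ii): target presence
embeddings         = rng.normal(size=(H, W, D))  # task (iii): pairing embedding

def match_subject_to_target(subj_yx, embeddings, presence):
    """Pair a subject anchor with the target anchor whose embedding is
    closest, among anchors predicted to contain a target (toy decoding,
    not the paper's exact procedure)."""
    subj = embeddings[subj_yx]
    dists = np.linalg.norm(embeddings - subj, axis=-1)
    dists[~(presence > 0.0)] = np.inf  # ignore anchors without a target
    return np.unravel_index(np.argmin(dists), dists.shape)
```

The metric-learning task (iii) is what makes this decoding possible: interacting subjects and targets are trained to share a representation, so nearest-neighbour lookup in embedding space recovers the pairing.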
Similarity Contrastive Estimation for Image and Video Soft Contrastive Self-Supervised Learning
Contrastive representation learning has proven to be an effective
self-supervised learning method for images and videos. Most successful
approaches are based on Noise Contrastive Estimation (NCE) and use different
views of an instance as positives that should be contrasted with other
instances, called negatives, that are considered as noise. However, several
instances in a dataset are drawn from the same distribution and share
underlying semantic information. A good data representation should capture the
relations between instances, i.e., semantic similarity and dissimilarity, which
contrastive learning harms by treating all negatives as noise. To circumvent
this issue, we propose a novel formulation of contrastive learning using
semantic similarity between instances called Similarity Contrastive Estimation
(SCE). Our training objective is a soft contrastive one that brings the
positives closer and estimates a continuous distribution to push or pull
negative instances based on their learned similarities. We empirically validate
our approach on both image and video representation learning. We show that SCE
performs competitively with the state of the art on the ImageNet linear
evaluation protocol for fewer pretraining epochs and that it generalizes to
several downstream image tasks. We also show that SCE reaches state-of-the-art
results for video representation pretraining and that the learned
representation generalizes to video downstream tasks.
Comment: Extended version of our WACV 2023 paper to video self-supervised learning
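The soft contrastive objective can be sketched as a cross-entropy against a relational target: the target distribution mixes the one-hot positive with a softened distribution over inter-instance similarities. Symbol names, temperatures, and the way inter-instance similarities are computed here are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sce_loss(z1, z2, lam=0.5, tau=0.1, tau_t=0.07):
    """Toy Similarity Contrastive Estimation loss for one batch.
    z1, z2: L2-normalized embeddings of two views, shape (N, D).
    lam balances the one-hot positive against the relational term."""
    sim = z1 @ z2.T                    # cross-view pairwise similarities
    inter = z1 @ z1.T                  # within-view inter-instance similarities
    np.fill_diagonal(inter, -np.inf)   # mask the trivial self-similarity
    # Soft target: one-hot positive mixed with estimated relations.
    target = lam * np.eye(len(z1)) + (1 - lam) * softmax(inter / tau_t)
    logp = np.log(softmax(sim / tau))
    return float(-(target * logp).sum(axis=1).mean())
```

Because the target is a distribution rather than a hard label, negatives that are semantically close to the anchor are pushed away less strongly, which is the mechanism the abstract describes.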
COMEDIAN: Self-Supervised Learning and Knowledge Distillation for Action Spotting using Transformers
We present COMEDIAN, a novel pipeline to initialize spatio-temporal
transformers for action spotting, which involves self-supervised learning and
knowledge distillation. Action spotting is a timestamp-level temporal action
detection task. Our pipeline consists of three steps, with two initialization
stages. First, we perform self-supervised initialization of a spatial
transformer using short videos as input. Additionally, we initialize a temporal
transformer that enhances the spatial transformer's outputs with global context
through knowledge distillation from a pre-computed feature bank aligned with
each short video segment. In the final step, we fine-tune the transformers to
the action spotting task. The experiments, conducted on the SoccerNet-v2
dataset, demonstrate state-of-the-art performance and validate the
effectiveness of COMEDIAN's pretraining paradigm. Our results highlight several
advantages of our pretraining pipeline, including improved performance and
faster convergence compared to non-pretrained models.
Comment: Source code is available at https://github.com/juliendenize/eztorc
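The two initialization stages can be caricatured with stand-in encoders and a plain mean-squared error as the distillation objective; the actual architectures and loss are those of the paper, not of this sketch:

```python
import numpy as np

def spatial_encoder(clip):
    # Stand-in for the self-supervised spatial transformer (step 1):
    # maps a short clip of frame features to one segment feature.
    return clip.mean(axis=0)

def temporal_encoder(segment_feats):
    # Stand-in for the temporal transformer (step 2): enriches each
    # segment feature with context from preceding segments (running mean).
    counts = np.arange(1, len(segment_feats) + 1)[:, None]
    return segment_feats.cumsum(axis=0) / counts

def distillation_loss(student, bank):
    """Pull the temporal transformer's outputs towards the pre-computed
    feature bank aligned with each short segment (MSE is an
    illustrative choice of distillation objective)."""
    return float(((student - bank) ** 2).mean())
```

Step 3 of the pipeline, fine-tuning on action spotting, would then replace the distillation target with timestamp-level supervision.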
SoccerNet 2023 Challenges Results
The SoccerNet 2023 challenges were the third annual video understanding
challenges organized by the SoccerNet team. For this third edition, the
challenges were composed of seven vision-based tasks split into three main
themes. The first theme, broadcast video understanding, is composed of three
high-level tasks related to describing events occurring in the video
broadcasts: (1) action spotting, focusing on retrieving all timestamps related
to global actions in soccer, (2) ball action spotting, focusing on retrieving
all timestamps related to the soccer ball change of state, and (3) dense video
captioning, focusing on describing the broadcast with natural language and
anchored timestamps. The second theme, field understanding, relates to the
single task of (4) camera calibration, focusing on retrieving the intrinsic and
extrinsic camera parameters from images. The third and last theme, player
understanding, is composed of three low-level tasks related to extracting
information about the players: (5) re-identification, focusing on retrieving
the same players across multiple views, (6) multiple object tracking, focusing
on tracking players and the ball through unedited video streams, and (7) jersey
number recognition, focusing on recognizing the jersey number of players from
tracklets. Compared to the previous editions of the SoccerNet challenges, tasks
(2-3-7) are novel, including new annotations and data, task (4) was enhanced
with more data and annotations, and task (6) now focuses on end-to-end
approaches. More information on the tasks, challenges, and leaderboards is
available on https://www.soccer-net.org. Baselines and development kits can be
found on https://github.com/SoccerNet
Learning methods applied to vision-based human behaviour analysis
The analysis of human behavior by vision is a widely studied research topic. Despite the progress brought by deep learning in computer vision, finely understanding what is happening in a scene is a task far from solved, because it requires a very high semantic level. In this thesis we focus on two applications: the recognition of temporally long activities in videos and the detection of interactions in images. The first contribution of this work is the development of the first database of daily activities with high intra-class variability. The second contribution is a new method for interaction detection in a single shot over the image, which makes it much faster than state-of-the-art two-step methods that apply pairwise reasoning over instances. Finally, the third contribution of this thesis is the constitution of a new interaction dataset composed of interactions both between people and objects and between people, which did not exist until now and which enables an exhaustive analysis of human interactions. In order to provide baseline results on this new dataset, the previous interaction detection method has been improved with multi-task learning, which reaches the best results on the public dataset widely used by the community.
KaliCalib: A Framework for Basketball Court Registration
Tracking the players and the ball in team sports is key to analysing performance or to enhancing the game-watching experience with augmented reality. When the only sources for this data are broadcast videos, sports-field registration systems are required to estimate the homography and re-project the ball or the players from image space to field space. This paper describes a new basketball court registration framework in the context of the MMSports 2022 camera calibration challenge. The method is based on an encoder-decoder network that estimates the positions of keypoints sampled with perspective-aware constraints. The regression of the basket positions and heavy data augmentation techniques make the model robust to different arenas. Ablation studies show the positive effects of our contributions on the challenge test set. Our method divides the mean squared error by 4.7 compared to the challenge baseline.
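The re-projection step that motivates court registration is a plain homography application: once the 3x3 matrix H is estimated, pixel coordinates map to field coordinates through homogeneous coordinates. A minimal sketch:

```python
import numpy as np

def project_to_field(H, points_img):
    """Re-project image-space points to court coordinates with a
    homography H (3x3), as sports-field registration systems do.
    points_img: (N, 2) array of pixel coordinates."""
    # Lift to homogeneous coordinates, apply H, then divide by w.
    pts = np.hstack([points_img, np.ones((len(points_img), 1))])
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

The registration framework's job is to produce a reliable H per frame; the keypoints predicted by the encoder-decoder network provide the correspondences from which H is fitted.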