
    Efficient tracking of team sport players with few game-specific annotations

    One of the requirements for team sports analysis is to track and recognize players. Many tracking and re-identification methods have been proposed in the context of video surveillance, and they show very convincing results when tested on public datasets such as the MOT challenge. However, the performance of these methods is not as satisfactory when applied to player tracking. In addition to moving very quickly and often being occluded, players wear the same jersey, which makes re-identification very complex. Some recent tracking methods have been developed specifically for the team sport context, but due to the lack of public data they rely on private datasets, which makes a comparison with them impossible. In this paper, we propose a new generic method to track team sport players during a full game using only a few human annotations collected via a semi-interactive system. Non-ambiguous tracklets and their appearance features are automatically generated with a detection network and a re-identification network, both pre-trained on public datasets. An incremental learning mechanism then trains a Transformer to classify identities using the few game-specific human annotations. Finally, tracklets are linked by an association algorithm. We demonstrate the efficiency of our approach on a challenging rugby sevens dataset. To overcome the lack of public sports tracking datasets, we publicly release this dataset at https://kalisteo.cea.fr/index.php/free-resources/. We also show that our method can track rugby sevens players during a full match, provided they are observable at a minimal resolution, with the annotation of only six short tracklets (a few seconds each) per player.
    Comment: Accepted to the 2022 8th International Workshop on Computer Vision in Sports (CVsports 2022)
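
    As a rough illustration of the tracklet-identification step described in this abstract, the sketch below (not the authors' code) feeds the re-identification features of the detections forming a tracklet to a small Transformer encoder that outputs per-player identity logits. The feature dimension, number of identities, and mean pooling over the tracklet are assumptions.

```python
# Minimal sketch of tracklet identity classification with a Transformer encoder.
# Dimensions, player count, and pooling are illustrative assumptions.
import torch
import torch.nn as nn

class TrackletIdentityClassifier(nn.Module):
    def __init__(self, feat_dim=512, n_players=14, n_layers=2, n_heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.cls_head = nn.Linear(feat_dim, n_players)

    def forward(self, tracklet_feats):
        # tracklet_feats: (batch, n_detections, feat_dim) re-identification
        # embeddings of the detections that form one non-ambiguous tracklet.
        encoded = self.encoder(tracklet_feats)
        pooled = encoded.mean(dim=1)       # aggregate over the tracklet
        return self.cls_head(pooled)       # per-player identity logits

if __name__ == "__main__":
    model = TrackletIdentityClassifier()
    feats = torch.randn(4, 16, 512)        # 4 tracklets, 16 detections each
    print(model(feats).shape)              # torch.Size([4, 14])
```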

    A Survey of Deep Learning in Sports Applications: Perception, Comprehension, and Decision

    Deep learning has the potential to revolutionize sports performance, with applications ranging from perception and comprehension to decision-making. This paper presents a comprehensive survey of deep learning in sports performance, focusing on three main aspects: algorithms, datasets and virtual environments, and challenges. First, we discuss the hierarchical structure of deep learning algorithms in sports performance, which covers perception, comprehension, and decision-making, while comparing their strengths and weaknesses. Second, we list widely used existing datasets in sports and highlight their characteristics and limitations. Finally, we summarize current challenges and point out future trends of deep learning in sports. Our survey provides valuable reference material for researchers interested in deep learning in sports applications.

    Player tracking and identification in broadcast ice hockey video

    Tracking and identifying players is a fundamental step in computer vision-based ice hockey analytics. The data generated by tracking is used in many downstream tasks, such as game event detection and game strategy analysis. Player tracking and identification is a challenging problem since the motion of players in hockey is fast-paced and non-linear compared to pedestrians. There is also significant player-player and player-board occlusion, as well as camera panning and zooming, in hockey broadcast video. Identifying players in ice hockey is difficult since players of the same team appear almost identical, with the jersey number the only consistent discriminating factor between them. In this thesis, an automated system to track and identify players in broadcast NHL hockey videos is introduced. The system is composed of player tracking, team identification, and player identification models. In addition, game roster and player shift data are incorporated to further increase the accuracy of player identification in the overall system. Due to the absence of publicly available datasets, new datasets for player tracking, team identification, and player identification in ice hockey are also introduced.
    Noting the lack of publicly available research on tracking ice hockey players with recent deep learning methods, we test five state-of-the-art tracking algorithms on an ice hockey dataset and analyze their performance and failure cases. We introduce a multi-task loss based network to identify player jersey numbers from static images. The network uses multi-task learning to simultaneously predict and learn from two different representations of a player jersey number. Through various experiments and ablation studies, we demonstrate that the multi-task learning based network performs better than the constituent single-task settings. We then take the temporal dimension into account for jersey number identification by inferring jersey numbers from sequences of player images, called player tracklets. To do so, we test two popular deep temporal networks: (1) a temporal 1D convolutional neural network (CNN) and (2) a Transformer network. The network trained with the multi-task loss serves as a backbone for both. In addition, we introduce a weakly-supervised learning strategy to improve training speed and convergence for the Transformer network. Experimental results demonstrate that the proposed networks outperform the state of the art.
    Finally, we describe in detail how the player tracking and identification models are put together to form the holistic pipeline, starting from raw broadcast NHL video and ending with uniquely identified player tracklets, and we explain how the game roster and player shifts are incorporated to improve player identification. An overall accuracy of 88% is obtained on the test set. An off-the-shelf automatic homography registration model and a puck localization model are also incorporated into the pipeline to obtain the tracks of both the players and the puck on the ice rink.
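
    The multi-task jersey-number idea lends itself to a short illustration. The sketch below is an assumption-laden approximation, not the thesis code: it combines a holistic head over whole numbers (0-99) with digit-wise heads for tens and units, which is one common way to realize "two different representations" of a jersey number in a single loss.

```python
# Minimal sketch of a multi-task jersey-number loss: a holistic head plus
# tens/units digit heads. Feature dimension and head layout are assumptions.
import torch
import torch.nn as nn

class JerseyNumberHeads(nn.Module):
    def __init__(self, feat_dim=2048, n_numbers=100):
        super().__init__()
        self.holistic = nn.Linear(feat_dim, n_numbers)  # 0-99, one class each
        self.tens = nn.Linear(feat_dim, 10)
        self.units = nn.Linear(feat_dim, 10)
        self.ce = nn.CrossEntropyLoss()

    def forward(self, feats, number):
        # feats: (batch, feat_dim) backbone features; number: (batch,) in 0..99.
        return (self.ce(self.holistic(feats), number)
                + self.ce(self.tens(feats), number // 10)
                + self.ce(self.units(feats), number % 10))

if __name__ == "__main__":
    heads = JerseyNumberHeads()
    feats = torch.randn(8, 2048)
    numbers = torch.randint(0, 100, (8,))
    print(heads(feats, numbers).item())   # scalar multi-task loss
```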

    A Four-Stage Data Augmentation Approach to ResNet-Conformer Based Acoustic Modeling for Sound Event Localization and Detection

    In this paper, we propose a novel four-stage data augmentation approach to ResNet-Conformer based acoustic modeling for sound event localization and detection (SELD). First, we explore two spatial augmentation techniques, namely audio channel swapping (ACS) and multi-channel simulation (MCS), to deal with data sparsity in SELD. ACS and MCS focus on augmenting the limited training data by expanding the direction-of-arrival (DOA) representations, such that acoustic models trained on the augmented data are robust to localization variations of acoustic sources. Next, time-domain mixing (TDM) and time-frequency masking (TFM) are investigated to deal with overlapping sound events and data diversity. Finally, ACS, MCS, TDM, and TFM are combined in a step-by-step manner to form an effective four-stage data augmentation scheme. Tested on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 datasets, our proposed augmentation approach greatly improves system performance, ranking our submitted system first in the SELD task of the DCASE 2020 Challenge. Furthermore, we employ a ResNet-Conformer architecture to model both global and local context dependencies of an audio sequence, yielding further gains over the architectures used in the DCASE 2020 SELD evaluations.
    Comment: 12 pages, 8 figures
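
    For concreteness, the sketch below illustrates, under stated assumptions, two of the augmentations named above: audio channel swapping (ACS), which permutes the channels of a spatial recording and thereby changes the apparent DOA, and time-domain mixing (TDM), which overlays two clips to create overlapping events. The exact channel permutations, sign conventions, and label remapping used in the paper are not reproduced here.

```python
# Minimal sketch of ACS and TDM style augmentations on multi-channel audio.
# The valid permutations (and any sign flips) depend on the array format
# (FOA vs. MIC) and must be paired with a consistent remapping of DOA labels.
import numpy as np

def channel_swap(audio: np.ndarray, perm: tuple) -> np.ndarray:
    # audio: (channels, samples); perm: a permutation of channel indices.
    return audio[list(perm), :]

def time_domain_mix(a: np.ndarray, b: np.ndarray, gain_db: float = 0.0) -> np.ndarray:
    # Overlay clip b on clip a at a chosen gain; labels of both clips are kept.
    n = min(a.shape[1], b.shape[1])
    gain = 10.0 ** (gain_db / 20.0)
    return a[:, :n] + gain * b[:, :n]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip_a = rng.standard_normal((4, 24000))   # 4-channel, 1 s at 24 kHz
    clip_b = rng.standard_normal((4, 24000))
    swapped = channel_swap(clip_a, (0, 3, 2, 1))
    mixed = time_domain_mix(clip_a, clip_b, gain_db=-3.0)
    print(swapped.shape, mixed.shape)
```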

    Fashion-Oriented Image Captioning with External Knowledge Retrieval and Fully Attentive Gates

    Research related to the fashion and e-commerce domains is gaining attention in the computer vision and multimedia communities. Following this trend, this article tackles the task of generating fine-grained and accurate natural language descriptions of fashion items, a recently proposed and under-explored challenge that is still far from being solved. To overcome the limitations of previous approaches, a transformer-based captioning model was designed that integrates an external textual memory accessed through k-nearest neighbor (kNN) searches. From an architectural point of view, the proposed transformer can read and retrieve items from the external memory through cross-attention operations, and modulate the flow of information coming from the external memory thanks to a novel fully attentive gate. Experimental analyses were carried out on the fashion captioning dataset (FACAD), which contains more than 130k fine-grained descriptions, validating the effectiveness of the proposed approach and of the architectural strategies in comparison with carefully designed baselines and state-of-the-art approaches. The presented method consistently outperforms all compared approaches, demonstrating its effectiveness for fashion image captioning.
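
    To make the retrieval-augmented decoding step more concrete, here is a minimal PyTorch sketch (not the paper's implementation): the decoder states cross-attend to the k retrieved memory entries, and a simple element-wise sigmoid gate, standing in for the paper's fully attentive gate, controls how much retrieved information flows back into the decoder state. Dimensions, gate form, and the residual update are assumptions.

```python
# Minimal sketch of gated cross-attention over kNN-retrieved memory entries.
import torch
import torch.nn as nn

class GatedMemoryCrossAttention(nn.Module):
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, decoder_states, memory):
        # decoder_states: (batch, seq, d_model); memory: (batch, k, d_model),
        # the encoded k nearest-neighbour textual entries.
        attended, _ = self.cross_attn(decoder_states, memory, memory)
        g = torch.sigmoid(self.gate(torch.cat([decoder_states, attended], dim=-1)))
        return decoder_states + g * attended   # gated residual update

if __name__ == "__main__":
    block = GatedMemoryCrossAttention()
    dec = torch.randn(2, 20, 512)              # partial caption states
    mem = torch.randn(2, 5, 512)               # k=5 retrieved entries
    print(block(dec, mem).shape)               # torch.Size([2, 20, 512])
```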