Relational Self-Supervised Learning
Self-supervised Learning (SSL) including the mainstream contrastive learning
has achieved great success in learning visual representations without data
annotations. However, most methods mainly focus on instance-level information
(i.e., the different augmented images of the same instance should have the
same features or cluster into the same class), paying little attention to the
relationships between different instances. In this paper, we introduce a novel
SSL paradigm, which we term the relational self-supervised learning (ReSSL)
framework, that learns representations by modeling the relationship between
different instances. Specifically, our method employs a sharpened distribution
of pairwise similarities among different instances as a relation metric, which
is then used to match the feature embeddings of different augmentations. To
boost performance, we argue that weak augmentations matter for representing a
more reliable relation, and we leverage a momentum strategy for practical
efficiency. A designed asymmetric predictor head and an InfoNCE warm-up
strategy enhance robustness to hyper-parameters and benefit the resulting
performance. Experimental results
show that our proposed ReSSL substantially outperforms the state-of-the-art
methods across different network architectures, including various lightweight
networks (e.g., EfficientNet and MobileNet).
Comment: Extended version of NeurIPS 2021 paper. arXiv admin note: substantial text overlap with arXiv:2107.0928
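The relation metric described above can be sketched in a few lines: each view's embedding is compared against a set of other instances, and the softmax over those similarities (sharpened by a low temperature) forms the target distribution that the other augmentation must match. The temperature values (0.04 for the weak/teacher view, 0.1 for the strong/student view) and the memory-queue setup here are illustrative assumptions, not values taken from the abstract.

```python
import numpy as np

def relation_distribution(feats, queue, tau):
    """Softmax over cosine similarities between a batch of embeddings and a
    memory queue of other instances; a lower tau sharpens the distribution."""
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    queue = queue / np.linalg.norm(queue, axis=1, keepdims=True)
    logits = feats @ queue.T / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def ressl_loss(z_weak, z_strong, queue, tau_teacher=0.04, tau_student=0.1):
    """Cross-entropy from the sharpened relation distribution of the weakly
    augmented view (teacher) to that of the strongly augmented view (student).
    Temperatures are hypothetical defaults for illustration."""
    p_t = relation_distribution(z_weak, queue, tau_teacher)    # sharper target
    p_s = relation_distribution(z_strong, queue, tau_student)  # student relation
    return -np.mean(np.sum(p_t * np.log(p_s + 1e-12), axis=1))
```

Because the teacher distribution is sharpened more aggressively than the student's, the student is pushed toward the teacher's most confident instance relationships rather than a uniform match.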
Connectionist Temporal Modeling for Weakly Supervised Action Labeling
We propose a weakly-supervised framework for action labeling in video, where
only the order of occurring actions is required during training time. The key
challenge is that the per-frame alignments between the input (video) and label
(action) sequences are unknown during training. We address this by introducing
the Extended Connectionist Temporal Classification (ECTC) framework to
efficiently evaluate all possible alignments via dynamic programming and
explicitly enforce their consistency with frame-to-frame visual similarities.
This protects the model from the distraction of visually inconsistent or
degenerate alignments without the need for temporal supervision. We further
extend our framework to the semi-supervised case when a few frames are sparsely
annotated in a video. With less than 1% of labeled frames per video, our method
is able to outperform existing semi-supervised approaches and achieve
comparable performance to that of fully supervised approaches.
Comment: To appear in ECCV 201
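The dynamic-programming evaluation of all alignments can be illustrated with a minimal blank-free, CTC-style forward pass: given frame-wise action posteriors and the ordered label sequence, it sums the probability of every monotone frame-to-label alignment. This is a simplified sketch of the general idea; ECTC's visual-similarity consistency weighting is omitted, and the function and variable names are hypothetical.

```python
import numpy as np

def forward_alignment(probs, labels):
    """Sum the probability of all monotone frame-to-label alignments via
    dynamic programming. probs[t, a] is the frame-wise posterior for action a;
    labels is the ordered sequence of action indices to align against."""
    T, K = probs.shape[0], len(labels)
    alpha = np.zeros((T, K))
    alpha[0, 0] = probs[0, labels[0]]  # first frame must start the first label
    for t in range(1, T):
        for k in range(K):
            stay = alpha[t - 1, k]                        # remain on label k
            move = alpha[t - 1, k - 1] if k > 0 else 0.0  # advance to label k
            alpha[t, k] = (stay + move) * probs[t, labels[k]]
    return alpha[T - 1, K - 1]  # alignments that end on the last label
```

The recurrence visits each (frame, label) cell once, so the cost is O(T·K) even though the number of alignments grows combinatorially, which is what makes evaluating all of them tractable.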