Self-Supervised Representation Learning for Detection of ACL Tear Injury in Knee MR Videos
The success of deep-learning-based models for computer vision applications
depends on large-scale human-annotated data, which is often expensive to
generate. Self-supervised learning, a subset of unsupervised learning, handles
this problem by learning meaningful features from unlabeled image or video
data. In this paper, we propose a self-supervised learning approach to learn
transferable features from MR video clips by forcing the model to learn
anatomical features. The pretext task models are designed to predict the
correct ordering of the jumbled image patches that the MR video frames are
divided into. To the best of our knowledge, none of the supervised learning
models that perform injury classification from MR video provides any
explanation for its decisions, which makes our work the
first of its kind on MR video data. Experiments on the pretext task show that
the proposed approach enables the model to learn spatial-context-invariant
features that support reliable and explainable performance in downstream
tasks such as classification of Anterior Cruciate Ligament tear injury from knee
MRI. The efficiency of the novel Convolutional Neural Network proposed in this
paper is reflected in the experimental results obtained in the downstream task.
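
The patch-ordering pretext task described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 3x3 grid, and the use of NumPy are assumptions made for illustration only. A frame is cut into a grid of patches, the patches are shuffled, and the permutation serves as the label a pretext model would be trained to predict.

```python
import numpy as np

def make_jigsaw_sample(frame, grid=3, rng=None):
    """Split a 2-D frame into grid x grid patches and shuffle them.

    Returns (shuffled_patches, permutation). The permutation is the
    target a pretext model would learn to predict. (Hypothetical helper.)
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = frame.shape
    ph, pw = h // grid, w // grid  # patch size; any remainder is cropped
    patches = np.stack([
        frame[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
        for r in range(grid) for c in range(grid)
    ])
    perm = rng.permutation(grid * grid)
    # shuffled[i] is the patch that originally sat at position perm[i]
    return patches[perm], perm

def restore_order(shuffled, perm):
    """Undo the shuffle given the true or predicted permutation."""
    restored = np.empty_like(shuffled)
    restored[perm] = shuffled
    return restored
```

Calling `make_jigsaw_sample` on each frame of a clip yields (input, label) pairs without any human annotation, which is the sense in which the task is self-supervised.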
Self-supervised Co-training for Video Representation Learning
The objective of this paper is visual-only self-supervised video
representation learning. We make the following contributions: (i) we
investigate the benefit of adding semantic-class positives to instance-based
Info Noise Contrastive Estimation (InfoNCE) training, showing that this form of
supervised contrastive learning leads to a clear improvement in performance;
(ii) we propose a novel self-supervised co-training scheme to improve the
popular InfoNCE loss, exploiting the complementary information from different
views, RGB streams and optical flow, of the same data source by using one view
to obtain positive class samples for the other; (iii) we thoroughly evaluate
the quality of the learnt representation on two different downstream tasks:
action recognition and video retrieval. In both cases, the proposed approach
demonstrates state-of-the-art or comparable performance with other
self-supervised approaches, whilst being significantly more efficient to train,
i.e. requiring far less training data to achieve similar performance.
Comment: NeurIPS202
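
For readers unfamiliar with the InfoNCE loss mentioned in this abstract, the following is a minimal sketch of a contrastive loss that accepts more than one positive per query. It is not the authors' code: the function name, the cosine-similarity scoring, and the temperature value of 0.07 are assumptions chosen for illustration. The `positive_mask` argument is where additional positives (for example, ones obtained from another view of the same data) would enter.

```python
import numpy as np

def info_nce(query, keys, positive_mask, temperature=0.07):
    """InfoNCE-style loss for one query against a set of keys.

    query: (d,) embedding; keys: (n, d) embeddings;
    positive_mask: (n,) booleans marking which keys count as positives.
    The temperature default is an assumed, commonly used value.
    """
    positive_mask = np.asarray(positive_mask, dtype=bool)
    # Cosine similarity: normalise both sides before the dot product.
    q = query / np.linalg.norm(query)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = k @ q / temperature
    logits = logits - logits.max()  # stabilise the exponentials
    exp = np.exp(logits)
    # Share of the probability mass that falls on the positives.
    return -np.log(exp[positive_mask].sum() / exp.sum())
```

With a single `True` entry in `positive_mask` this reduces to the instance-based form; marking further keys as positives enlarges the numerator and therefore lowers the loss for the same embeddings.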