Self-supervised video pretraining yields human-aligned visual representations

Author: Carreira João
Eslami S. M. Ali
Hénaff Olivier J.
Parthasarathy Nikhil
Publication date: 25/07/2023

Humans learn powerful representations of objects and scenes by observing how they evolve over time. Yet, outside of specific tasks that require explicit temporal understanding, static image pretraining remains the dominant paradigm for learning visual foundation models. We question this mismatch, and ask whether video pretraining can yield visual representations that bear the hallmarks of human perception: generalisation across tasks, robustness to perturbations, and consistency with human judgements. To that end we propose a novel procedure for curating videos, and develop a contrastive framework which learns from the complex transformations therein. This simple paradigm for distilling knowledge from videos, called VITO, yields general representations that far outperform prior video pretraining methods on image understanding tasks, and image pretraining methods on video understanding tasks. Moreover, VITO representations are significantly more robust to natural and synthetic deformations than image-, video-, and adversarially-trained ones. Finally, VITO's predictions are strongly aligned with human judgements, surpassing models that were specifically trained for that purpose. Together, these results suggest that video pretraining could be a simple way of learning unified, robust, and human-aligned representations of the visual world. Comment: Technical repor

arXiv.org e-Print Archive

Semi-supervised Tuning from Temporal Coherence

Author: Lomonaco Vincenzo
Maltoni Davide
Publication date: 01/01/2015

Recent works demonstrated the usefulness of temporal coherence to regularize supervised training or to learn invariant features with deep architectures. In particular, enforcing smooth output changes while presenting temporally close frames from video sequences proved to be an effective strategy. In this paper we prove the efficacy of temporal coherence for semi-supervised incremental tuning. We show that a deep architecture, just mildly trained in a supervised manner, can progressively improve its classification accuracy if exposed to video sequences of unlabeled data. The extent to which, in some cases, semi-supervised tuning improves classification accuracy (approaching the supervised one) is somewhat surprising. A number of control experiments pointed out the fundamental role of temporal coherence. Comment: Under review as a conference paper at ICLR 201

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna
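
The "smooth output changes" idea in the abstract above can be illustrated with a generic smoothness penalty over the outputs a model produces for consecutive frames. This is a minimal sketch and not the authors' tuning procedure: the function name `temporal_coherence_penalty` and the toy output sequences are assumptions made for illustration.

```python
def temporal_coherence_penalty(outputs):
    """Mean squared change between the outputs of consecutive frames.

    `outputs` is a list of per-frame output vectors (lists of floats).
    A low value means the outputs change smoothly over time.
    """
    if len(outputs) < 2:
        return 0.0
    total = 0.0
    for prev, curr in zip(outputs, outputs[1:]):
        total += sum((c - p) ** 2 for p, c in zip(prev, curr))
    return total / (len(outputs) - 1)


# Hypothetical class-probability outputs for three consecutive frames.
smooth_outputs = [[0.9, 0.1], [0.85, 0.15], [0.8, 0.2]]
jumpy_outputs = [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1]]

smooth_penalty = temporal_coherence_penalty(smooth_outputs)
jumpy_penalty = temporal_coherence_penalty(jumpy_outputs)
print(smooth_penalty < jumpy_penalty)
```

In a training loop, a term like this would typically be added to the task loss so that predictions on unlabeled neighbouring frames are pushed to agree.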

Representation learning with structured invariance

Author: Moskalev A.
Publication date: 01/01/2024

Invariance is crucial for neural networks, enabling them to generalize effectively across variations of the input data by focusing on key attributes while filtering out irrelevant details. In this thesis, we study representation learning in neural networks through the lens of structured invariance. We start by studying the properties and limitations of the invariance that neural networks can learn from the data. Next, we develop a method to extract the structure of invariance learned by a neural network, providing a more nuanced analysis of the quality of learned invariance. In the next chapter, we focus on contrastive learning, demonstrating how more structured supervision results in a better quality of learned representations. The last two chapters focus on practical aspects of representation learning with structured invariance in computer vision.

International Migration, Integration and Social Cohesion online publications

Contrastive representation learning: a framework and review

Author: Healy Graham
Le-Khac Phúc H.
Smeaton Alan F.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 16/10/2020

Contrastive Learning has recently received interest due to its success in self-supervised representation learning in the computer vision domain. However, the origins of Contrastive Learning date as far back as the 1990s and its development has spanned across many fields and domains including Metric Learning and natural language processing. In this paper, we provide a comprehensive literature review and we propose a general Contrastive Representation Learning framework that simplifies and unifies many different contrastive learning methods. We also provide a taxonomy for each of the components of contrastive learning in order to summarise it and distinguish it from other forms of machine learning. We then discuss the inductive biases which are present in any contrastive learning system and we analyse our framework under different views from various sub-fields of Machine Learning. Examples of how contrastive learning has been applied in computer vision, natural language processing, audio processing, and others, as well as in Reinforcement Learning are also presented. Finally, we discuss the challenges and some of the most promising future research directions ahead.

Irish Universities

DCU Online Research Access Service
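
As a concrete instance of the kind of objective the review above covers, the sketch below computes an InfoNCE-style contrastive loss for one anchor, one positive, and a set of negatives. It is an illustrative example and not the framework proposed in the paper: the function names, the cosine similarity choice, the temperature value, and the toy vectors are all assumptions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """Negative log-probability of picking the positive among all candidates."""
    pos_logit = cosine_similarity(anchor, positive) / temperature
    logits = [pos_logit] + [
        cosine_similarity(anchor, neg) / temperature for neg in negatives
    ]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(
        sum(math.exp(logit - max_logit) for logit in logits)
    )
    return log_sum_exp - pos_logit


# Hypothetical 2-D embeddings.
anchor = [1.0, 0.0]
aligned_loss = info_nce_loss(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
misaligned_loss = info_nce_loss(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
print(aligned_loss < misaligned_loss)
```

The loss is small when the positive is the candidate most similar to the anchor and grows when a negative is more similar, which is the behaviour that pulls matching pairs together and pushes non-matching pairs apart.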