999 research outputs found
Recent Advances in Transfer Learning for Cross-Dataset Visual Recognition: A Problem-Oriented Perspective
This paper takes a problem-oriented perspective and presents a comprehensive
review of transfer learning methods, both shallow and deep, for cross-dataset
visual recognition. Specifically, it categorises the cross-dataset recognition
into seventeen problems based on a set of carefully chosen data and label
attributes. Such a problem-oriented taxonomy has allowed us to examine how
different transfer learning approaches tackle each problem and how well each
problem has been researched to date. The comprehensive problem-oriented review
of the advances in transfer learning with respect to the problem has not only
revealed the challenges in transfer learning for visual recognition, but also
the problems (e.g. eight of the seventeen problems) that have been scarcely
studied. This survey not only presents an up-to-date technical review for
researchers, but also a systematic approach and a reference for a machine
learning practitioner to categorise a real problem and to look up for a
possible solution accordingly
SiT: Self-supervised vIsion Transformer
Self-supervised learning methods are gaining increasing traction in computer
vision due to their recent success in reducing the gap with supervised
learning. In natural language processing (NLP) self-supervised learning and
transformers are already the methods of choice. The recent literature suggests
that the transformers are becoming increasingly popular also in computer
vision. So far, the vision transformers have been shown to work well when
pretrained either using a large scale supervised data or with some kind of
co-supervision, e.g. in terms of teacher network. These supervised pretrained
vision transformers achieve very good results in downstream tasks with minimal
changes. In this work we investigate the merits of self-supervised learning for
pretraining image/vision transformers and then using them for downstream
classification tasks. We propose Self-supervised vIsion Transformers (SiT) and
discuss several self-supervised training mechanisms to obtain a pretext model.
The architectural flexibility of SiT allows us to use it as an autoencoder and
work with multiple self-supervised tasks seamlessly. We show that a pretrained
SiT can be finetuned for a downstream classification task on small scale
datasets, consisting of a few thousand images rather than several millions. The
proposed approach is evaluated on standard datasets using common protocols. The
results demonstrate the strength of the transformers and their suitability for
self-supervised learning. We outperformed existing self-supervised learning
methods by large margin. We also observed that SiT is good for few shot
learning and also showed that it is learning useful representation by simply
training a linear classifier on top of the learned features from SiT.
Pretraining, finetuning, and evaluation codes will be available under:
https://github.com/Sara-Ahmed/SiT
Action classification using a discriminative multilevel HDP-HMM
We classify human actions occurring in depth image sequences using features based on skeletal joint positions. The action classes are represented by a multi-level Hierarchical Dirichlet Process – Hidden Markov Model (HDP-HMM). The non-parametric HDP-HMM allows the inference of hidden states automatically from training data. The model parameters of each class are formulated as transformations from a shared base distribution, thus promoting the use of unlabelled examples during training and borrowing information across action classes. Further, the parameters are learnt in a discriminative way. We use a normalized gamma process representation of HDP and margin based likelihood functions for this purpose. We sample parameters from the complex posterior distribution induced by our discriminative likelihood function using elliptical slice sampling. Experiments with two different datasets show that action class models learnt using our technique produce good classification results
Tracklet Self-Supervised Learning for Unsupervised Person Re-Identification
Existing unsupervised person re-identification (re-id) methods mainly focus on cross-domain adaptation or one-shot learning. Although they are more scalable than the supervised learning counterparts, relying on a relevant labelled source domain or one labelled tracklet per person initialisation still restricts their scalability in real-world deployments. To alleviate these problems, some recent studies develop unsupervised tracklet association and bottom-up image clustering methods, but they still rely on explicit camera annotation or merely utilise suboptimal global clustering. In this work, we formulate a novel tracklet self-supervised learning (TSSL) method, which is capable of capitalising directly from abundant unlabelled tracklet data, to optimise a feature embedding space for both video and image unsupervised re-id. This is achieved by designing a comprehensive unsupervised learning objective that accounts for tracklet frame coherence, tracklet neighbourhood compactness, and tracklet cluster structure in a unified formulation. As a pure unsupervised learning re-id model, TSSL is end-to-end trainable at the absence of source data annotation, person identity labels, and camera prior knowledge. Extensive experiments demonstrate the superiority of TSSL over a wide variety of the state-of-the-art alternative methods on four large-scale person re-id benchmarks, including Market-1501, DukeMTMC-ReID, MARS and DukeMTMC-VideoReID
MLAN: Multi-Level Adversarial Network for Domain Adaptive Semantic Segmentation
Recent progresses in domain adaptive semantic segmentation demonstrate the
effectiveness of adversarial learning (AL) in unsupervised domain adaptation.
However, most adversarial learning based methods align source and target
distributions at a global image level but neglect the inconsistency around
local image regions. This paper presents a novel multi-level adversarial
network (MLAN) that aims to address inter-domain inconsistency at both global
image level and local region level optimally. MLAN has two novel designs,
namely, region-level adversarial learning (RL-AL) and co-regularized
adversarial learning (CR-AL). Specifically, RL-AL models prototypical regional
context-relations explicitly in the feature space of a labelled source domain
and transfers them to an unlabelled target domain via adversarial learning.
CR-AL fuses region-level AL and image-level AL optimally via mutual
regularization. In addition, we design a multi-level consistency map that can
guide domain adaptation in both input space (, image-to-image
translation) and output space (, self-training) effectively. Extensive
experiments show that MLAN outperforms the state-of-the-art with a large margin
consistently across multiple datasets.Comment: Submitted to P
Semi-generative modelling: learning with cause and effect features
We consider a case of covariate shift where prior causal inference or expert knowledge has identified some features as effects, and show how this setting, when analysed from a causal perspective, gives rise to a semi-generative modelling framework: P(Y,X_eff|Xcau)
- …