Unsupervised Cross-Dataset Transfer Learning for Person Re-identification
Most existing person re-identification (Re-ID) approaches follow a supervised
learning framework, in which a large number of labelled matching pairs are
required for training. This severely limits their scalability in real-world
applications. To overcome this limitation, we develop a novel cross-dataset
transfer learning approach to learn a discriminative representation. It is
unsupervised in the sense that the target dataset is completely unlabelled.
Specifically, we present a multi-task dictionary learning method which is able
to learn a dataset-shared but target-data-biased representation. Experimental
results on five benchmark datasets demonstrate that the method significantly
outperforms the state-of-the-art.
National Basic Research Program of China [2015CB351806]; National Natural
Science Foundation of China [61425025, 61390515, 61471042, 61421062]; National
Key Technology Research and Development Program [2014BAK10B02]; Shenzhen
Peacock Plan
Joint Semantic and Latent Attribute Modelling for Cross-Class Transfer Learning
This work is partially supported by grants from the National Natural Science
Foundation of China under contract No. 61390515, No. U1611461, and
No. 61425025, and the National Basic Research Program of China under Grant
No. 2015CB351806.
Cross-class Transfer Learning for Visual Data
PhD thesis. Automatic analysis of visual data is a key objective of computer
vision research, and visual recognition of objects from images is one of the
most important steps towards understanding and gaining insight into visual
data. Most existing approaches to visual recognition in the literature are
based on a supervised learning paradigm. Unfortunately, they require a large
amount of labelled training data, which severely limits their scalability.
Recognition, on the other hand, is instantaneous and effortless for humans:
they can recognise a new object without seeing any visual samples, just by
knowing its description and leveraging similarities between the description of
the new object and previously learned concepts. Motivated by this human
recognition ability, this thesis proposes novel approaches to the cross-class
transfer learning (cross-class recognition) problem, whose goal is to learn a
model from seen classes (those with labelled training samples) that can
generalise to unseen classes (those with labelled testing samples but no
training data), i.e., seen and unseen classes are disjoint. Specifically, the
thesis studies and develops new methods for addressing three variants of
cross-class transfer learning:
Chapter 3 The first variant is transductive cross-class transfer learning,
meaning that a labelled training set and an unlabelled test set are available
for model learning. Considering the training set as the source domain and the
test set as the target domain, a typical cross-class transfer learning
approach assumes that the source and target domains share a common semantic
space, into which a visual feature vector extracted from an image can be
embedded using an embedding function. Existing approaches learn this function
from the source domain and apply it without adaptation to the target one. They
are therefore prone to the domain shift problem: since the embedding function
is concerned only with predicting the semantic representations of the seen
training classes during learning, it may underperform when applied to the test
data. In this thesis, a novel cross-class transfer learning (CCTL) method is
proposed based on unsupervised domain adaptation. Specifically, a novel
regularised dictionary learning framework is formulated in which the target
class labels are used to regularise the learned target domain embeddings, thus
effectively overcoming the projection domain shift problem.
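The regularised dictionary learning idea can be sketched as a toy alternating
optimisation. This is not the thesis's exact formulation; it assumes a
simplified objective ||X - DY||^2 + lam ||Y - P||^2, where the columns of the
hypothetical matrix P are the semantic prototypes assigned to the target
samples, and the second term plays the role of the regulariser biasing the
learned embeddings:

```python
import numpy as np

def regularised_dict_learning(X, P, lam=0.1, n_iter=20):
    """Toy alternating optimisation for
        min_{D,Y} ||X - D Y||_F^2 + lam * ||Y - P||_F^2
    X: (d, n) visual features; P: (k, n) prototype targets (assumed given)."""
    d, n = X.shape
    k = P.shape[0]
    rng = np.random.default_rng(0)
    D = rng.standard_normal((d, k))   # random dictionary initialisation
    Y = P.copy()                      # start codes at the prototypes
    for _ in range(n_iter):
        # Code update (closed form): (D^T D + lam I) Y = D^T X + lam P
        Y = np.linalg.solve(D.T @ D + lam * np.eye(k), D.T @ X + lam * P)
        # Dictionary update (least squares): D = X Y^T (Y Y^T)^-1
        D = X @ Y.T @ np.linalg.inv(Y @ Y.T + 1e-8 * np.eye(k))
    return D, Y
```

Each step exactly minimises the objective over one block of variables, so the
objective is non-increasing across iterations; the actual framework in the
thesis handles the cross-class semantic structure on top of this skeleton.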
Chapter 4 The second variant is inductive cross-class transfer learning, that
is, only the training set is assumed to be available during model learning,
a harder challenge than the previous one. Nevertheless, this reflects the
real-world scenario in which test data becomes available only after model
learning. The main problem remains the same as in the previous variant:
domain shift occurs when a model learned only from the training set is applied
to the test set without adaptation. In this thesis, a semantic autoencoder
(SAE) is proposed, building on the encoder-decoder paradigm. Specifically, a
semantic space is first defined so that knowledge transfer is possible from
the seen to the unseen classes. An encoder then embeds/projects a visual
feature vector into this semantic space, while the decoder imposes a
generative task: the projection must be able to reconstruct the original
visual features. This generative task forces the encoder to preserve richer
information, so the encoder learned from the seen classes generalises better
to the new unseen classes.
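A common linear instantiation of this encoder-decoder idea, sketched here
under the assumption of tied encoder/decoder weights (not necessarily the
exact model in the thesis), is min_W ||X - W^T S||^2 + lam ||W X - S||^2.
Setting the gradient to zero yields a Sylvester equation, solvable in closed
form:

```python
import numpy as np
from scipy.linalg import solve_sylvester

def sae_encoder(X, S, lam=0.5):
    """Linear semantic-autoencoder sketch:
        min_W ||X - W^T S||_F^2 + lam * ||W X - S||_F^2
    The zero-gradient condition is the Sylvester equation
        (S S^T) W + W (lam X X^T) = (1 + lam) S X^T.
    X: (d, n) visual features; S: (k, n) semantic vectors; returns W: (k, d)."""
    A = S @ S.T                 # k x k
    B = lam * (X @ X.T)         # d x d
    C = (1 + lam) * (S @ X.T)   # k x d
    return solve_sylvester(A, B, C)
```

At test time, unseen-class samples are projected with the learned `W` and
matched to unseen-class semantic prototypes by nearest neighbour.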
Chapter 5 The third variant is unsupervised cross-class transfer learning.
Here, no supervision is available for model learning, i.e., only unlabelled
training data is available, making this the hardest setting of the three. The
goal, however, is the same: learning from the training data some knowledge
that can be transferred to test data whose labels are completely different
from those of the training data. The thesis proposes a novel approach which
requires no labelled training data yet is able to capture discriminative
information. The proposed model is based on a new graph regularised dictionary
learning algorithm. By introducing an l1-norm graph regularisation term in
place of the conventional squared l2-norm, the model is robust against the
outliers and noise typical of visual data. Importantly, the graph and the
representation are learned jointly, further alleviating the effects of data
outliers. As an application, person re-identification is considered for this
variant in this thesis.
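The robustness argument can be illustrated by comparing the two graph
regularisers directly (a sketch only; the thesis additionally learns the graph
jointly with the representation). Given an affinity matrix G over samples, the
conventional term sums G_ij * ||y_i - y_j||_2^2 while the robust variant sums
G_ij * ||y_i - y_j||_1:

```python
import numpy as np

def graph_reg(Y, G, norm="l1"):
    """Graph regularisation over codes Y (k, n) with affinity G (n, n):
    'l1'   -> sum_ij G_ij * ||y_i - y_j||_1    (robust variant)
    'l2sq' -> sum_ij G_ij * ||y_i - y_j||_2^2  (conventional variant)"""
    n = Y.shape[1]
    total = 0.0
    for i in range(n):
        for j in range(n):
            d = Y[:, i] - Y[:, j]
            total += G[i, j] * (np.abs(d).sum() if norm == "l1"
                                else float(d @ d))
    return total
```

Scaling a single outlier code by a factor of 10 inflates the squared-l2 term
by 100x but the l1 term by only 10x, which is why the l1 graph penalty is far
less dominated by outliers.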
Unsupervised Adaptive Re-identification in Open World Dynamic Camera Networks
Person re-identification is an open and challenging problem in computer
vision. Existing approaches have concentrated on either designing the best
feature representation or learning optimal matching metrics in a static setting
where the number of cameras is fixed in a network. Most approaches have
neglected the dynamic and open world nature of the re-identification problem,
where a new camera may be temporarily inserted into an existing system to get
additional information. To address such a novel and very practical problem, we
propose an unsupervised adaptation scheme for re-identification models in a
dynamic camera network. First, we formulate a domain perceptive
re-identification method based on geodesic flow kernel that can effectively
find the best source camera (already installed) to adapt with a newly
introduced target camera, without requiring a very expensive training phase.
Second, we introduce a transitive inference algorithm for re-identification
that can exploit the information from best source camera to improve the
accuracy across other camera pairs in a network of multiple cameras. Extensive
experiments on four benchmark datasets demonstrate that the proposed approach
significantly outperforms the state-of-the-art unsupervised learning based
alternatives whilst being extremely efficient to compute.
Comment: CVPR 2017 Spotlight
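The "best source camera" selection can be approximated with a simple
subspace-distance ranking. The paper builds on the geodesic flow kernel; the
sketch below instead ranks installed cameras by the mean principal angle
between per-camera PCA subspaces (the quantity that parameterises the
geodesic), so it is a simplified proxy rather than the published criterion:

```python
import numpy as np

def principal_angles(Ps, Pt):
    """Principal angles between two subspaces given by orthonormal
    bases Ps, Pt of shape (d, q)."""
    s = np.linalg.svd(Ps.T @ Pt, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))

def best_source_camera(source_bases, Pt):
    """Pick the installed camera whose feature subspace is closest
    (smallest mean principal angle) to the new target camera's."""
    dists = [principal_angles(Ps, Pt).mean() for Ps in source_bases]
    return int(np.argmin(dists))
```

The selected source camera would then serve as the adaptation partner for the
newly introduced target camera, with the transitive step propagating its
information to the remaining camera pairs.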
CANU-ReID: A Conditional Adversarial Network for Unsupervised person Re-IDentification
Unsupervised person re-ID is the task of identifying people on a target data
set for which the ID labels are unavailable during training. In this paper, we
propose to unify two trends in unsupervised person re-ID: clustering &
fine-tuning and adversarial learning. On one side, clustering groups training
images into pseudo-ID labels, and uses them to fine-tune the feature extractor.
On the other side, adversarial learning is used, inspired by domain adaptation,
to match distributions from different domains. Since target data is distributed
across different camera viewpoints, we propose to model each camera as an
independent domain, and aim to learn domain-independent features.
Since straightforward adversarial learning yields negative transfer, we
introduce a conditioning vector to mitigate this undesirable effect. In our
framework, the centroid of the cluster to which the visual sample belongs is
used as conditioning vector of our conditional adversarial network, where the
vector is permutation invariant (clusters ordering does not matter) and its
size is independent of the number of clusters. To our knowledge, we are the
first to propose the use of conditional adversarial networks for unsupervised
person re-ID. We evaluate the proposed architecture on top of two
state-of-the-art clustering-based unsupervised person re-identification (re-ID)
methods on four different experimental settings with three different data sets
and set the new state-of-the-art performance on all four of them. Our code and
model will be made publicly available at
https://team.inria.fr/perception/canu-reid/
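The conditioning vector described above can be sketched in a few lines (an
illustrative reconstruction, not the authors' code): each sample is paired
with the centroid of its pseudo-ID cluster, so the vector lives in feature
space (its size is independent of the number of clusters) and is invariant to
any permutation of the cluster labels:

```python
import numpy as np

def conditioning_vectors(features, pseudo_labels):
    """For each sample, return the centroid of its pseudo-ID cluster.
    features: (n, d) array; pseudo_labels: (n,) integer cluster ids.
    Output is (n, d): same dimension as the features, regardless of
    how many clusters there are or how they are numbered."""
    feats = np.asarray(features, dtype=float)
    labels = np.asarray(pseudo_labels)
    cond = np.empty_like(feats)
    for c in np.unique(labels):
        mask = labels == c
        cond[mask] = feats[mask].mean(axis=0)  # cluster centroid
    return cond
```

Because the centroid depends only on cluster membership, renumbering the
pseudo-ID labels leaves the conditioning vectors unchanged, matching the
permutation-invariance property claimed in the abstract.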