Automatic analysis of visual data is a key objective of computer vision research, and visual
recognition of objects in images is one of the most important steps towards understanding
and gaining insight into visual data. Most existing approaches to visual recognition in the
literature are based on a supervised learning paradigm. Unfortunately, they require a large
amount of labelled training data, which severely limits their scalability. Recognition, on the
other hand, is instantaneous and effortless for humans: they can recognise a new object without
seeing any visual samples, just by knowing its description and leveraging similarities between
that description and previously learned concepts. Motivated by this human recognition ability,
this thesis proposes novel approaches to the cross-class transfer learning (cross-class
recognition) problem, whose goal is to learn a model from seen classes (those with labelled
training samples) that can generalise to unseen classes (those encountered only at test time)
without any training data, i.e., the seen and unseen classes are disjoint. Specifically, the thesis
studies and develops new methods for addressing three variants of cross-class transfer learning:
Chapter 3 The first variant is transductive cross-class transfer learning, meaning that both the
labelled training set and the unlabelled test set are available for model learning. Considering the
training set as the source domain and the test set as the target domain, a typical cross-class
transfer learning approach assumes that the source and target domains share a common semantic
space, into which a visual feature vector extracted from an image can be embedded using an
embedding function. Existing approaches learn this function from the source domain and apply it
without adaptation to the target one. They are therefore prone to the domain shift problem: the
embedding function is optimised only to predict the semantic representations of the seen training
classes, so when applied to the test data it may underperform. In this thesis, a novel cross-class
transfer learning (CCTL) method is proposed based on unsupervised domain adaptation.
Specifically, a novel regularised dictionary learning framework is formulated in which the
target class labels are used to regularise the learned target domain embeddings, thus effectively
overcoming the projection domain shift problem.
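The regularised dictionary learning idea can be illustrated with a generic sketch (a hypothetical toy formulation, not the exact objective or solver developed in Chapter 3): alternate between ISTA sparse coding, where an extra quadratic term stands in for the regulariser that pulls each code towards a class-level prototype, and a least-squares dictionary update.

```python
import numpy as np

def soft_threshold(Z, t):
    """Elementwise soft-thresholding: the proximal operator of the l1 norm."""
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)

def regularised_dict_learning(X, P, n_atoms, lam=0.1, beta=0.5,
                              n_outer=30, n_ista=20, seed=0):
    """Toy alternating minimisation of
        0.5*||X - D Y||_F^2 + lam*||Y||_1 + 0.5*beta*||Y - P||_F^2,
    where the last term regularises the codes Y towards prototype codes P
    (a stand-in for the semantic regulariser described in the text).
    X: d x n data matrix, P: n_atoms x n prototype codes."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    D = rng.standard_normal((d, n_atoms))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    Y = np.zeros((n_atoms, n))
    for _ in range(n_outer):
        # Sparse coding: proximal gradient (ISTA) on the smooth part.
        L = np.linalg.norm(D, 2) ** 2 + beta          # Lipschitz constant
        for _ in range(n_ista):
            grad = D.T @ (D @ Y - X) + beta * (Y - P)
            Y = soft_threshold(Y - grad / L, lam / L)
        # Dictionary update: least squares, then renormalise the atoms.
        D = X @ Y.T @ np.linalg.pinv(Y @ Y.T + 1e-8 * np.eye(n_atoms))
        D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    return D, Y
```

The point of the sketch is only the structure of the optimisation: the data-fidelity and sparsity terms are standard dictionary learning, and the prototype term shows where label-derived information can enter to steer the learned codes.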
Chapter 4 The second variant is inductive cross-class transfer learning, that is, only the training
set is assumed to be available during model learning, a harder challenge than the previous one.
Nevertheless, this setting reflects the real-world scenario in which test data only becomes
available after the model has been learned. The main problem remains the same as in the previous
variant: the domain shift problem occurs when a model learned only from the training set is
applied to the test set without adaptation. In this thesis, a semantic autoencoder (SAE) is
proposed, building on the encoder-decoder paradigm. Specifically, a semantic space is first
defined so that knowledge transfer from the seen classes to the unseen classes is possible. Then,
an encoder embeds/projects a visual feature vector into the semantic space, while the decoder
performs a generative task: the projection must be able to reconstruct the original visual features.
This generative task forces the encoder to preserve richer information, so the encoder learned
from the seen classes is able to generalise better to new unseen classes.
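A minimal sketch of this encoder-decoder idea, assuming a linear encoder W with a tied decoder W^T (the specific objective, the value of lam, and the Kronecker-based solver below are illustrative choices, not necessarily those of the thesis): minimising the sum of the reconstruction and projection errors over W yields a Sylvester equation with a closed-form solution.

```python
import numpy as np

def semantic_autoencoder(X, S, lam=0.2):
    """Tied-weight linear semantic autoencoder sketch:
        min_W ||X - W.T S||_F^2 + lam * ||W X - S||_F^2,
    where X (d x n) holds visual features, S (k x n) semantic vectors,
    and W (k x d) is the shared encoder/decoder projection.
    Setting the gradient to zero gives the Sylvester equation
        (S S^T) W + W (lam X X^T) = (1 + lam) S X^T,
    solved here by Kronecker vectorisation (fine for small d and k)."""
    k, d = S.shape[0], X.shape[0]
    A = S @ S.T                        # k x k
    B = lam * (X @ X.T)                # d x d
    C = (1.0 + lam) * (S @ X.T)        # k x d
    # vec(A W + W B) = (I_d kron A + B^T kron I_k) vec(W), column-major vec.
    M = np.kron(np.eye(d), A) + np.kron(B.T, np.eye(k))
    w = np.linalg.solve(M, C.flatten(order="F"))
    return w.reshape((k, d), order="F")
```

The design choice worth noting is the weight tying: because the decoder is W^T rather than a free matrix, the reconstruction term directly constrains the encoder, which is what forces it to preserve information beyond what seen-class prediction alone would require.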
Chapter 5 The third variant is unsupervised cross-class transfer learning. Here, no supervision
is available for model learning, i.e., only unlabelled training data is available, making this the
hardest of the three settings. The goal, however, is the same: to learn from the training data
some knowledge that can be transferred to test data whose labels are completely disjoint from
those of the training data. The thesis proposes a novel approach which requires no labelled
training data yet is able to capture discriminative information. The proposed model is based on
a new graph regularised dictionary learning algorithm. By introducing an l1-norm graph
regularisation term instead of the conventional squared l2-norm one, the model is robust to the
outliers and noise typical of visual data. Importantly, the graph and the representation are
learned jointly, further alleviating the effects of data outliers. As an application of this variant,
person re-identification is considered in this thesis.
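The robustness argument behind the l1-norm graph regulariser can be shown with a small toy computation (an illustration of the penalty terms only, not the thesis algorithm): a single outlier sample inflates the conventional squared-l2 graph penalty quadratically, whereas the l1 variant grows only linearly with the outlier gap.

```python
import numpy as np

def graph_penalties(Y, W):
    """Compare the conventional squared-l2 graph regulariser
        sum_ij W_ij * ||y_i - y_j||_2^2
    with the l1-norm variant
        sum_ij W_ij * ||y_i - y_j||_1.
    Y: n x p codes (one row per sample), W: n x n symmetric affinity matrix."""
    diff = Y[:, None, :] - Y[None, :, :]            # all pairwise differences
    sq_l2 = float(np.sum(W * np.sum(diff ** 2, axis=-1)))
    l1 = float(np.sum(W * np.sum(np.abs(diff), axis=-1)))
    return sq_l2, l1

# Five 1-D codes on a chain graph; the last sample is a gross outlier.
Y = np.array([[0.0], [0.1], [0.2], [0.3], [10.0]])
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0                 # chain edges
sq_l2, l1 = graph_penalties(Y, W)
# The single outlier edge makes the squared-l2 penalty nearly an order of
# magnitude larger than the l1 penalty on the same graph.
```

Because an l1 penalty grows linearly, a model minimising it is not forced to distort its representation to appease a few corrupted edges, which is the robustness property the chapter exploits.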