Highly Efficient Regression for Scalable Person Re-Identification
Existing person re-identification models scale poorly to the large
data required in real-world applications due to: (1) Complexity: They employ
complex models for optimal performance resulting in high computational cost for
training at a large scale; (2) Inadaptability: Once trained, they are
unsuitable for incremental update to incorporate any new data available. This
work proposes a truly scalable solution to re-id by addressing both problems.
Specifically, a Highly Efficient Regression (HER) model is formulated by
embedding the Fisher's criterion to a ridge regression model for very fast
re-id model learning with scalable memory/storage usage. Importantly, this new
HER model supports faster-than-real-time incremental model updates, thereby
making real-time active learning feasible in re-id with human-in-the-loop.
Extensive experiments show that such a simple and fast model not only
notably outperforms the state-of-the-art re-id methods, but is also more
scalable to large data, with additional benefits to active learning for reducing
human labelling effort in re-id deployment.
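The incremental-update idea can be illustrated with plain ridge regression onto class-indicator targets. This is a minimal sketch, not the HER formulation itself: the abstract does not give the Fisher-criterion target coding, so a one-hot target matrix is assumed here, and the class name `IncrementalRidge` and the parameter `lam` are hypothetical.

```python
import numpy as np

class IncrementalRidge:
    """Ridge regression onto one-hot targets, kept as running sufficient statistics.

    Assumption: one-hot targets stand in for the (unspecified) HER target coding.
    """

    def __init__(self, n_features, n_classes, lam=1.0):
        self.n_classes = n_classes
        self.A = lam * np.eye(n_features)            # accumulates X^T X + lam * I
        self.B = np.zeros((n_features, n_classes))   # accumulates X^T Y

    def partial_fit(self, X, labels):
        # New data only adds to the accumulated statistics; old data is not revisited.
        Y = np.eye(self.n_classes)[labels]
        self.A += X.T @ X
        self.B += X.T @ Y
        return self

    @property
    def W(self):
        # Closed-form ridge solution W = (X^T X + lam I)^{-1} X^T Y.
        return np.linalg.solve(self.A, self.B)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = rng.integers(0, 3, size=20)

W_batch = IncrementalRidge(5, 3).partial_fit(X, y).W
W_incremental = (
    IncrementalRidge(5, 3).partial_fit(X[:10], y[:10]).partial_fit(X[10:], y[10:]).W
)
```

Because only the d x d and d x c statistics are stored, the memory footprint in this sketch does not grow with the number of samples, and the two-step fit reproduces the batch solution.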
Adaptive Locality Preserving Regression
This paper proposes a novel discriminative regression method, called adaptive
locality preserving regression (ALPR) for classification. In particular, ALPR
aims to learn a more flexible and discriminative projection that not only
preserves the intrinsic structure of data, but also possesses the properties of
feature selection and interpretability. To this end, we introduce a target
learning technique to adaptively learn a more discriminative and flexible
target matrix rather than the pre-defined strict zero-one label matrix for
regression. Then a locality preserving constraint regularized by the adaptive
learned weights is further introduced to guide the projection learning, which
is beneficial to learn a more discriminative projection and avoid overfitting.
Moreover, we replace the conventional Frobenius norm with the l2,1
norm to constrain the projection, which enables the method to adaptively select
the most important features from the original high-dimensional data for feature
extraction. In this way, the negative influence of the redundant features and
noise residing in the original data can be largely eliminated. Besides, the
proposed method has good interpretability for features owing to the
row-sparsity property of the l2,1 norm. Extensive experiments conducted on a
synthetic database with manifold structure and many real-world databases demonstrate
the effectiveness of the proposed method.
Comment: The paper has been accepted by IEEE Transactions on Circuits and
Systems for Video Technology (TCSVT), and the code is available at
https://drive.google.com/file/d/1iNzONkRByIaUhXwdEhOkkh_0d2AAXNE8/vie
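The row-sparsity property of the l2,1 norm mentioned in the abstract can be seen in a small numeric comparison. The sketch below is illustrative only and is not taken from the authors' code; the function names and the two example matrices are hypothetical.

```python
import numpy as np

def l21_norm(W):
    # Sum of the Euclidean norms of the rows of W.
    return float(np.sum(np.linalg.norm(W, axis=1)))

def frobenius_norm(W):
    # Square root of the sum of all squared entries.
    return float(np.sqrt(np.sum(W ** 2)))

# Same entries, arranged differently: one active row versus two active rows.
W_one_row = np.array([[3.0, 4.0], [0.0, 0.0], [0.0, 0.0]])
W_two_rows = np.array([[3.0, 0.0], [0.0, 4.0], [0.0, 0.0]])

print(l21_norm(W_one_row), l21_norm(W_two_rows))              # 5.0 7.0
print(frobenius_norm(W_one_row), frobenius_norm(W_two_rows))  # 5.0 5.0
```

The Frobenius norm cannot distinguish the two matrices, whereas the l2,1 norm is smaller when the weight is concentrated in fewer rows, which is why it tends to zero out whole rows (features) of a projection.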
Person Re-identification in Identity Regression Space
This work was partially supported by the China Scholarship Council, Vision Semantics Ltd, Royal Society Newton Advanced Fellowship Programme (NA150459), and Innovate UK Industrial Challenge Project on Developing and Commercialising Intelligent Video Analytics Solutions for Public Safety (98111-571149)
Sparse Modeling for Image and Vision Processing
In recent years, a large amount of multi-disciplinary research has been
conducted on sparse models and their applications. In statistics and machine
learning, the sparsity principle is used to perform model selection---that is,
automatically selecting a simple model among a large collection of them. In
signal processing, sparse coding consists of representing data with linear
combinations of a few dictionary elements. Subsequently, the corresponding
tools have been widely adopted by several scientific communities such as
neuroscience, bioinformatics, or computer vision. The goal of this monograph is
to offer a self-contained view of sparse modeling for visual recognition and
image processing. More specifically, we focus on applications where the
dictionary is learned and adapted to data, yielding a compact representation
that has been successful in various contexts.
Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics
and Visio
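Sparse coding as described above, representing a signal with a few dictionary elements, is commonly posed as an l1-regularised least-squares problem. The following is a minimal sketch using the iterative shrinkage-thresholding algorithm (ISTA) with a fixed dictionary; it is one standard solver and is not claimed to be the monograph's method. The names `ista`, `soft_threshold`, and `lam` are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1: shrink each entry toward zero by t.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(D, x, lam=0.1, n_iter=200):
    """Approximately solve min_a 0.5 * ||x - D a||_2^2 + lam * ||a||_1."""
    L = np.linalg.norm(D, 2) ** 2        # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)         # gradient of the quadratic data term
        a = soft_threshold(a - grad / L, lam / L)
    return a

# With an identity dictionary the solution is the soft-thresholded signal.
D = np.eye(4)
x = np.array([1.0, 0.05, -2.0, 0.0])
code = ista(D, x, lam=0.1)
```

In this toy case the small coefficient is set exactly to zero while the large ones are only slightly shrunk, which is the model-selection effect of the l1 penalty.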
Cross-class Transfer Learning for Visual Data
PhD
Automatic analysis of visual data is a key objective of computer vision research, and performing
visual recognition of objects from images is one of the most important steps towards understanding
and gaining insights into the visual data. Most existing approaches in the literature for
visual recognition are based on a supervised learning paradigm. Unfortunately, they require a
large amount of labelled training data which severely limits their scalability. On the other hand,
recognition is instantaneous and effortless for humans. They can recognise a new object without
seeing any visual samples by just knowing the description of it, leveraging similarities between
the description of the new object and previously learned concepts. Motivated by this human recognition
ability, this thesis proposes novel approaches to tackle the cross-class transfer learning (cross-class
recognition) problem, whose goal is to learn a model from seen classes (those with labelled
training samples) that can generalise to unseen classes (those with labelled testing samples) without
any training data, i.e., seen and unseen classes are disjoint. Specifically, the thesis studies and
develops new methods for addressing three variants of the cross-class transfer learning:
Chapter 3: The first variant is transductive cross-class transfer learning, meaning that a labelled
training set and an unlabelled test set are available for model learning. Considering the training set
as the source domain and the test set as the target domain, a typical cross-class transfer learning method
assumes that the source and target domains share a common semantic space, where a visual feature
vector extracted from an image can be embedded using an embedding function. Existing
approaches learn this function from the source domain and apply it without adaptation to the
target one. They are therefore prone to the domain shift problem: because the embedding function
is only concerned with predicting the semantic representations of the seen training classes
during learning, it may underperform when applied to the test data. In this thesis, a novel
cross-class transfer learning (CCTL) method is proposed based on unsupervised domain adaptation.
Specifically, a novel regularised dictionary learning framework is formulated by which the
target class labels are used to regularise the learned target domain embeddings, thus effectively
overcoming the projection domain shift problem.
Chapter 4: The second variant is inductive cross-class transfer learning, that is, only the training
set is assumed to be available during model learning, resulting in a harder challenge compared
to the previous one. Nevertheless, this setting reflects a real-world setting in which test data becomes
available only after model learning. The main problem remains the same as in the previous variant,
that is, the domain shift problem occurs when the model learned only from the training set is applied
to the test set without adaptation. In this thesis, a semantic autoencoder (SAE) is proposed,
building on an encoder-decoder paradigm. Specifically, first a semantic space is defined so that
knowledge transfer is possible from the seen classes to the unseen classes. Then, an encoder aims
to embed/project a visual feature vector into the semantic space. In addition, the decoder imposes a
generative task, that is, the projection must be able to reconstruct the original visual features. The
generative task forces the encoder to preserve richer information, thus the learned encoder from
seen classes is able to generalise better to the new unseen classes.
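The encoder-decoder constraint can be sketched under a strong simplifying assumption that the abstract does not state: a linear encoder W with the decoder tied to its transpose. Minimising ||X - W^T S||_F^2 + lam * ||W X - S||_F^2 then gives the Sylvester equation (S S^T) W + W (lam X X^T) = (1 + lam) S X^T. The function name `fit_linear_sae`, the weight `lam`, and the random data are hypothetical.

```python
import numpy as np

def fit_linear_sae(X, S, lam=0.5):
    """Fit a tied-weight linear autoencoder (an assumed, simplified form).

    X: d x N visual features, S: k x N semantic vectors. Returns W (k x d).
    """
    d, k = X.shape[0], S.shape[0]
    A = S @ S.T                      # k x k, from the reconstruction term
    B = lam * (X @ X.T)              # d x d, from the projection term
    C = (1.0 + lam) * (S @ X.T)      # k x d right-hand side
    # Solve A W + W B = C through its Kronecker form with column-major vec(W).
    K = np.kron(np.eye(d), A) + np.kron(B.T, np.eye(k))
    w = np.linalg.solve(K, C.flatten(order="F"))
    return w.reshape((k, d), order="F")

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 50))   # toy visual features
S = rng.normal(size=(3, 50))   # toy semantic vectors
W = fit_linear_sae(X, S, lam=0.5)

S_pred = W @ X        # encoder: project features into the semantic space
X_rec = W.T @ S       # decoder: reconstruct features from semantic vectors
```

The Kronecker system is only practical for small dimensions; for larger problems a dedicated solver such as `scipy.linalg.solve_sylvester` would be the more usual choice.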
Chapter 5: The third variant is unsupervised cross-class transfer learning. In this variant, no
supervision is available for model learning, i.e., only unlabelled training data is available, leading
to the hardest setting compared to the previous cases. The goal, however, is the same: learning
some knowledge from the training data that can be transferred to test data composed of
completely different labels from those of the training data. The thesis proposes a novel approach which
requires no labelled training data yet is able to capture discriminative information. The proposed
model is based on a new graph regularised dictionary learning algorithm. By introducing an l1-norm
graph regularisation term, instead of the conventional squared l2-norm, the model is robust
against the outliers and noise typical in visual data. Importantly, the graph and representation are
learned jointly, resulting in further alleviation of the effects of data outliers. As an application,
person re-identification is considered for this variant in this thesis.
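The robustness argument for the l1-norm graph term can be illustrated numerically. The sketch below compares the two penalties on toy codes with one outlier; it only evaluates the regularisers and is not the thesis's learning algorithm. The function name `graph_penalty`, the fixed affinity matrix, and the data are hypothetical.

```python
import numpy as np

def graph_penalty(Z, affinity, norm="l1"):
    """Sum over pairs of affinity[i, j] * dist(z_i, z_j); rows of Z are codes."""
    diffs = Z[:, None, :] - Z[None, :, :]          # n x n x k pairwise differences
    if norm == "l1":
        dist = np.abs(diffs).sum(axis=2)           # l1 distance
    else:
        dist = (diffs ** 2).sum(axis=2)            # squared l2 distance
    return 0.5 * float((affinity * dist).sum())    # 0.5: each pair is counted twice

# Two nearby codes and one outlier, fully connected graph without self-loops.
Z = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
affinity = np.ones((3, 3)) - np.eye(3)

print(graph_penalty(Z, affinity, "l1"))   # 20.0
print(graph_penalty(Z, affinity, "l2"))   # 182.0
```

Under the squared l2 penalty the two pairs involving the outlier contribute 181 of 182, while under the l1 penalty they contribute 19 of 20, so the outlier's influence grows linearly rather than quadratically with its distance.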