Learning Representations for Human Identification
Long-duration visual tracking of people requires the ability to link track snippets (a.k.a. tracklets) based on the identity of the people they contain. In the absence of motion priors or hard biometrics (e.g., face, fingerprint, or iris), the common practice is to leverage soft biometrics for matching tracklets corresponding to the same person across different sightings. A common choice is the whole-body visual appearance of the person, as determined by the clothing, which is assumed not to change during tracking. The problem is challenging because distinct images of the same person may look very different: no restrictions are imposed on nuisance factors of variation such as pose, illumination, viewpoint, background, and sensor noise, leading to very high intra-class variance and leaving this human identification task prone to high mismatch rates.
We introduce and study models for learning representations for human identification that aim at reducing the effects of nuisance factors. First, we introduce a modeling framework based on learning a low rank representation, which can be applied to face as well as whole-body images. The goal is to not only learn invariant representations for each identity, but also to promote a uniform inter-class separation to further reduce mismatch rates. Another advantage of the approach is a fast procedure for computing and comparing invariant representations for recognition and re-identification. Second, we introduce a learning framework for fusing representations of multiple biometrics for human identification. We focus on the face modality and clothing appearance and develop a representation fusion approach based on the Information Bottleneck method.
In the last part of the dissertation, we improve person re-identification by reducing the effects of nuisance factors via multi-task learning. We design and combine improved versions of classification and distance-metric losses. The classification losses gain performance by imposing restrictions on the computation of their outputs, which makes their training harder; we mitigate this by combining multiple tasks, such as attribute and metric learning, that regularize the training while improving performance. Finally, we also model nuisance factors such as pose explicitly, to further improve the invariance of the representations. For each model, we show the benefits of the proposed methods by characterizing their performance on publicly available benchmarks and by comparing them with the state of the art.
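The multi-task idea described above — combining classification, attribute, and distance-metric objectives so that each regularizes the others — can be sketched as a weighted sum of per-task losses. The function below is a minimal toy illustration of that structure; the specific weights, margin, and variable names are assumptions for illustration, not the dissertation's actual losses.

```python
import numpy as np

def xent(logits, label):
    """Numerically stable softmax cross-entropy for one sample."""
    z = logits - logits.max()
    return -(z - np.log(np.exp(z).sum()))[label]

def multi_task_loss(feat, id_logits, attr_logits, pos, neg,
                    id_label, attr_label, margin=0.3, w=(1.0, 0.5, 1.0)):
    """Toy sketch: combine an identity classification loss, an attribute
    classification loss, and a triplet distance-metric loss into one
    weighted multi-task objective. Weights `w` and the margin are
    illustrative assumptions."""
    id_loss = xent(id_logits, id_label)          # identity task
    attr_loss = xent(attr_logits, attr_label)    # attribute task
    # Metric task: pull `feat` toward the positive sample and push it
    # away from the negative by at least `margin` (hinge on squared L2).
    metric_loss = max(0.0, margin
                      + np.sum((feat - pos) ** 2)
                      - np.sum((feat - neg) ** 2))
    return w[0] * id_loss + w[1] * attr_loss + w[2] * metric_loss
```

Each term is non-negative, so the combined objective is minimised only when all three tasks are satisfied simultaneously — the shared representation cannot overfit to any single task.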
Deep Representation Learning for Vehicle Re-Identification
With the widespread use of surveillance cameras in cities and on motorways, computer-vision-based intelligent systems are becoming a standard in the industry. Vehicle-related problems such as Automatic License Plate Recognition have been addressed by computer vision systems, albeit in controlled settings (e.g. cameras installed at toll gates). With research data becoming freely available in the last few years, surveillance footage analysis for vehicle-related problems is being studied with a computer vision focus. In this thesis, vision-based approaches to the problem of vehicle re-identification are investigated and original approaches are presented for various challenges of the problem. Computer-vision-based systems have advanced considerably in the last decade due to rapid improvements in machine learning with the advent of deep learning and convolutional neural networks (CNNs). At the core of the paradigm shift that deep learning brought to machine learning is feature learning by multiple stacked neural network layers. Compared to traditional machine learning methods that rely on hand-crafted feature extraction and shallow model learning, deep neural networks learn hierarchical feature representations as the input data are transformed from low-level to high-level representations through consecutive layers. Furthermore, tasks are trained in an end-to-end fashion that integrates feature extraction and model learning into a combined framework using neural networks. This thesis focuses on visual feature learning with deep convolutional neural networks for the vehicle re-identification problem. The problem of re-identification has attracted attention from the computer vision community, especially in the person re-identification domain, whereas vehicle re-identification is relatively understudied. Re-identification is the problem of matching identities of subjects in images.
The images come from non-overlapping viewing angles, captured at varying locations, under varying illumination, and so on. Compared to person re-identification, vehicle re-identification is particularly challenging because vehicles of the same model are manufactured to have the same visual appearance and shape, which makes different instances visually indistinguishable. This thesis investigates solutions for the aforementioned challenges and makes the following contributions, improving the accuracy and robustness of recent approaches: (1) Exploiting the man-made nature of vehicles, that is, their hierarchical categories such as type (e.g. sedan, SUV) and model (e.g. Audi-2011-A4), and its usefulness in identity matching when pairwise identity labelling is not available. (2) A new vehicle re-identification benchmark, Vehicle Re-Identification in Context (VRIC), is introduced to enable the design and evaluation of vehicle re-id methods under conditions that more closely reflect real-world applications than existing benchmarks. VRIC is uniquely characterised by unconstrained, low-resolution vehicle images taken from wide-field-of-view traffic scene videos exhibiting variations in illumination, motion blur, and occlusion. (3) We evaluate the advantages of Multi-Scale Visual Representation (MSVR) for multi-scale cross-camera matching by training a multi-branch CNN model for vehicle re-identification, enabled by the availability of low-resolution images in VRIC. Experimental results indicate that this approach is useful in real-world settings where image resolution is low and varies across cameras. (4) With Multi-Task Mutual Learning (MTML) we propose a multi-modal representation learning approach, e.g. using orientation as well as identity labels in training. We utilise deep convolutional neural networks with multiple branches to facilitate the learning of multi-modal and multi-scale deep features that increase re-identification performance, as well as orientation-invariant feature learning.
A Deep Four-Stream Siamese Convolutional Neural Network with Joint Verification and Identification Loss for Person Re-detection
State-of-the-art person re-identification systems that employ a triplet-based deep network suffer from poor generalization capability. In this paper, we propose a four-stream Siamese deep convolutional neural network for person re-detection that jointly optimises verification and identification losses over a four-image input group. Specifically, the proposed method overcomes the weakness of the typical triplet formulation by using groups of four images featuring two matched (i.e. the same identity) and two mismatched images. This allows us to jointly increase the inter-class variations and reduce the intra-class variations in the learned feature space. The proposed approach also optimises over both the identification and verification losses, further minimising intra-class variation and maximising inter-class variation, improving overall performance. Extensive experiments on four challenging datasets, VIPeR, CUHK01, CUHK03 and PRID2011, demonstrate that the proposed approach achieves state-of-the-art performance. Comment: Published in WACV 201
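The joint objective described in this abstract — a verification (metric) term over the four-image group combined with an identification (classification) term on each image — can be sketched as follows. This is a minimal illustration under assumed names and weighting, not the paper's implementation; the hinge constraints on the mismatched pairs are one plausible reading of the four-image formulation.

```python
import numpy as np

def softmax_xent(logits, label):
    """Numerically stable softmax cross-entropy for one sample."""
    z = logits - logits.max()
    return -(z - np.log(np.exp(z).sum()))[label]

def joint_four_stream_loss(feats, labels, W, margin=1.0):
    """Toy sketch of a joint verification + identification objective over
    a four-image group: (anchor, positive) share an identity and the two
    negatives are mismatched images. All names, the margin, and the equal
    weighting are illustrative assumptions.

    feats:  (4, d) embeddings for [anchor, positive, negative1, negative2]
    labels: (4,) identity indices
    W:      (d, num_ids) identity classifier weights
    """
    a, p, n1, n2 = feats
    d_pos = np.sum((a - p) ** 2)
    # Verification term: the matched pair must be closer than the
    # mismatched pairs by at least `margin` (squared-L2 hinge).
    ver = max(0.0, margin + d_pos - np.sum((a - n1) ** 2)) \
        + max(0.0, margin + d_pos - np.sum((n1 - n2) ** 2))
    # Identification term: classify each of the four images into its
    # identity with softmax cross-entropy.
    ident = sum(softmax_xent(f @ W, y) for f, y in zip(feats, labels))
    return ver + ident
```

The second hinge, involving only the mismatched images, is what distinguishes a four-image group from a plain triplet: it pushes inter-class distances apart even for pairs that do not include the anchor.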
Person Re-Identification by Deep Joint Learning of Multi-Loss Classification
Existing person re-identification (re-id) methods rely mostly on either localised or global feature representations alone, ignoring their joint benefit and mutual complementary effects. In this work, we show the advantages of jointly learning local and global features in a Convolutional Neural Network (CNN) by aiming to discover correlated local and global features in different contexts. Specifically, we formulate a method for the joint learning of local and global feature selection losses designed to optimise person re-id when using only generic matching metrics such as the L2 distance. We design a novel CNN architecture for Jointly Learning Multi-Loss (JLML), in which local and global discriminative feature optimisation is subject concurrently to the same re-id label information. Extensive comparative evaluations demonstrate the advantages of this new JLML model for person re-id over a wide range of state-of-the-art re-id methods on five benchmarks (VIPeR, GRID, CUHK01, CUHK03, Market-1501). Comment: Accepted by IJCAI 201
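A key claim above is that jointly learned local and global features can be matched with only a generic metric such as the L2 distance. One simple way to realise that at test time — an illustrative assumption, not necessarily the JLML fusion — is to L2-normalise each branch's descriptor, concatenate, renormalise, and rank the gallery by Euclidean distance:

```python
import numpy as np

def l2_normalise(x, eps=1e-12):
    """Scale a vector to unit L2 norm (guarding against zero vectors)."""
    return x / (np.linalg.norm(x) + eps)

def joint_descriptor(global_feat, local_feats):
    """Illustrative fusion of one global descriptor with a list of local
    (e.g. horizontal-stripe) descriptors into a single unit vector that
    generic L2 matching can consume. The normalise-and-concatenate
    scheme here is an assumption for illustration."""
    parts = [l2_normalise(global_feat)] + [l2_normalise(f) for f in local_feats]
    return l2_normalise(np.concatenate(parts))

def rank_gallery(query, gallery):
    """Return gallery row indices sorted by ascending L2 distance."""
    dists = np.linalg.norm(gallery - query, axis=1)
    return np.argsort(dists)
```

Because every descriptor lives on the unit sphere, no learned metric is needed at deployment time — ranking reduces to a nearest-neighbour search.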
Backbone Can Not be Trained at Once: Rolling Back to Pre-trained Network for Person Re-Identification
In the person re-identification (ReID) task, because of the shortage of trainable data, it is common to fine-tune a classification network pre-trained on a large dataset. However, it is relatively difficult to sufficiently fine-tune the low-level layers of the network due to the vanishing gradient problem. In this work, we propose a novel fine-tuning strategy that allows the low-level layers to be sufficiently trained by rolling back the weights of the high-level layers to their initial pre-trained values. Our strategy alleviates the vanishing gradient problem in the low-level layers and robustly trains them to fit the ReID dataset, thereby increasing the performance of ReID tasks. The improved performance of the proposed strategy is validated via several experiments. Furthermore, without any add-ons such as pose estimation or segmentation, our strategy achieves state-of-the-art performance using only a vanilla deep convolutional neural network architecture. Comment: Accepted to AAAI 201
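The rolling-back step itself is mechanically simple: after a fine-tuning phase, the high-level layers are reset to a saved copy of their pre-trained weights while the low-level layers keep their fine-tuned values, so the next phase drives more of the remaining error into the low-level layers. The dict-of-weights representation and layer names below are illustrative, not the paper's code:

```python
import copy

def rollback_high_layers(model, pretrained, high_level_layers):
    """Sketch of the rolling-back step: restore the named high-level
    layers to their pre-trained weights, leaving all other (low-level)
    layers at their current fine-tuned values. `model` and `pretrained`
    map layer names to weight arrays; the names are hypothetical."""
    for name in high_level_layers:
        model[name] = copy.deepcopy(pretrained[name])
    return model
```

A full schedule would alternate: fine-tune all layers, roll the high-level layers back, fine-tune again — repeating until the low-level layers have adapted to the ReID data.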