Unsupervised Adaptive Re-identification in Open World Dynamic Camera Networks
Person re-identification is an open and challenging problem in computer
vision. Existing approaches have concentrated on either designing the best
feature representation or learning optimal matching metrics in a static setting
where the number of cameras is fixed in a network. Most approaches have
neglected the dynamic and open world nature of the re-identification problem,
where a new camera may be temporarily inserted into an existing system to get
additional information. To address such a novel and very practical problem, we
propose an unsupervised adaptation scheme for re-identification models in a
dynamic camera network. First, we formulate a domain perceptive
re-identification method based on geodesic flow kernel that can effectively
find the best source camera (already installed) to adapt to a newly
introduced target camera, without requiring an expensive training phase.
Second, we introduce a transitive inference algorithm for re-identification
that can exploit the information from the best source camera to improve the
accuracy across other camera pairs in a network of multiple cameras. Extensive
experiments on four benchmark datasets demonstrate that the proposed approach
significantly outperforms the state-of-the-art unsupervised learning based
alternatives whilst being extremely efficient to compute. Comment: CVPR 2017 Spotlight
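As a rough illustration of the subspace geometry the geodesic flow kernel operates on, the sketch below picks a "best source camera" as the one whose PCA feature subspace makes the smallest principal angles with the new camera's subspace. The selection criterion, the feature matrices, and the subspace dimension are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from scipy.linalg import subspace_angles

def pca_basis(X, dim=32):
    """Orthonormal basis for the top-`dim` principal directions of the
    feature matrix X (n_samples x n_features)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:dim].T  # n_features x dim, orthonormal columns

def domain_distance(Xs, Xt, dim=32):
    """Sum of principal angles between two cameras' PCA feature subspaces.
    Smaller means geometrically closer on the Grassmann manifold, which is
    the geometry the geodesic flow kernel is built on."""
    return float(np.sum(subspace_angles(pca_basis(Xs, dim), pca_basis(Xt, dim))))

def best_source_camera(source_feats, target_feats, dim=32):
    """Hypothetical selection: source_feats maps camera id -> feature matrix;
    return the installed camera closest to the new target camera."""
    return min(source_feats, key=lambda c: domain_distance(source_feats[c], target_feats, dim))
```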
Domain Adaptive Attention Model for Unsupervised Cross-Domain Person Re-Identification
Person re-identification (Re-ID) across multiple datasets is a challenging
yet important task due to the possibly large distinctions between different
datasets and the lack of training samples in practical applications. This work
proposes a novel unsupervised domain adaptation framework which transfers
discriminative representations from the labeled source domain (dataset) to the
unlabeled target domain (dataset). We propose to formulate the domain adaptation
task as a one-class classification problem with a novel domain similarity
loss. Given the feature map of any image from a backbone network, a novel
domain adaptive attention model (DAAM) first automatically learns to separate
the feature map of an image to a domain-shared feature (DSH) map and a
domain-specific feature (DSP) map simultaneously. Specifically, a residual
attention mechanism is designed to model the DSP feature map so as to avoid
negative transfer. Then, a DSH branch and a DSP branch are introduced to learn
the DSH and DSP feature maps, respectively. To reduce the domain divergence
caused by the source and target datasets being collected from different
environments, we project the DSH feature maps from different domains onto a new nominal domain,
and a novel domain similarity loss is proposed based on one-class
classification. In addition, a novel unsupervised person Re-ID loss is proposed
to make full use of the unlabeled target data. Extensive experiments on the
Market-1501 and DukeMTMC-reID benchmarks demonstrate state-of-the-art
performance of the proposed method. Code will be released to facilitate further
studies on the cross-domain person re-identification task.
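The abstract does not spell out how the attention map carves up the feature map; below is a minimal PyTorch sketch of one plausible residual formulation, where the attended part is treated as domain-specific (DSP) and the residual as domain-shared (DSH). The 1x1-conv gate is an assumed design, not necessarily the paper's.

```python
import torch
import torch.nn as nn

class ResidualAttentionSplit(nn.Module):
    """Sketch of the DSH/DSP split: a learned attention map selects the
    domain-specific part of the backbone feature map, and the residual is
    kept as the domain-shared part."""
    def __init__(self, channels: int):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),  # per-location, per-channel gate in [0, 1]
        )

    def forward(self, feat: torch.Tensor):
        a = self.attn(feat)   # attention map
        dsp = a * feat        # domain-specific features (attended)
        dsh = feat - dsp      # residual: domain-shared features
        return dsh, dsp
```

Under this split, the DSH map would feed the Re-ID branch while the DSP map feeds a domain-specific branch, keeping domain cues out of the identity representation.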
Multi-Domain Adversarial Feature Generalization for Person Re-Identification
With the assistance of sophisticated training methods applied to single
labeled datasets, the performance of fully-supervised person re-identification
(Person Re-ID) has been improved significantly in recent years. However, these
models trained on a single dataset usually suffer from considerable performance
degradation when applied to videos of a different camera network. To make
Person Re-ID systems more practical and scalable, several cross-dataset domain
adaptation methods have been proposed, which achieve high performance without
the labeled data from the target domain. However, these approaches still
require the unlabeled data of the target domain during the training process,
making them impractical. A practical Person Re-ID system pre-trained on other
datasets should start running immediately after deployment on a new site
without having to wait until sufficient images or videos are collected and the
pre-trained model is tuned. To serve this purpose, in this paper, we
reformulate person re-identification as a multi-dataset domain generalization
problem. We propose a multi-dataset feature generalization network (MMFA-AAE),
which is capable of learning a universal domain-invariant feature
representation from multiple labeled datasets and generalizing it to 'unseen'
camera systems. The network is based on an adversarial auto-encoder to learn a
generalized domain-invariant latent feature representation with the Maximum
Mean Discrepancy (MMD) measure to align the distributions across multiple
domains. Extensive experiments demonstrate the effectiveness of the proposed
method. Our MMFA-AAE approach not only outperforms most of the domain
generalization Person Re-ID methods, but also surpasses many state-of-the-art
supervised methods and unsupervised domain adaptation methods by a large
margin. Comment: TIP (Accept with Mandatory Minor Revisions)
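Since the abstract names the Maximum Mean Discrepancy explicitly, here is a minimal sketch of a Gaussian-kernel MMD term for aligning latent feature distributions across two domains; the single fixed bandwidth and the biased batch estimate are simplifying assumptions.

```python
import torch

def gaussian_mmd(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Squared MMD between two feature batches (B x D) under a Gaussian RBF
    kernel. Minimizing this pulls the two domains' latent distributions
    together. Uses the simple biased estimate (diagonal terms included)."""
    def k(a, b):
        d2 = torch.cdist(a, b).pow(2)          # pairwise squared distances
        return torch.exp(-d2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()
```

In a multi-domain setting, this loss would be summed over pairs of dataset domains and added to the adversarial auto-encoder objective.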
Progressive Cross-camera Soft-label Learning for Semi-supervised Person Re-identification
In this paper, we focus on the semi-supervised person re-identification
(Re-ID) case, which only has the intra-camera (within-camera) labels but not
inter-camera (cross-camera) labels. In real-world applications, these
intra-camera labels can be readily captured by tracking algorithms or a few
manual annotations, when compared with cross-camera labels. In this case, it is
very difficult to explore the relationships between cross-camera persons in the
training stage due to the lack of cross-camera label information. To deal with
this issue, we propose a novel Progressive Cross-camera Soft-label Learning
(PCSL) framework for the semi-supervised person Re-ID task, which can generate
cross-camera soft-labels and utilize them to optimize the network. Concretely,
we calculate an affinity matrix based on person-level features and adapt it
to produce the similarities between cross-camera persons (i.e., cross-camera
soft-labels). To exploit these soft-labels to train the network, we investigate
the weighted cross-entropy loss and the weighted triplet loss from the
classification and discrimination perspectives, respectively. Particularly, the
proposed framework alternately generates progressive cross-camera soft-labels
and gradually improves feature representations in the whole learning course.
Extensive experiments on five large-scale benchmark datasets show that PCSL
significantly outperforms the state-of-the-art unsupervised methods that employ
labeled source domains or images generated by GAN-based models.
Furthermore, the proposed method even has a competitive performance with
respect to deep supervised Re-ID methods. Comment: Accepted by IEEE Transactions on Circuits and Systems for Video
Technology (TCSVT)
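A hedged sketch of how an affinity matrix could yield cross-camera soft labels and plug into a weighted cross-entropy. Cosine affinities against per-identity prototypes and a softmax temperature are illustrative choices, not necessarily PCSL's exact recipe.

```python
import torch
import torch.nn.functional as F

def cross_camera_soft_labels(feats: torch.Tensor, prototypes: torch.Tensor,
                             temperature: float = 0.1) -> torch.Tensor:
    """Turn similarities between target features (B x D) and cross-camera
    identity prototypes (C x D) into soft labels over the C identities."""
    affinity = F.normalize(feats, dim=1) @ F.normalize(prototypes, dim=1).t()
    return F.softmax(affinity / temperature, dim=1)

def weighted_cross_entropy(logits: torch.Tensor, soft_labels: torch.Tensor) -> torch.Tensor:
    """Cross-entropy against soft labels: each cross-camera identity
    contributes in proportion to its estimated similarity."""
    return -(soft_labels * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
```

The alternation the abstract describes would then loop: extract features, regenerate soft labels, and retrain, so labels and representations improve together.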
A Novel Unsupervised Camera-aware Domain Adaptation Framework for Person Re-identification
Unsupervised cross-domain person re-identification (Re-ID) faces two key
issues. One is the data distribution discrepancy between source and target
domains, and the other is the lack of labelling information in target domain.
They are addressed in this paper from the perspective of representation
learning. For the first issue, we highlight the presence of camera-level
sub-domains as a unique characteristic of person Re-ID, and develop
camera-aware domain adaptation to reduce the discrepancy not only between
source and target domains but also across these sub-domains. For the second
issue, we exploit the temporal continuity within each camera of the target domain to
create discriminative information. This is implemented by dynamically
generating online triplets within each batch, in order to maximally take
advantage of the steadily improving feature representation during training.
Together, the above two methods give rise to a novel unsupervised deep domain
adaptation framework for person Re-ID. Experiments and ablation studies on
benchmark datasets demonstrate its superiority and interesting properties. Comment: Accepted by ICCV 2019
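The abstract says triplets are generated online within each batch; batch-hard mining over pseudo-labels (e.g., per-camera tracklet IDs obtained from temporal continuity) is one standard way to do that, sketched below under that assumption.

```python
import torch
import torch.nn.functional as F

def batch_hard_triplet_loss(feats: torch.Tensor, labels: torch.Tensor,
                            margin: float = 0.3) -> torch.Tensor:
    """Batch-hard triplet mining: for each anchor, use the farthest positive
    and the closest negative within the current batch. `labels` here could be
    tracklet IDs derived from temporal continuity rather than true identities."""
    dist = torch.cdist(feats, feats)                   # pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)  # positive-pair mask
    eye = torch.eye(len(labels), dtype=torch.bool, device=feats.device)
    pos = dist.masked_fill(~same | eye, float('-inf')).max(dim=1).values
    neg = dist.masked_fill(same, float('inf')).min(dim=1).values
    return F.relu(pos - neg + margin).mean()
```

Because the triplets are re-mined every batch, they automatically get harder as the feature representation improves over training.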
Transferable Joint Attribute-Identity Deep Learning for Unsupervised Person Re-Identification
Most existing person re-identification (re-id) methods require supervised
model learning from a separate large set of pairwise labelled training data for
every single camera pair. This significantly limits their scalability and
usability in real-world large scale deployments with the need for performing
re-id across many camera views. To address this scalability problem, we develop
a novel deep learning method for transferring the labelled information of an
existing dataset to a new unseen (unlabelled) target domain for person re-id
without any supervised learning in the target domain. Specifically, we
introduce a Transferable Joint Attribute-Identity Deep Learning (TJ-AIDL) model for
simultaneously learning an attribute-semantic and identity-discriminative
feature representation space transferable to any new (unseen) target domain
for re-id tasks without the need for collecting new labelled training data from
the target domain (i.e. unsupervised learning in the target domain). Extensive
comparative evaluations validate the superiority of this new TJ-AIDL model for
unsupervised person re-id over a wide range of state-of-the-art methods on four
challenging benchmarks including VIPeR, PRID, Market-1501, and DukeMTMC-ReID. Comment: Accepted at CVPR 2018
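As a simplified view of joint attribute and identity supervision (TJ-AIDL's actual design uses two interacting branches with an attribute-consistency mechanism), one might combine an identity cross-entropy with a multi-label attribute loss over a shared embedding; this flat multi-task version is an assumed sketch.

```python
import torch
import torch.nn as nn

class AttributeIdentityHeads(nn.Module):
    """One shared embedding feeds an identity classifier and a multi-label
    attribute classifier; the joint loss encourages a representation that is
    both identity-discriminative and attribute-semantic."""
    def __init__(self, feat_dim: int, num_ids: int, num_attrs: int):
        super().__init__()
        self.id_head = nn.Linear(feat_dim, num_ids)
        self.attr_head = nn.Linear(feat_dim, num_attrs)
        self.id_loss = nn.CrossEntropyLoss()
        self.attr_loss = nn.BCEWithLogitsLoss()

    def forward(self, feat, id_labels, attr_labels, attr_weight: float = 1.0):
        return (self.id_loss(self.id_head(feat), id_labels)
                + attr_weight * self.attr_loss(self.attr_head(feat), attr_labels.float()))
```

The attribute branch is what transfers: attribute predictions remain meaningful on an unseen target domain even when the source identity classes do not.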
Data-driven pedestrian re-identification based on hierarchical semantic representation
The limited amount of labeled surveillance-video data makes training supervised models for pedestrian re-identification difficult. Moreover, applications of pedestrian re-identification to pedestrian retrieval and criminal tracking are limited by the lack of semantic representation. In this paper, a data-driven pedestrian re-identification model based on hierarchical semantic representation is proposed: it extracts essential features with an unsupervised deep learning model and enhances the semantic representation of the features with hierarchical mid-level ‘attributes’.
Firstly, CNNs pre-trained through the training process of convolutional auto-encoders (CAEs) are used to extract features from horizontal blocks segmented out of unlabeled pedestrian images. Then, these features are fed into the corresponding attribute classifiers to judge whether the pedestrian has each attribute. Lastly, the final result is computed from a table of ‘attributes-classes mapping relations’. On the premise of improved attribute-classifier accuracy, our qualitative results show clear advantages on the CUHK02, VIPeR, and i-LIDS datasets. The proposed method is shown to effectively address the dependency on labeled data and the lack of semantic expression, and it also significantly outperforms the state of the art in terms of accuracy and semantic expressiveness.
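A minimal sketch of the final ‘attributes-classes mapping’ lookup: score each known class's attribute signature against the predicted attribute vector and return the best match. Hamming-style matching is an assumed concrete choice; the paper's table may combine attributes differently.

```python
import numpy as np

def classify_by_attributes(pred_attrs: np.ndarray, attr_table: np.ndarray) -> int:
    """Pick the pedestrian class whose attribute signature best matches the
    predicted binary attribute vector.

    pred_attrs: (num_attrs,) 0/1 outputs of the attribute classifiers.
    attr_table: (num_classes, num_attrs) attributes-classes mapping table.
    """
    matches = (attr_table == pred_attrs).sum(axis=1)  # agreeing attributes per class
    return int(np.argmax(matches))
```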