Intra-Camera Supervised Person Re-Identification
Existing person re-identification (re-id) methods mostly exploit a large set of cross-camera identity labelled training data. This requires a tedious data collection and annotation process, leading to poor scalability in practical re-id applications. On the other hand, unsupervised re-id methods need no identity label information, but they usually suffer from markedly inferior model performance. To overcome these fundamental limitations, we propose a novel person re-identification paradigm based on the idea of independent per-camera identity annotation. This eliminates the most time-consuming and tedious inter-camera identity labelling process, significantly reducing the amount of human annotation effort. Consequently, it gives rise to a more scalable and more feasible setting, which we call Intra-Camera Supervised (ICS) person re-id, for which we formulate a Multi-tAsk mulTi-labEl (MATE) deep learning method. Specifically, MATE is designed to self-discover the cross-camera identity correspondence within a per-camera multi-task inference framework. Extensive experiments demonstrate the cost-effectiveness superiority of our method over alternative approaches on three large person re-id datasets. For example, MATE yields an 88.7% rank-1 score on Market-1501 in the proposed ICS person re-id setting, significantly outperforming unsupervised learning models and closely approaching conventional fully supervised learning competitors.
Intra-Camera Supervised Person Re-Identification: A New Benchmark
Existing person re-identification (re-id) methods rely mostly on a large set
of inter-camera identity labelled training data, requiring a tedious data
collection and annotation process and therefore leading to poor scalability in
practical re-id applications. To overcome this fundamental limitation, we
consider person re-identification without inter-camera identity association but
only with identity labels independently annotated within each individual
camera-view. This eliminates the most time-consuming and tedious inter-camera
identity labelling process, significantly reducing the amount of human
effort required during annotation. It hence gives rise to a more scalable and
more feasible learning scenario, which we call Intra-Camera Supervised (ICS)
person re-id. Under this ICS setting with weaker label supervision, we
formulate a Multi-Task Multi-Label (MTML) deep learning method. Given no
inter-camera association, MTML is specially designed for self-discovering the
inter-camera identity correspondence. This is achieved by inter-camera
multi-label learning under a joint multi-task inference framework. In addition,
MTML can also efficiently learn the discriminative re-id feature
representations by fully using the available identity labels within each
camera-view. Extensive experiments demonstrate the performance superiority of
our MTML model over the state-of-the-art alternative methods on three
large-scale person re-id datasets in the proposed intra-camera supervised
learning setting.
Comment: 9 pages, 3 figures, accepted by ICCV Workshop on Real-World Recognition from Low-Quality Images and Videos, 201
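The per-camera multi-task design described above can be sketched in a few lines. The following is a minimal illustration, not the authors' MTML code: all names and dimensions are assumptions. A shared feature vector is routed to one classification head per camera, so each head is trained only with that camera's independently annotated identity labels.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class PerCameraHeads:
    """Illustrative multi-task re-id classifier: one linear head per
    camera on top of a shared feature (hypothetical, not MTML itself)."""
    def __init__(self, feat_dim, ids_per_camera, seed=0):
        rng = np.random.default_rng(seed)
        # One weight matrix per camera; camera c has ids_per_camera[c]
        # identities, labelled independently of every other camera.
        self.W = [rng.standard_normal((feat_dim, n)) * 0.01
                  for n in ids_per_camera]

    def forward(self, feat, camera):
        # Route the shared feature through the head of the labelling camera.
        return softmax(feat @ self.W[camera])

heads = PerCameraHeads(feat_dim=128, ids_per_camera=[5, 7, 3])
feat = np.ones(128)            # stand-in for a backbone feature
probs = heads.forward(feat, camera=1)
```

Because the label spaces never mix across heads, no inter-camera annotation is needed at training time; discovering which identities correspond across heads is then the separate multi-label association problem the abstract describes.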
Progressive Cross-camera Soft-label Learning for Semi-supervised Person Re-identification
In this paper, we focus on the semi-supervised person re-identification
(Re-ID) case, which only has the intra-camera (within-camera) labels but not
inter-camera (cross-camera) labels. In real-world applications, these
intra-camera labels can be readily captured by tracking algorithms or few
manual annotations, when compared with cross-camera labels. In this case, it is
very difficult to explore the relationships between cross-camera persons in the
training stage due to the lack of cross-camera label information. To deal with
this issue, we propose a novel Progressive Cross-camera Soft-label Learning
(PCSL) framework for the semi-supervised person Re-ID task, which can generate
cross-camera soft-labels and utilize them to optimize the network. Concretely,
we calculate an affinity matrix based on person-level features and adapt it
to produce the similarities between cross-camera persons (i.e., cross-camera
soft-labels). To exploit these soft-labels to train the network, we investigate
the weighted cross-entropy loss and the weighted triplet loss from the
classification and discrimination perspectives, respectively. Particularly, the
proposed framework alternately generates progressive cross-camera soft-labels
and gradually improves feature representations in the whole learning course.
Extensive experiments on five large-scale benchmark datasets show that PCSL
significantly outperforms the state-of-the-art unsupervised methods that employ
labeled source domains or images generated by GAN-based models.
Furthermore, the proposed method even has a competitive performance with
respect to deep supervised Re-ID methods.
Comment: Accepted by IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)
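The soft-label construction above can be sketched roughly as follows. This is a minimal illustration under assumptions of our own (hypothetical names, cosine affinity, softmax normalization); the paper's actual affinity computation and adaptation step may differ. Per-camera identity prototypes are compared across cameras, the similarities are normalized into soft labels, and those soft labels can then drive a soft-target (weighted) cross-entropy.

```python
import numpy as np

def cosine_affinity(A, B):
    # Pairwise cosine similarities between row vectors of A and B.
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T

def soft_labels(affinity, temperature=0.1):
    # Turn cross-camera affinities into a soft target distribution
    # over the other camera's identities (row-wise softmax).
    z = affinity / temperature
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def weighted_cross_entropy(pred, soft, eps=1e-12):
    # Soft-label cross-entropy: -sum_j q_j * log p_j, averaged over rows.
    return float(-(soft * np.log(pred + eps)).sum(axis=1).mean())

rng = np.random.default_rng(0)
protos_cam_a = rng.standard_normal((4, 16))  # 4 identities in camera A
protos_cam_b = rng.standard_normal((5, 16))  # 5 identities in camera B
q = soft_labels(cosine_affinity(protos_cam_a, protos_cam_b))
```

The "progressive" aspect of PCSL would correspond to recomputing these soft labels as the features improve over training, which the sketch leaves out.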
Learning Intra and Inter-Camera Invariance for Isolated Camera Supervised Person Re-identification
Supervised person re-identification assumes that a person has images captured
under multiple cameras. However, when cameras are placed far apart, a person
rarely appears in more than one camera. This paper thus studies person re-ID
under such isolated camera supervised (ISCS) setting. Instead of trying to
generate fake cross-camera features like previous methods, we explore a novel
perspective by making efficient use of the variation in training data. Under
ISCS setting, a person only has limited images from a single camera, so the
camera bias becomes a critical issue confounding ID discrimination.
Cross-camera images are prone to being recognized as different IDs simply by
camera style. To eliminate the confounding effect of camera bias, we propose to
learn both intra- and inter-camera invariance under a unified framework. First,
we construct style-consistent environments via clustering, and perform
prototypical contrastive learning within each environment. Meanwhile, strongly
augmented images are contrasted with original prototypes to enforce
intra-camera augmentation invariance. For inter-camera invariance, we further
design a much improved variant of multi-camera negative loss that optimizes the
distance of multi-level negatives. The resulting model learns to be invariant
to both subtle and severe style variation within and across cameras. On multiple
benchmarks, we conduct extensive experiments and validate the effectiveness and
superiority of the proposed method. Code will be available at
https://github.com/Terminator8758/IICI.
Comment: ACM MultiMedia 202
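The prototypical contrastive step above can be illustrated with a generic InfoNCE-style loss against cluster prototypes. This is a sketch under our own assumptions (names, temperature, single-sample form), not the paper's exact formulation: an augmented feature is pulled towards its own cluster prototype and pushed from the others within the same style-consistent environment.

```python
import numpy as np

def prototypical_contrastive_loss(feat, prototypes, pos_idx, tau=0.07):
    """Illustrative InfoNCE-style loss against cluster prototypes
    (a generic sketch, not the paper's exact objective)."""
    feat = feat / np.linalg.norm(feat)
    protos = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    logits = protos @ feat / tau           # similarity to every prototype
    logits = logits - logits.max()         # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return float(-log_probs[pos_idx])      # pull towards the positive one

rng = np.random.default_rng(1)
prototypes = rng.standard_normal((8, 32))  # 8 prototypes in one environment
# A "strongly augmented" view, still close to its own prototype (index 3):
anchor = prototypes[3] + 0.01 * rng.standard_normal(32)
loss = prototypical_contrastive_loss(anchor, prototypes, pos_idx=3)
```

Contrasting the augmented view against the original prototypes, as here, is what enforces the intra-camera augmentation invariance the abstract mentions; the multi-camera negative loss for inter-camera invariance is a separate term not shown.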
Learning Discriminative Features for Person Re-Identification
To fulfil the requirements of public safety in modern cities, more and more large-scale surveillance camera systems are deployed, resulting in an enormous amount of visual data. Automatically processing and interpreting these data promotes the development and application of visual data analytic technologies. As one of the important research topics in surveillance systems, person re-identification (re-id) aims at retrieving the target person across non-overlapping camera-views deployed at a number of distributed space-time locations. It is a fundamental problem for many practical surveillance applications, e.g., person search, cross-camera tracking, and multi-camera human behavior analysis and prediction, and it has received considerable attention from both academia and industry.
Learning discriminative feature representations is an essential task in person re-id. Although many methodologies have been proposed, discriminative re-id feature extraction remains a challenging problem due to: (1) Intra- and inter-personal variations. The intrinsic properties of camera deployment in surveillance systems lead to various changes in person poses, view-points, illumination conditions, etc. This may result in large intra-personal variations and/or small inter-personal variations, thus causing problems in matching person images. (2) Domain variations. The domain variations between different datasets limit the generalization capability of a re-id model. Directly applying a re-id model trained on one dataset to another usually causes a large performance degradation. (3) Difficulties in data creation and annotation. Existing person re-id methods, especially deep re-id methods, rely mostly on a large set of inter-camera identity labelled training data, requiring a tedious data collection and annotation process. This leads to poor scalability in practical person re-id applications.
Corresponding to the challenges in learning discriminative re-id features, this thesis contributes to the re-id domain by proposing three related methodologies and one new re-id setting:
(1) Gaussian mixture importance estimation. Handcrafted features are usually not discriminative enough for person re-id because of noisy information, such as background clutter. To precisely evaluate the similarities between person images, the main task of distance metric learning is to filter out this noisy information. The Keep It Simple and Straightforward MEtric (KISSME) is an effective method in person re-id. However, it is sensitive to the feature dimensionality and cannot capture the multiple modes in a dataset. To this end, a Gaussian Mixture Importance Estimation re-id approach is proposed, which exploits Gaussian Mixture Models to estimate the observed commonalities of similar and dissimilar person pairs in the feature space.
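For orientation, the KISSME baseline that this contribution builds on learns a metric from the covariances of difference vectors of similar and dissimilar pairs. The sketch below implements only that single-Gaussian baseline (illustrative, with our own names and regularization); the proposed Gaussian Mixture Importance Estimation would, as described, replace each single Gaussian with a mixture to capture multiple modes.

```python
import numpy as np

def kissme_metric(diffs_similar, diffs_dissimilar, reg=1e-6):
    """Baseline KISSME: M = inv(Cov_S) - inv(Cov_D), estimated from
    difference vectors of similar and dissimilar pairs (illustrative)."""
    d = diffs_similar.shape[1]
    cov_s = diffs_similar.T @ diffs_similar / len(diffs_similar)
    cov_d = diffs_dissimilar.T @ diffs_dissimilar / len(diffs_dissimilar)
    # Small ridge term keeps both covariances invertible.
    return np.linalg.inv(cov_s + reg * np.eye(d)) - \
           np.linalg.inv(cov_d + reg * np.eye(d))

def kissme_distance(x, y, M):
    # Mahalanobis-like distance under the learned metric M.
    diff = x - y
    return float(diff @ M @ diff)

rng = np.random.default_rng(2)
sim = 0.1 * rng.standard_normal((200, 8))  # similar pairs differ little
dis = 1.0 * rng.standard_normal((200, 8))  # dissimilar pairs differ a lot
M = kissme_metric(sim, dis)
```

A mixture-based variant would fit one GMM to each set of difference vectors and score a pair by the log-likelihood ratio of the two mixtures instead of the single quadratic form.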
(2) Unsupervised domain-adaptive person re-id based on pedestrian attributes. In person re-id, person identities usually do not overlap between different domains (or datasets), which raises difficulties in generalizing re-id models. Unlike person identity, pedestrian attributes, e.g., hair length, clothes type and color, are consistent across different domains (or datasets). However, most re-id datasets lack attribute annotations. On the other hand, in the field of pedestrian attribute recognition, there are a number of datasets labelled with attributes. Exploiting such data for re-id purposes can alleviate the shortage of attribute annotations in the re-id domain and improve the generalization capability of re-id models. To this end, an unsupervised domain-adaptive re-id feature learning framework is proposed to make full use of attribute annotations. Specifically, an existing unsupervised domain adaptation method is extended to transfer attribute-based features from the attribute recognition domain to the re-id domain. With the proposed re-id feature learning framework, domain-invariant feature representations can be effectively extracted.
(3) Intra-camera supervised person re-id. Annotating large-scale re-id datasets requires a tedious data collection and annotation process and therefore leads to poor scalability in practical person re-id applications. To overcome this fundamental limitation, a new person re-id setting is considered without inter-camera identity association but only with identity labels independently annotated within each camera-view. This eliminates the most time-consuming and tedious inter-camera identity association annotation process and thus significantly reduces the amount of human effort required during annotation. It hence gives rise to a more scalable and more feasible learning scenario, which is named Intra-Camera Supervised (ICS) person re-id. Under this ICS setting, a new re-id method, i.e., the Multi-task Multi-label (MATE) learning method, is formulated. Given no inter-camera association, MATE is specially designed for self-discovering the inter-camera identity correspondence. This is achieved by inter-camera multi-label learning under a joint multi-task inference framework. In addition, MATE can also efficiently learn discriminative re-id feature representations using the available identity labels within each camera-view.