Person Re-Identification by Camera Correlation Aware Feature Augmentation
The challenge of person re-identification (re-id) is to match individual
images of the same person captured by different non-overlapping camera views
against significant and unknown cross-view feature distortion. While a large
number of distance metric/subspace learning models have been developed for
re-id, the cross-view transformations they learned are view-generic and thus
potentially less effective in quantifying the feature distortion inherent to
each camera view. Learning view-specific feature transformations for re-id
(i.e., view-specific re-id), an under-studied approach, offers an alternative
solution to this problem. In this work, we formulate a novel view-specific
person re-identification framework from the feature augmentation point of view,
called Camera coRrelation Aware Feature augmenTation (CRAFT). Specifically,
CRAFT performs cross-view adaptation by automatically measuring camera
correlation from cross-view visual data distribution and adaptively conducting
feature augmentation to transform the original features into a new adaptive
space. Through our augmentation framework, view-generic learning algorithms can
be readily generalized to learn and optimize view-specific sub-models whilst
simultaneously modelling view-generic discrimination information. Therefore,
our framework not only inherits the strength of view-generic model learning but
also provides an effective way to take into account view specific
characteristics. Our CRAFT framework can be extended to jointly learn
view-specific feature transformations for person re-id across a large network
with more than two cameras, a largely under-investigated but realistic re-id
setting. Additionally, we present a domain-generic deep person appearance
representation designed specifically to be close to view-invariant, facilitating
cross-view adaptation by CRAFT.
Comment: To appear in IEEE Transactions on Pattern Analysis and Machine
Intelligence, 201
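CRAFT's feature augmentation generalizes the classic zero-padding augmentation used in domain adaptation, in which each feature gets a shared copy plus a view-specific copy so that a single view-generic learner effectively fits view-specific sub-models. A minimal numpy sketch of that baseline idea for two camera views follows; the weight `omega` is only an illustrative stand-in for the camera correlation that CRAFT measures from the cross-view data distribution, not the paper's exact construction.

```python
import numpy as np

def augment_two_views(x, view, omega=1.0):
    """Zero-padding feature augmentation for two camera views.

    A d-dim feature becomes 3d-dim: a shared (correlation-weighted)
    copy followed by two view-specific slots, only one of which is
    filled. A view-generic model trained on this space learns both
    shared and per-view parameters. `omega` is a hypothetical
    camera-correlation weight, not CRAFT's learned quantity.
    """
    d = x.shape[0]
    zeros = np.zeros(d)
    if view == 0:
        return np.concatenate([omega * x, x, zeros])
    return np.concatenate([omega * x, zeros, x])

x = np.array([1.0, 2.0])
print(augment_two_views(x, 0))  # [1. 2. 1. 2. 0. 0.]
print(augment_two_views(x, 1))  # [1. 2. 0. 0. 1. 2.]
```

With `omega = 1` this reduces to the standard zero-padding trick; varying `omega` changes how strongly the shared component couples the two views.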
Pseudo-positive regularization for deep person re-identification
An intrinsic challenge of person re-identification (re-ID) is the annotation
difficulty. This typically means 1) few training samples per identity, and 2)
consequently, a lack of diversity among the training samples. As a result, we
face a high risk of over-fitting when training the convolutional neural network (CNN),
a state-of-the-art method in person re-ID. To reduce the risk of over-fitting,
this paper proposes a Pseudo Positive Regularization (PPR) method to enrich the
diversity of the training data. Specifically, unlabeled data from an
independent pedestrian database is retrieved using the target training data as
query. A small proportion of these retrieved samples are randomly selected as
the Pseudo Positive samples and added to the target training set for the
supervised CNN training. The addition of Pseudo Positive samples is therefore a
data augmentation method to reduce the risk of over-fitting during CNN
training. We implement our idea in the identification CNN models (i.e.,
CaffeNet, VGGNet-16 and ResNet-50). On CUHK03 and Market-1501 datasets,
experimental results demonstrate that the proposed method consistently improves
the baseline and yields performance competitive with state-of-the-art person
re-ID methods.
Comment: 12 pages, 6 figures
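The retrieval-and-selection step described above can be sketched briefly. This is an illustrative version only: the paper retrieves from an independent pedestrian database using CNN features, whereas here `unlabeled_feats` is an arbitrary feature matrix and the retrieval is plain Euclidean nearest-neighbour search.

```python
import numpy as np

rng = np.random.default_rng(0)

def pseudo_positives(train_feats, train_labels, unlabeled_feats,
                     k=5, ratio=0.2):
    """For each target training image, retrieve its k nearest
    unlabeled samples, then randomly keep a small proportion of
    them as Pseudo Positive samples carrying the query's identity.
    The returned arrays are appended to the training set before
    supervised CNN training (data augmentation against over-fitting)."""
    extra_feats, extra_labels = [], []
    for f, y in zip(train_feats, train_labels):
        dists = np.linalg.norm(unlabeled_feats - f, axis=1)
        nn = np.argsort(dists)[:k]
        keep = nn[rng.random(k) < ratio]  # random selection of a proportion
        extra_feats.extend(unlabeled_feats[keep])
        extra_labels.extend([y] * len(keep))
    return np.array(extra_feats), np.array(extra_labels)
```

The pseudo-positive labels are noisy by construction, which is exactly why only a small `ratio` is kept: the noise acts as a regularizer rather than overwhelming the clean supervision.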
Sparse Label Smoothing Regularization for Person Re-Identification
Person re-identification (re-id) is a cross-camera retrieval task which
establishes a correspondence between images of a person from multiple cameras.
Deep Learning methods have been successfully applied to this problem and have
achieved impressive results. However, these methods require a large amount of
labeled training data. Currently labeled datasets in person re-id are limited
in their scale and manual acquisition of such large-scale datasets from
surveillance cameras is a tedious and labor-intensive task. In this paper, we
propose a framework that performs intelligent data augmentation and assigns
partially smoothed labels to the generated data. Our approach first exploits the
clustering property of existing person re-id datasets to create groups of
similar objects that model cross-view variations. Each group is then used to
generate realistic images through adversarial training. Our aim is to emphasize
feature similarity between generated samples and the original samples. Finally,
we assign a non-uniform label distribution to the generated samples and define
a regularized loss function for training. The proposed approach tackles two
problems (1) how to efficiently use the generated data and (2) how to address
the over-smoothness problem found in current regularization methods. Extensive
experiments on four large-scale datasets show that our regularization method
significantly improves the Re-ID accuracy compared to existing methods.
Comment: 13 pages, 6 figures
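The "non-uniform label distribution" idea can be made concrete with a small sketch. Standard label smoothing spreads probability mass uniformly over all identities, which is the over-smoothness the abstract criticizes; a sparse variant spreads mass only over the identities in the cluster a generated sample came from. This is an assumed simplified form, not the paper's exact target distribution.

```python
import numpy as np

def sparse_smooth_label(num_classes, cluster_classes):
    """Target distribution for a GAN-generated sample: uniform mass
    over the identities of its source cluster, zero elsewhere.
    Contrast with uniform label smoothing, which puts mass on every
    class and over-smooths the supervision."""
    q = np.zeros(num_classes)
    q[cluster_classes] = 1.0 / len(cluster_classes)
    return q

q = sparse_smooth_label(6, [1, 4])  # mass only on identities 1 and 4
```

Training then minimizes cross-entropy against `q` for generated samples and against one-hot labels for real samples, which is one natural reading of the "regularized loss function" mentioned above.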
Large Margin Learning in Set to Set Similarity Comparison for Person Re-identification
Person re-identification (Re-ID) aims at matching images of the same person
across disjoint camera views, which is a challenging problem in multimedia
analysis, multimedia editing and content-based media retrieval communities. The
major challenge lies in how to preserve similarity of the same person across
video footages with large appearance variations, while discriminating different
individuals. To address this problem, conventional methods usually consider the
pairwise similarity between persons by only measuring the point to point (P2P)
distance. In this paper, we propose to use deep learning technique to model a
novel set to set (S2S) distance, in which the underlying objective focuses on
preserving the compactness of intra-class samples for each camera view, while
maximizing the margin between the intra-class set and inter-class set. The S2S
distance metric consists of three terms, namely the class-identity term,
the relative distance term and the regularization term. The class-identity term
keeps the intra-class samples within each camera view close together, the
relative distance term maximizes the distance between the intra-class set
and the inter-class set across different camera views, and the regularization
term smooths the parameters of the deep convolutional neural network (CNN). As a
result, the final learned deep model can effectively identify the correct match
for the probe object among various candidates in the video gallery by learning
discriminative and stable feature representations. Using the CUHK01, CUHK03,
PRID2011 and Market1501 benchmark datasets, we conducted extensive comparative
evaluations to demonstrate the advantages of our method over state-of-the-art
approaches.
Comment: Accepted by IEEE Transactions on Multimedia
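The three-term structure of the S2S objective can be sketched numerically. The form below is illustrative, not the paper's exact equations: a compactness (class-identity) term, a hinge margin between the intra-class set and the nearest inter-class sample (relative distance), and an L2 penalty on parameters (regularization).

```python
import numpy as np

def s2s_loss(intra, inter, weights, margin=1.0, lam=1e-3):
    """Sketch of a set-to-set loss with the three terms named in the
    abstract (simplified, single-set version):
      identity: mean squared distance of intra-class samples to their mean,
      relative: hinge pushing the inter-class set beyond a margin,
      reg:      L2 smoothing of the network parameters."""
    mu = intra.mean(axis=0)
    identity = np.mean(np.sum((intra - mu) ** 2, axis=1))
    d_inter = np.min(np.linalg.norm(inter - mu, axis=1))
    relative = max(0.0, margin + np.sqrt(identity) - d_inter)
    reg = lam * sum(np.sum(w ** 2) for w in weights)
    return identity + relative + reg
```

When the inter-class set already lies well beyond the margin, the hinge term vanishes and only compactness and regularization drive the gradient, which matches the set-level intuition described above.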
Hierarchical Feature Embedding for Attribute Recognition
Attribute recognition is a crucial but challenging task due to viewpoint
changes, illumination variations, appearance diversities, etc. Most previous
work considers only the attribute-level feature embedding, which might
perform poorly in complicated heterogeneous conditions. To address this
problem, we propose a hierarchical feature embedding (HFE) framework, which
learns a fine-grained feature embedding by combining attribute and ID
information. In HFE, we maintain the inter-class and intra-class feature
embedding simultaneously. Not only samples with the same attribute but also
samples with the same ID are gathered more closely, which could restrict the
feature embedding of visually hard samples with regard to attributes and
improve the robustness to variant conditions. We establish this hierarchical
structure by utilizing an HFE loss consisting of attribute-level and ID-level
constraints. We also introduce an absolute boundary regularization and a
dynamic loss weight as supplementary components to help build up the feature
embedding. Experiments show that our method achieves the state-of-the-art
results on two pedestrian attribute datasets and a facial attribute dataset.
Comment: CVPR 202
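The nested "same ID closer than same attribute, same attribute closer than different attribute" structure can be expressed as a pair of hinge constraints. This is an assumed, simplified rendering of the hierarchy, not the paper's actual HFE loss (which also includes absolute boundary regularization and dynamic loss weights).

```python
import numpy as np

def hfe_hinge(anchor, same_id, same_attr, diff_attr, m1=0.2, m2=0.2):
    """Illustrative hierarchical embedding constraint: samples sharing
    the anchor's ID should lie closer than samples merely sharing its
    attribute, which in turn should lie closer than samples with a
    different attribute. Violations of either ordering are penalized."""
    d = lambda a, b: float(np.linalg.norm(a - b))
    id_term = max(0.0, d(anchor, same_id) + m1 - d(anchor, same_attr))
    attr_term = max(0.0, d(anchor, same_attr) + m2 - d(anchor, diff_attr))
    return id_term + attr_term
```

A well-ordered embedding (ID-mates nearest, attribute-mates next, others farthest) drives both hinges to zero, which is the geometry the abstract describes.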
Transfer Metric Learning: Algorithms, Applications and Outlooks
Distance metric learning (DML) aims to find an appropriate way to reveal the
underlying data relationship. It is critical to many machine learning, pattern
recognition and data mining algorithms, and usually requires a large amount of
label information (such as class labels or pair/triplet constraints) to achieve
satisfactory performance. However, the label information may be insufficient in
real-world applications due to the high labeling cost, and DML may fail in this
case. Transfer metric learning (TML) is able to mitigate this issue for DML in
the domain of interest (target domain) by leveraging knowledge/information from
other related domains (source domains). Although TML has achieved a certain
level of development, it still has limited success in aspects such as selective
transfer, theoretical understanding, handling complex data, big data and
extreme cases. In this survey, we present a systematic review of the TML
literature. In particular, we group TML into different categories according to
different settings and metric transfer strategies, such as direct metric
approximation, subspace approximation, distance approximation, and distribution
approximation. We also summarize and provide an insightful discussion of the
various TML approaches and their applications. Finally, we indicate some
challenges and provide possible future directions.
Comment: 14 pages, 5 figures
Enhancing Person Re-identification in a Self-trained Subspace
Despite the promising progress made in recent years, person re-identification
(re-ID) remains a challenging task due to the complex variations in human
appearances from different camera views. For this challenging problem, a large
variety of algorithms have been developed in the fully-supervised setting,
requiring access to a large amount of labeled training data. However, the main
bottleneck for fully-supervised re-ID is the limited availability of labeled
training samples. To address this problem, in this paper, we propose a
self-trained subspace learning paradigm for person re-ID which effectively
utilizes both labeled and unlabeled data to learn a discriminative subspace
where person images across disjoint camera views can be easily matched. The
proposed approach first constructs pseudo pairwise relationships among
unlabeled persons using the k-nearest neighbors algorithm. Then, with the
pseudo pairwise relationships, the unlabeled samples can be easily combined
with the labeled samples to learn a discriminative projection by solving an
eigenvalue problem. In addition, we refine the pseudo pairwise relationships
iteratively, which further improves the learning performance. A multi-kernel
embedding strategy is also incorporated into the proposed approach to cope with
the non-linearity of person appearance and exploit the complementarity of
multiple kernels. In this way, the performance of person re-ID can be greatly
enhanced when training data are insufficient. Experimental results on six
widely-used datasets demonstrate the effectiveness of our approach and its
performance can be comparable to the reported results of most state-of-the-art
fully-supervised methods while using much less labeled data.
Comment: Accepted by ACM Transactions on Multimedia Computing, Communications,
and Applications (TOMM)
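The two-step pipeline above (k-NN pseudo pairwise relationships, then a discriminative projection via an eigenvalue problem) can be sketched as follows. This is a deliberately simplified stand-in for the paper's formulation: it maximizes total scatter while keeping pseudo-paired samples close, with no kernels and no iterative refinement.

```python
import numpy as np

def knn_pseudo_pairs(X, k=2):
    """Pseudo pairwise relationships among unlabeled samples: each
    sample is paired with its k nearest neighbours."""
    pairs = []
    for i, x in enumerate(X):
        d = np.linalg.norm(X - x, axis=1)
        d[i] = np.inf                      # exclude self-match
        for j in np.argsort(d)[:k]:
            pairs.append((i, int(j)))
    return pairs

def discriminative_projection(X, pairs, dim=1, eps=1e-6):
    """Solve an eigenvalue problem for a projection that spreads the
    data overall (total scatter St) while shrinking distances between
    pseudo-pairs (pair scatter Sw): eigvecs of (Sw + eps*I)^-1 St."""
    Xc = X - X.mean(axis=0)
    St = Xc.T @ Xc
    Sw = sum(np.outer(X[i] - X[j], X[i] - X[j]) for i, j in pairs)
    M = np.linalg.inv(Sw + eps * np.eye(X.shape[1])) @ St
    evals, evecs = np.linalg.eig(M)
    order = np.argsort(-evals.real)
    return evecs[:, order[:dim]].real
```

Refining the pseudo pairs with the learned projection and re-solving, as the abstract describes, would simply loop these two functions.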
Collaborative Representation for Classification, Sparse or Non-sparse?
Sparse representation based classification (SRC) has proven to be a
simple, effective and robust solution to face recognition. As it has grown
popular, doubts about the necessity of enforcing sparsity have arisen, and
preliminary experimental results showed that simply changing the ℓ1-norm based
regularization to the computationally much more efficient ℓ2-norm based
non-sparse version would lead to a similar or even better performance. However,
this is not always the case. Given a new classification task, it is still unclear
which regularization strategy (i.e., making the coefficients sparse or
non-sparse) is a better choice without trying both for comparison. In this
paper, we present as far as we know the first study on solving this issue,
based on plenty of diverse classification experiments. We propose a scoring
function for pre-selecting the regularization strategy using only the dataset
size, the feature dimensionality and a discrimination score derived from a
given feature representation. Moreover, we show that when dictionary learning
is taken into account, non-sparse representation shows an even more significant
advantage over sparse representation. This work is expected to enrich our
understanding of sparse/non-sparse collaborative representation for
classification and motivate further research activities.
Comment: 8 pages, 1 figure
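The ℓ2-regularized (non-sparse) collaborative representation classifier discussed here has a closed form, which is the source of its efficiency: code the query over the whole dictionary with ridge regression, then assign the class whose atoms best reconstruct it. A minimal numpy sketch:

```python
import numpy as np

def crc_classify(D, labels, y, lam=1e-2):
    """Collaborative representation classification with an l2 penalty.
    D: (d, n) dictionary whose columns are training samples;
    labels: length-n class labels for the columns;
    y: (d,) query. The ridge coefficients have the closed form
    alpha = (D^T D + lam*I)^-1 D^T y; classification picks the
    smallest class-wise reconstruction residual."""
    n = D.shape[1]
    alpha = np.linalg.solve(D.T @ D + lam * np.eye(n), D.T @ y)
    best, best_r = None, np.inf
    for c in set(labels):
        mask = np.array([l == c for l in labels])
        r = np.linalg.norm(y - D[:, mask] @ alpha[mask])
        if r < best_r:
            best, best_r = c, r
    return best
```

Swapping the ℓ2 penalty for an ℓ1 penalty recovers SRC at the cost of an iterative solver; the paper's question is precisely when that extra cost buys accuracy.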
Transfer Adaptation Learning: A Decade Survey
The world we see is ever-changing and it always changes with people, things,
and the environment. A domain refers to the state of the world at a
certain moment. A research problem is characterized as transfer adaptation
learning (TAL) when it needs knowledge correspondence between different
moments/domains. Conventional machine learning aims to find a model with the
minimum expected risk on test data by minimizing the regularized empirical risk
on the training data, which, however, supposes that the training and test data
share similar joint probability distribution. TAL aims to build models that can
perform tasks of a target domain by learning knowledge from a semantically
related but distributionally different source domain. It is an active research
field of increasing influence and importance, which is exhibiting an explosive
publication trend. This paper surveys the advances of TAL methodologies in the past decade,
and the technical challenges and essential problems of TAL have been observed
and discussed with deep insights and new perspectives. Broader solutions of
transfer adaptation learning being created by researchers are identified, i.e.,
instance re-weighting adaptation, feature adaptation, classifier adaptation,
deep network adaptation and adversarial adaptation, which are beyond the early
semi-supervised and unsupervised split. The survey helps researchers rapidly
but comprehensively understand and identify the research foundation, research
status, theoretical limitations, future challenges and under-studied issues
(universality, interpretability, and credibility) to be broken in the field
toward universal representation and safe applications in open-world scenarios.
Comment: 26 pages, 4 figures
Fast and Accurate Person Re-Identification with RMNet
In this paper we introduce a new neural network architecture designed for use
in embedded vision applications. It merges the best working practices of
network architectures such as MobileNets and ResNets into our architecture,
which we name RMNet. We also focus on the key aspects of building mobile
architectures that operate within a limited computation budget. Additionally,
to demonstrate the effectiveness of our architecture, we evaluate the RMNet
backbone on the person re-identification task. The proposed approach ranks
among the top three state-of-the-art solutions on the Market-1501 challenge,
while significantly outperforming them in inference speed.