An Introduction to Person Re-identification with Generative Adversarial Networks
Person re-identification is a fundamental topic in computer vision. Traditional
methods have several limitations in handling problems such as illumination
changes, occlusion, pose variation and feature variation under complex
backgrounds. Fortunately, the deep learning paradigm has opened new avenues for
person re-identification research and has become a hot spot in this field.
Generative Adversarial Nets (GANs) have attracted much attention in the past
few years for solving these problems. This paper reviews GAN-based methods for
person re-identification, focuses on related papers about different GAN-based
frameworks, and discusses their advantages and disadvantages. Finally, it
proposes directions for future research, especially the prospects of person
re-identification methods based on GANs.
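The adversarial training common to all GAN-based methods surveyed here pits a generator against a discriminator. A minimal numpy sketch of the non-saturating losses, with illustrative inputs (function and variable names are ours, not from any specific paper):

```python
import numpy as np

def gan_losses(d_real, d_fake, eps=1e-8):
    """Non-saturating GAN losses from discriminator scores in (0, 1).

    d_real: discriminator scores on real person images
    d_fake: discriminator scores on generated images
    """
    # Discriminator: push d_real toward 1 and d_fake toward 0.
    d_loss = -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))
    # Generator: push d_fake toward 1 (fool the discriminator).
    g_loss = -np.mean(np.log(d_fake + eps))
    return d_loss, g_loss

# A near-perfect discriminator drives its own loss toward 0 while
# leaving the generator with a large loss to descend.
d_loss, g_loss = gan_losses(np.array([0.99, 0.98]), np.array([0.01, 0.02]))
```

The generator and discriminator are updated alternately on these two losses; the re-ID papers below differ mainly in what the generator is conditioned on (pose, camera, identity) and which extra discriminators are attached.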
FD-GAN: Pose-guided Feature Distilling GAN for Robust Person Re-identification
Person re-identification (reID) is an important task that requires retrieving
a person's images from an image dataset, given one image of the person
of interest. For learning robust person features, the pose variation of person
images is one of the key challenges. Existing works targeting the problem
either perform human alignment, or learn human-region-based representations.
Extra pose information and computational cost are generally required for
inference. To solve this issue, a Feature Distilling Generative Adversarial
Network (FD-GAN) is proposed for learning identity-related and pose-unrelated
representations. It is a novel framework based on a Siamese structure with
multiple novel discriminators on human poses and identities. In addition to the
discriminators, a novel same-pose loss is also integrated, which requires
appearance of a same person's generated images to be similar. After learning
pose-unrelated person features with pose guidance, no auxiliary pose
information or additional computational cost is required during testing. Our
proposed FD-GAN achieves state-of-the-art performance on three person reID
datasets, which demonstrates the effectiveness and robust feature-distilling
capability of the proposed FD-GAN.
Comment: Accepted in Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS 2018). Code available: https://github.com/yxgeee/FD-GAN
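The same-pose loss described above can be sketched very simply: two images generated from different photos of the same person, conditioned on the same target pose, should look alike. A hedged numpy illustration (the pixel-wise L1 form here is our simplification, not necessarily the paper's exact formulation):

```python
import numpy as np

def same_pose_loss(gen_a, gen_b):
    """Sketch of an FD-GAN-style same-pose loss: penalize appearance
    differences between two generated images of the SAME person under
    the SAME target pose. Here: mean absolute pixel difference."""
    return float(np.mean(np.abs(gen_a - gen_b)))
```

During training this term is added to the adversarial and identity losses, encouraging the encoder to discard pose while keeping identity.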
Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language Association
Person re-identification is an important task that requires learning
discriminative visual features for distinguishing different person identities.
Diverse auxiliary information has been utilized to improve the visual feature
learning. In this paper, we propose to exploit natural language description as
additional training supervision for learning effective visual features. Compared with
other auxiliary information, language can describe a specific person from more
compact and semantic visual aspects, thus is complementary to the pixel-level
image data. Our method not only learns a better global visual feature with the
supervision of the overall description but also enforces semantic consistencies
between local visual and linguistic features, which is achieved by building
global and local image-language associations. The global image-language
association is established according to the identity labels, while the local
association is based upon the implicit correspondences between image regions
and noun phrases. Extensive experiments demonstrate the effectiveness of
employing language as training supervision with the two association schemes.
Our method achieves state-of-the-art performance without utilizing any
auxiliary information during testing and shows better performance than other
joint embedding methods for the image-language association.
Comment: ECCV 2018
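The local image-language association rests on implicit correspondences: each noun phrase is softly matched to the image region that best explains it. A rough numpy sketch of such a matching score (names and the max-over-regions pooling are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def local_association_score(regions, phrases):
    """Sketch of a local image-language association score: each noun-
    phrase embedding is matched to its most similar image region
    (implicit correspondence) and the best-match similarities are
    averaged.

    regions: (R, d) region features; phrases: (P, d) phrase features.
    """
    r = regions / np.linalg.norm(regions, axis=1, keepdims=True)
    p = phrases / np.linalg.norm(phrases, axis=1, keepdims=True)
    sim = p @ r.T                            # (P, R) cosine similarities
    return float(np.mean(sim.max(axis=1)))   # best region per phrase
```

Maximizing such a score for matched image-description pairs (and minimizing it for mismatched ones) is one way to enforce the local semantic consistency the abstract describes.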
Attribute-Aware Attention Model for Fine-grained Representation Learning
How to learn a discriminative fine-grained representation is a key point in
many computer vision applications, such as person re-identification,
fine-grained classification, fine-grained image retrieval, etc. Most of the
previous methods focus on learning metrics or ensembles to derive a better
global representation, which usually lacks local information. Based on the
considerations above, we propose a novel Attribute-Aware Attention Model
(A^3M), which can learn local attribute representation and global category
representation simultaneously in an end-to-end manner. The proposed model
contains two attention models: an attribute-guided attention module uses
attribute information to help select category features in different regions,
while a category-guided attention module selects local features of different
attributes with the help of category cues. Through this attribute-category
reciprocal process, local and global features benefit from each other. Finally,
the resulting feature contains more intrinsic information for image recognition,
rather than noisy and irrelevant features. Extensive experiments conducted
on Market-1501, CompCars, CUB-200-2011 and CARS196 demonstrate the
effectiveness of our A^3M. Code is available at
https://github.com/iamhankai/attribute-aware-attention.
Comment: Accepted by ACM Multimedia 2018 (Oral). Code is available at https://github.com/iamhankai/attribute-aware-attention
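The guided-attention idea above can be sketched in a few lines: a guide vector (an attribute or category embedding) scores each spatial region, and the regions are pooled by the resulting softmax weights. This is an illustrative simplification of one module, not the paper's full two-module reciprocal architecture:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

def guided_attention(query, region_feats):
    """Sketch of one guided-attention module: the guide vector scores
    each region by a dot product, and regions are pooled with the
    softmax of those scores.

    query: (d,) guide embedding; region_feats: (R, d) region features.
    """
    scores = region_feats @ query   # (R,) relevance of each region
    weights = softmax(scores)
    return weights @ region_feats   # (d,) attended feature
```

In the full model, two such modules run in both directions (attribute-guided and category-guided) so local and global features refine each other.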
Adaptation and Re-Identification Network: An Unsupervised Deep Transfer Learning Approach to Person Re-Identification
Person re-identification (Re-ID) aims at recognizing the same person from
images taken across different cameras. To address this task, one typically
requires a large amount of labeled data for training an effective Re-ID model,
which might not be practical for real-world applications. To alleviate this
limitation, we choose to exploit a sufficient amount of pre-existing labeled
data from a different (auxiliary) dataset. By jointly considering such an
auxiliary dataset and the dataset of interest (but without label information),
our proposed adaptation and re-identification network (ARN) performs
unsupervised domain adaptation, which leverages information across datasets and
derives domain-invariant features for Re-ID purposes. In our experiments, we
verify that our network performs favorably against state-of-the-art
unsupervised Re-ID approaches, and even outperforms a number of baseline Re-ID
methods which require fully supervised data for training.
Comment: 7 pages, 3 figures. CVPR 2018 workshop paper
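One simple way to picture the domain-invariance objective behind such unsupervised adaptation is moment matching: penalize the distance between the mean features of a labeled source batch and an unlabeled target batch, pushing the shared encoder toward domain-invariant features. This is a generic stand-in, not ARN's exact objective:

```python
import numpy as np

def domain_discrepancy(src_feats, tgt_feats):
    """Sketch of a domain-invariance penalty: distance between the
    mean feature of the auxiliary (source) batch and the unlabeled
    target batch. Minimizing it alongside the supervised source loss
    encourages features that transfer across datasets.

    src_feats, tgt_feats: (N, d) batches of encoder outputs.
    """
    return float(np.linalg.norm(src_feats.mean(axis=0) - tgt_feats.mean(axis=0)))
```

When the two batches come from identically distributed features, the penalty vanishes; a large value signals a domain gap the encoder should close.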
Person Re-Identification by Camera Correlation Aware Feature Augmentation
The challenge of person re-identification (re-id) is to match individual
images of the same person captured by different non-overlapping camera views
against significant and unknown cross-view feature distortion. While a large
number of distance metric/subspace learning models have been developed for
re-id, the cross-view transformations they learned are view-generic and thus
potentially less effective in quantifying the feature distortion inherent to
each camera view. Learning view-specific feature transformations for re-id
(i.e., view-specific re-id), an under-studied approach, offers an alternative
solution to this problem. In this work, we formulate a novel view-specific
person re-identification framework from the feature augmentation point of view,
called Camera coRrelation Aware Feature augmenTation (CRAFT). Specifically,
CRAFT performs cross-view adaptation by automatically measuring camera
correlation from cross-view visual data distribution and adaptively conducting
feature augmentation to transform the original features into a new adaptive
space. Through our augmentation framework, view-generic learning algorithms can
be readily generalized to learn and optimize view-specific sub-models whilst
simultaneously modelling view-generic discrimination information. Therefore,
our framework not only inherits the strength of view-generic model learning but
also provides an effective way to take into account view specific
characteristics. Our CRAFT framework can be extended to jointly learn
view-specific feature transformations for person re-id across a large network
with more than two cameras, a largely under-investigated but realistic re-id
setting. Additionally, we present a domain-generic deep person appearance
representation designed particularly to be view-invariant, facilitating
cross-view adaptation by CRAFT.
Comment: To appear in IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017
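The core feature-augmentation idea can be sketched with the classic shared-plus-view-specific layout: each feature keeps a shared block and is also copied into the slot of its own camera view, so a single view-generic learner implicitly fits view-specific sub-models. This 0/1 copy is a simplification; CRAFT instead weights the blocks by learned camera correlations:

```python
import numpy as np

def view_augment(x, view, n_views):
    """Sketch of view-specific feature augmentation: the original
    d-dim feature is placed in a shared (view-generic) block and
    copied into the block belonging to its own camera view; all other
    view blocks stay zero.

    x: (d,) feature; view: camera index in [0, n_views).
    """
    d = x.shape[0]
    out = np.zeros((1 + n_views) * d)
    out[:d] = x                               # shared block
    out[(1 + view) * d:(2 + view) * d] = x    # this camera's block
    return out
```

A linear model trained on such augmented features learns one shared weight block plus one per-camera block, which is exactly the "view-generic plus view-specific" decomposition the abstract describes; the scheme extends directly to networks with more than two cameras.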
Spatial and Temporal Mutual Promotion for Video-based Person Re-identification
Video-based person re-identification is a crucial task of matching video
sequences of a person across multiple camera views. Generally, features
directly extracted from a single frame suffer from occlusion, blur,
illumination and posture changes. This leads to false activation or missing
activation in some regions, which corrupts the appearance and motion
representation. How to explore the abundant spatial-temporal information in
video sequences is the key to solve this problem. To this end, we propose a
Refining Recurrent Unit (RRU) that recovers the missing parts and suppresses
noisy parts of the current frame's features by referring to historical frames.
With RRU, the quality of each frame's appearance representation is improved.
Then we use the Spatial-Temporal clues Integration Module (STIM) to mine the
spatial-temporal information from those upgraded features. Meanwhile, the
multi-level training objective is used to enhance the capability of RRU and
STIM. Through the cooperation of those modules, the spatial and temporal
features mutually promote each other and the final spatial-temporal feature
representation is more discriminative and robust. Extensive experiments are
conducted on three challenging datasets, i.e., iLIDS-VID, PRID-2011 and MARS.
The experimental results demonstrate that our approach outperforms existing
state-of-the-art methods of video-based person re-identification on iLIDS-VID
and MARS, and achieves favorable results on PRID-2011.
Comment: Accepted by AAAI19 as a spotlight
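The refining idea behind the RRU can be pictured as gated blending: corrupted regions of the current frame's feature map are partially recovered from an accumulated historical feature. The learned gating of the actual RRU is replaced here by a fixed coefficient, so this is only an illustrative sketch:

```python
import numpy as np

def refine_frame(current, history, alpha=0.5):
    """Sketch of recurrent feature refinement: blend the current
    frame's feature with a historical feature so that entries wiped
    out by occlusion/blur (here: zeros) are partially restored. The
    real RRU learns this gate per location; alpha is a fixed stand-in.
    """
    return alpha * current + (1 - alpha) * history
```

Applied frame by frame along the sequence, such refinement yields the "upgraded" per-frame features that the spatial-temporal integration module then aggregates.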
Adversarial Open-World Person Re-Identification
In a typical real-world application of re-id, a watch-list (gallery set) of a
handful of target people (e.g., suspects) must be tracked among a large volume
of non-target people across camera views; this is called open-world person
re-id. Different from conventional (closed-world) person re-id, a large portion
of probe samples are not from target people in the open-world setting.
Moreover, it often happens that a non-target person looks similar to a target
one and therefore seriously challenges a re-id system.
In this work, we introduce a deep open-world group-based person re-id model
based on adversarial learning to alleviate the attack problem caused by similar
non-target people. The main idea is to learn to attack the feature extractor on the
target people by using GAN to generate very target-like images (imposters), and
in the meantime the model will make the feature extractor learn to tolerate the
attack by discriminative learning so as to realize group-based verification.
The framework we proposed is called the adversarial open-world person
re-identification, and this is realized by our Adversarial PersonNet (APN) that
jointly learns a generator, a person discriminator, a target discriminator and
a feature extractor, where the feature extractor and target discriminator share
the same weights so as to make the feature extractor learn to tolerate the
attack by imposters for better group-based verification. While open-world
person re-id is challenging, we show for the first time that an
adversarial-based approach helps stabilize a person re-id system under imposter
attack more effectively.
Comment: 17 pages, 3 figures. Accepted by European Conference on Computer Vision 2018
A Survey of Deep Facial Attribute Analysis
Facial attribute analysis has received considerable attention since deep
learning techniques made remarkable breakthroughs in this field over the past
few years. Deep learning based facial attribute analysis consists of two basic
sub-issues: facial attribute estimation (FAE), which recognizes whether facial
attributes are present in given images, and facial attribute manipulation
(FAM), which synthesizes or removes desired facial attributes. In this paper,
we provide a comprehensive survey of deep facial attribute analysis from the
perspectives of both estimation and manipulation. First, we summarize a general
pipeline that deep facial attribute analysis follows, which comprises two
stages: data preprocessing and model construction. Additionally, we introduce
the underlying theories of this two-stage pipeline for both FAE and FAM.
Second, the datasets and performance metrics commonly used in facial attribute
analysis are presented. Third, we create a taxonomy of state-of-the-art methods
and review deep FAE and FAM algorithms in detail. Furthermore, several
additional facial attribute related issues are introduced, as well as relevant
real-world applications. Finally, we discuss possible challenges and promising
future research directions.
Comment: Submitted to International Journal of Computer Vision (IJCV)
Pyramidal Person Re-IDentification via Multi-Loss Dynamic Training
Most existing Re-IDentification (Re-ID) methods are highly dependent on
precise bounding boxes that enable images to be aligned with each other.
However, due to the challenging practical scenarios, current detection models
often produce inaccurate bounding boxes, which inevitably degrade the
performance of existing Re-ID algorithms. In this paper, we propose a novel
coarse-to-fine pyramid model to relax the need for bounding boxes, which not
only incorporates local and global information, but also integrates the gradual
cues between them. The pyramid model is able to match at different scales and
then search for the correct image of the same identity, even when the image
pairs are not aligned. In addition, in order to learn discriminative identity
representation, we explore a dynamic training scheme to seamlessly unify two
losses and extract appropriate shared information between them. Experimental
results clearly demonstrate that the proposed method achieves the
state-of-the-art results on three datasets. Especially, our approach exceeds
the current best method by 9.5% on the most challenging CUHK03 dataset.
Comment: Accepted by the 2019 Conference on Computer Vision and Pattern Recognition
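A coarse-to-fine pyramid over a feature map can be sketched as pooling the map at several horizontal granularities, so matching can happen at multiple scales even when bounding boxes are loose. This simplification uses non-overlapping strips and omits the paper's multi-loss dynamic training:

```python
import numpy as np

def pyramid_features(fmap, levels=3):
    """Sketch of a coarse-to-fine pyramid over a (H, d) feature map:
    level l splits the height into l horizontal strips and
    average-pools each strip, giving 1 + 2 + ... + levels part
    features from global (level 1) to fine (level `levels`).
    """
    feats = []
    for level in range(1, levels + 1):
        for strip in np.array_split(fmap, level, axis=0):
            feats.append(strip.mean(axis=0))
    return np.stack(feats)  # (levels * (levels + 1) / 2, d)
```

Two images are then compared part-by-part across levels, so a misaligned crop can still match at the coarser scales while the finer strips capture local cues.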