Person Re-identification in Appearance Impaired Scenarios
Person re-identification is critical in surveillance applications. Current
approaches rely on appearance based features extracted from a single or
multiple shots of the target and candidate matches. These approaches are at a
disadvantage when trying to distinguish between candidates dressed in similar
colors or when targets change their clothing. In this paper we propose a
dynamics-based feature to overcome this limitation. The main idea is to capture
soft biometrics from gait and motion patterns by gathering dense short
trajectories (tracklets) which are Fisher vector encoded. To illustrate the
merits of the proposed features we introduce three new "appearance-impaired"
datasets. Our experiments on the original and the appearance impaired datasets
demonstrate the benefits of incorporating dynamics-based information alongside
appearance-based information into re-identification algorithms.
Comment: 10 pages
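The tracklet encoding described above can be sketched as follows. This is a simplified Fisher vector that keeps only the gradients with respect to the GMM means (full Fisher vectors also include weight and variance gradients); the GMM parameters here are illustrative placeholders, not the paper's trained model:

```python
import numpy as np

def fisher_vector(tracklets, means, sigmas, priors):
    """Simplified Fisher vector: gradients w.r.t. GMM means only.

    tracklets: (N, D) short-trajectory descriptors,
    means/sigmas: (K, D) diagonal-covariance GMM parameters,
    priors: (K,) mixture weights.
    """
    N, D = tracklets.shape
    K = means.shape[0]
    # Log-likelihood of each tracklet under each Gaussian (constants dropped).
    log_p = -0.5 * (((tracklets[:, None, :] - means[None]) / sigmas[None]) ** 2).sum(-1)
    log_p += np.log(priors)[None] - np.log(sigmas).sum(-1)[None]
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)          # soft assignments (N, K)
    # Gradient w.r.t. the means, averaged over tracklets.
    diff = (tracklets[:, None, :] - means[None]) / sigmas[None]   # (N, K, D)
    fv = (gamma[:, :, None] * diff).sum(0) / (N * np.sqrt(priors)[:, None])
    fv = fv.ravel()
    # Standard power and L2 normalization.
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    return fv / (np.linalg.norm(fv) + 1e-12)
```

The resulting K*D-dimensional vector is a fixed-length motion signature for a variable-length set of tracklets, which can then be compared with any standard metric.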
Person Re-Identification by Camera Correlation Aware Feature Augmentation
The challenge of person re-identification (re-id) is to match individual
images of the same person captured by different non-overlapping camera views
against significant and unknown cross-view feature distortion. While a large
number of distance metric/subspace learning models have been developed for
re-id, the cross-view transformations they learned are view-generic and thus
potentially less effective in quantifying the feature distortion inherent to
each camera view. Learning view-specific feature transformations for re-id
(i.e., view-specific re-id), an under-studied approach, becomes an alternative
resort for this problem. In this work, we formulate a novel view-specific
person re-identification framework from the feature augmentation point of view,
called Camera coRrelation Aware Feature augmenTation (CRAFT). Specifically,
CRAFT performs cross-view adaptation by automatically measuring camera
correlation from cross-view visual data distribution and adaptively conducting
feature augmentation to transform the original features into a new adaptive
space. Through our augmentation framework, view-generic learning algorithms can
be readily generalized to learn and optimize view-specific sub-models whilst
simultaneously modelling view-generic discrimination information. Therefore,
our framework not only inherits the strength of view-generic model learning but
also provides an effective way to take into account view specific
characteristics. Our CRAFT framework can be extended to jointly learn
view-specific feature transformations for person re-id across a large network
with more than two cameras, a largely under-investigated but realistic re-id
setting. Additionally, we present a domain-generic deep person appearance
representation which is designed particularly to be towards view invariant for
facilitating cross-view adaptation by CRAFT.
Comment: To appear in IEEE Transactions on Pattern Analysis and Machine
Intelligence
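The feature-augmentation idea can be illustrated with a toy sketch. The actual CRAFT formulation derives its transformation more carefully; here camera correlation is approximated by cosine similarity between per-camera mean features, and the augmented vector holds one block per camera scaled by that correlation, so that a view-generic metric learner effectively fits view-specific sub-models on the blocks:

```python
import numpy as np

def craft_like_augment(x, cam_id, cam_means, n_cams):
    """Illustrative view-specific feature augmentation (not the exact
    CRAFT transform). x: (D,) feature from camera cam_id;
    cam_means: (n_cams, D) mean feature per camera view."""
    corr = np.array([
        np.dot(cam_means[cam_id], cam_means[j]) /
        (np.linalg.norm(cam_means[cam_id]) * np.linalg.norm(cam_means[j]) + 1e-12)
        for j in range(n_cams)
    ])
    # Shared copy followed by one correlation-weighted block per camera.
    return np.concatenate([x] + [c * x for c in corr])
```

Because the augmentation only changes the input representation, any view-generic distance-metric learner can consume the augmented vectors unchanged, which is the key point of the framework; the same construction extends to networks with more than two cameras.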
Frustratingly Easy Person Re-Identification: Generalizing Person Re-ID in Practice
Contemporary person re-identification (re-id) methods usually require access
to data from the deployment camera network during training in order to perform
well. This is because contemporary re-id models trained on one dataset do not
generalise to other camera networks due to the domain-shift between datasets.
This requirement is often the bottleneck for deploying re-id systems in
practical security or commercial applications, as it may be impossible to
collect this data in advance or prohibitively costly to annotate it. This paper
alleviates this issue by proposing a simple baseline for domain
generalizable (DG) person re-identification: learning a re-id model
from a set of source domains that is suitable for application to unseen
datasets out-of-the-box, without any model updating. Specifically, we observe
that the domain discrepancy in re-id is due to style and content variance
across datasets, and demonstrate that appropriate Instance and Feature
Normalization alleviates much of the resulting domain-shift in deep re-id
models. Instance Normalization (IN) in early layers filters out style
statistic variations, and Feature Normalization (FN) in deep layers further
eliminates disparity in content statistics. Compared to contemporary
alternatives, this approach is extremely simple to implement, while being
faster to train and test, making it a valuable baseline for implementing
re-id in practice. With a few lines of code, it increases rank-1 re-id
accuracy by 11.8%, 33.2%, 12.8% and 8.5% on the VIPeR, PRID, GRID, and i-LIDS
benchmarks respectively. Source code is available at
https://github.com/BJTUJia/person_reID_DualNorm.
Comment: 14 pages, 2 figures
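The two normalizations are standard operations and can be sketched directly. IN below is the usual per-sample, per-channel spatial normalization; FN is shown as L2 normalization of the deep embedding, which is one plausible reading of "feature normalization" (the paper's exact layer may differ):

```python
import numpy as np

def instance_norm(feat, eps=1e-5):
    """Instance Normalization on a (C, H, W) feature map: removes the
    per-sample style statistics (mean/variance of each channel)."""
    mu = feat.mean(axis=(1, 2), keepdims=True)
    var = feat.var(axis=(1, 2), keepdims=True)
    return (feat - mu) / np.sqrt(var + eps)

def feature_norm(vec, eps=1e-12):
    """L2 normalization of a deep embedding, suppressing norm disparities
    in content statistics across datasets."""
    return vec / (np.linalg.norm(vec) + eps)
```

In a real network these would sit inside the CNN (IN after early conv blocks, FN on the final embedding), which is why the method amounts to only a few lines of code on top of a standard backbone.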
Weighted Bilinear Coding over Salient Body Parts for Person Re-identification
Deep convolutional neural networks (CNNs) have demonstrated dominant
performance in person re-identification (Re-ID). Existing CNN based methods
utilize global average pooling (GAP) to aggregate intermediate convolutional
features for Re-ID. However, this strategy only considers the first-order
statistics of local features and treats local features at different locations
equally important, leading to sub-optimal feature representation. To deal with
these issues, we propose a novel weighted bilinear coding (WBC) framework for
local feature aggregation in CNNs to pursue more representative and
discriminative feature representations, which can be plugged into other
state-of-the-art methods and improve their performance. Specifically, bilinear
coding is used to encode the channel-wise feature correlations to capture
richer feature interactions. Meanwhile, a weighting scheme is applied on the
bilinear coding to adaptively adjust the weights of local features at different
locations based on their importance in recognition, further improving the
discriminability of feature aggregation. To handle the spatial misalignment
issue, we use a salient part net (spatial attention module) to derive salient
body parts, and apply the WBC model on each part. The final representation,
formed by concatenating the WBC encoded features of each part, is both
discriminative and resistant to spatial misalignment. Experiments on three
benchmarks including Market-1501, DukeMTMC-reID and CUHK03 evidence the
favorable performance of our method against other outstanding methods.
Comment: 22 pages
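The weighted bilinear coding step can be written compactly. In the sketch below the importance weights are taken as given (in the paper they come from a learned attention map over spatial locations), and the usual signed square-root plus L2 post-processing for bilinear features is applied:

```python
import numpy as np

def weighted_bilinear_coding(feats, weights):
    """feats: (N, C) local features at N spatial locations;
    weights: (N,) importance scores for those locations.
    Returns a vectorized, normalized second-order representation."""
    w = weights / (weights.sum() + 1e-12)
    # Weighted sum of outer products captures channel-wise correlations.
    B = sum(wi * np.outer(f, f) for wi, f in zip(w, feats))   # (C, C)
    v = B.ravel()
    v = np.sign(v) * np.sqrt(np.abs(v))     # signed square-root
    return v / (np.linalg.norm(v) + 1e-12)  # L2 normalization
```

Applying this per salient body part and concatenating the results gives the final part-aligned representation described in the abstract.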
A Siamese Long Short-Term Memory Architecture for Human Re-Identification
Matching pedestrians across multiple camera views, known as human
re-identification (re-id), is a challenging problem in visual
surveillance. In existing works concentrating on feature extraction,
representations are formed locally, independent of other regions. We present
a novel siamese Long Short-Term Memory (LSTM) architecture that can process
image regions sequentially and enhance the discriminative capability of local
feature representation by leveraging contextual information. The feedback
connections and internal gating mechanism of the LSTM cells enable our model to
memorize the spatial dependencies and selectively propagate relevant contextual
information through the network. We demonstrate improved performance compared
to the baseline algorithm with no LSTM units and promising results compared to
state-of-the-art methods on Market-1501, CUHK03 and VIPeR datasets.
Visualization of the internal mechanism of LSTM cells shows that meaningful
patterns can be learned by our method.
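The sequential processing of image regions can be sketched with a bare LSTM cell. The regions would be horizontal stripes of a pedestrian image in top-to-bottom order; the weights below are random placeholders, not trained parameters, and the gate layout is the standard input/forget/cell/output ordering:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_over_regions(regions, Wx, Wh, b):
    """Run one LSTM over a sequence of region descriptors.
    regions: (T, D); Wx: (4H, D); Wh: (4H, H); b: (4H,).
    Returns one contextual embedding per region, (T, H)."""
    H = Wh.shape[1]
    h = np.zeros(H)
    c = np.zeros(H)
    outs = []
    for x in regions:
        z = Wx @ x + Wh @ h + b
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)   # gated memory: spatial dependencies kept
        h = o * np.tanh(c)           # region embedding now carries context
        outs.append(h)
    return np.stack(outs)
```

In the Siamese setup, two weight-shared copies of this recurrence process the two images of a pair, and their final region embeddings are compared under a contrastive objective.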
Towards Fine-grained Human Pose Transfer with Detail Replenishing Network
Human pose transfer (HPT) is an emerging research topic with huge potential
in fashion design, media production, online advertising and virtual reality.
For these applications, the visual realism of fine-grained appearance details
is crucial for production quality and user engagement. However, existing HPT
methods often suffer from three fundamental issues: detail deficiency, content
ambiguity and style inconsistency, which severely degrade the visual quality
and realism of generated images. Aiming towards real-world applications, we
develop a more challenging yet practical HPT setting, termed as Fine-grained
Human Pose Transfer (FHPT), with a higher focus on semantic fidelity and detail
replenishment. Concretely, we analyze the potential design flaws of existing
methods via an illustrative example, and establish the core FHPT methodology by
combining the ideas of content synthesis and feature transfer in a
mutually-guided fashion. Thereafter, we substantiate the proposed methodology
with a Detail Replenishing Network (DRN) and a corresponding coarse-to-fine
model training scheme. Moreover, we build up a complete suite of fine-grained
evaluation protocols to address the challenges of FHPT in a comprehensive
manner, including semantic analysis, structural detection and perceptual
quality assessment. Extensive experiments on the DeepFashion benchmark dataset
have verified the superiority of the proposed method against state-of-the-art
works, with a 12%-14% gain in top-10 retrieval recall, 5% higher joint
localization accuracy, and a nearly 40% gain in face identity preservation. Moreover, the
evaluation results offer further insights to the subject matter, which could
inspire many promising future works along this direction.
Comment: IEEE TIP submission
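One of the evaluation protocols mentioned, top-k retrieval recall, is simple to state precisely. The sketch below is a generic implementation of that metric, not the paper's exact evaluation code:

```python
import numpy as np

def top_k_recall(sim, gallery_ids, query_ids, k=10):
    """Fraction of queries whose true identity appears among the k most
    similar gallery entries. sim: (Q, G) similarity matrix."""
    order = np.argsort(-sim, axis=1)[:, :k]   # indices of top-k matches
    hits = [(gallery_ids[order[q]] == query_ids[q]).any()
            for q in range(sim.shape[0])]
    return float(np.mean(hits))
```

The same matrix-plus-argsort pattern underlies most retrieval metrics (rank-1 accuracy is the k=1 special case), which makes it a convenient shared backbone for a full evaluation suite.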
Style Normalization and Restitution for Generalizable Person Re-identification
Existing fully-supervised person re-identification (ReID) methods usually
suffer from poor generalization capability caused by domain gaps. The key to
solving this problem lies in filtering out identity-irrelevant interference and
learning domain-invariant person representations. In this paper, we aim to
design a generalizable person ReID framework which trains a model on source
domains yet is able to generalize/perform well on target domains. To achieve
this goal, we propose a simple yet effective Style Normalization and
Restitution (SNR) module. Specifically, we filter out style variations (e.g.,
illumination, color contrast) by Instance Normalization (IN). However, such a
process inevitably removes discriminative information. We propose to distill
identity-relevant feature from the removed information and restitute it to the
network to ensure high discrimination. For better disentanglement, we enforce a
dual causal loss constraint in SNR to encourage the separation of
identity-relevant features and identity-irrelevant features. Extensive
experiments demonstrate the strong generalization capability of our framework.
Our models empowered by the SNR modules significantly outperform the
state-of-the-art domain generalization approaches on multiple widely-used
person ReID benchmarks, and also show superiority on unsupervised domain
adaptation.
Comment: Accepted by CVPR 2020
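The normalize-then-restitute flow of the SNR module can be sketched in a few lines. The channel gate below stands in for the learned attention that separates identity-relevant from identity-irrelevant residual information; it is supplied as an input here rather than learned:

```python
import numpy as np

def snr_like(feat, gate, eps=1e-5):
    """Sketch of Style Normalization and Restitution on a (C, H, W) map.
    gate: (C,) values in [0, 1] standing in for the learned attention."""
    mu = feat.mean(axis=(1, 2), keepdims=True)
    sd = feat.std(axis=(1, 2), keepdims=True) + eps
    normed = (feat - mu) / sd        # IN: style variations filtered out
    residual = feat - normed         # everything IN removed
    relevant = gate[:, None, None] * residual
    return normed + relevant         # restitute identity-relevant part
```

The dual causal loss in the paper is what pushes the gate toward a clean split, penalizing the restituted feature when it fails to add discrimination and the discarded part when it does.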
Gated Siamese Convolutional Neural Network Architecture for Human Re-Identification
Matching pedestrians across multiple camera views, known as human
re-identification, is a challenging research problem that has numerous
applications in visual surveillance. With the resurgence of Convolutional
Neural Networks (CNNs), several end-to-end deep Siamese CNN architectures have
been proposed for human re-identification with the objective of projecting the
images of similar pairs (i.e. same identity) to be closer to each other and
those of dissimilar pairs to be distant from each other. However, current
networks extract fixed representations for each image regardless of other
images which are paired with it and the comparison with other images is done
only at the final level. In this setting, the network is at risk of failing to
extract finer local patterns that may be essential to distinguish positive
pairs from hard negative pairs. In this paper, we propose a gating function to
selectively emphasize such fine common local patterns by comparing the
mid-level features across pairs of images. This produces flexible
representations for the same image according to the images they are paired
with. We conduct experiments on the CUHK03, Market-1501 and VIPeR datasets and
demonstrate improved performance compared to a baseline Siamese CNN
architecture.
Comment: Accepted to ECCV 2016
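The gating idea can be illustrated with a simplified Gaussian comparison of the two mid-level feature maps (the paper's gating function is learned; this fixed form only conveys the mechanism): locations where the paired maps agree receive gate values near 1 and are boosted in both representations.

```python
import numpy as np

def matching_gate(f1, f2, sigma=1.0):
    """Emphasize fine common local patterns across a pair of (C, H, W)
    mid-level feature maps. Simplified, fixed-form gate."""
    d2 = ((f1 - f2) ** 2).sum(axis=0, keepdims=True)   # (1, H, W)
    gate = np.exp(-d2 / (2 * sigma ** 2))              # 1 where maps agree
    # Each image's representation now depends on the image it is paired with.
    return f1 * (1 + gate), f2 * (1 + gate)
```

This is exactly the property the abstract highlights: the same image yields different representations depending on its partner, in contrast to fixed per-image embeddings compared only at the final layer.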
VOC-ReID: Vehicle Re-identification based on Vehicle-Orientation-Camera
Vehicle re-identification is a challenging task due to high intra-class
variances and small inter-class variances. In this work, we focus on the
failure cases caused by similar background and shape. These impose severe bias
on similarity, making it easy to neglect fine-grained information. To reduce the
bias, we propose an approach named VOC-ReID, taking the triplet
vehicle-orientation-camera as a whole and reforming background/shape similarity
as camera/orientation re-identification. At first, we train models for vehicle,
orientation and camera re-identification respectively. Then we use orientation
and camera similarity as penalty to get final similarity. Besides, we propose a
high performance baseline boosted by bag of tricks and weakly supervised data
augmentation. Our algorithm achieves the second place in vehicle
re-identification at the NVIDIA AI City Challenge 2020.
Comment: AICity2020 Challenge, CVPR 2020 workshop; code available on GitHub
(link in abstract)
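The penalty combination described above reduces to a weighted difference of similarity matrices. The weights below are illustrative, not the values used in the challenge entry:

```python
import numpy as np

def voc_similarity(sim_vehicle, sim_orientation, sim_camera,
                   w_orient=0.1, w_cam=0.1):
    """Final pairwise similarity: orientation and camera similarities are
    subtracted as penalties, so pairs that look alike only because they
    share viewpoint or background are demoted in the ranking."""
    return sim_vehicle - w_orient * sim_orientation - w_cam * sim_camera
```

Since all three terms are plain (Q, G) similarity matrices from the three independently trained models, the combination is a single broadcasted expression and adds no cost at ranking time.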
Unsupervised Person Re-identification: Clustering and Fine-tuning
The superiority of deeply learned pedestrian representations has been
reported in very recent literature of person re-identification (re-ID). In this
paper, we consider the more pragmatic issue of learning a deep feature with no
or only a few labels. We propose a progressive unsupervised learning (PUL)
method to transfer pretrained deep representations to unseen domains. Our
method is easy to implement and can be viewed as an effective baseline for
unsupervised re-ID feature learning. Specifically, PUL iterates between 1)
pedestrian clustering and 2) fine-tuning of the convolutional neural network
(CNN) to improve the original model trained on the irrelevant labeled dataset.
Since the clustering results can be very noisy, we add a selection operation
between the clustering and fine-tuning. At the beginning when the model is
weak, the CNN is fine-tuned on a small number of reliable examples located
near cluster centroids in the feature space. As the model becomes stronger
in subsequent iterations, more images are adaptively selected as CNN
training samples. Progressively, pedestrian clustering and the CNN model are
improved simultaneously until algorithm convergence. This process is naturally
formulated as self-paced learning. We then point out promising directions that
may lead to further improvement. Extensive experiments on three large-scale
re-ID datasets demonstrate that PUL outputs discriminative features that
improve the re-ID accuracy.
Comment: Added more results, parameter analysis and comparisons