Deep Attributes Driven Multi-Camera Person Re-identification
The visual appearance of a person is easily affected by many factors like
pose variations, viewpoint changes and camera parameter differences. This makes
person Re-Identification (ReID) among multiple cameras a very challenging task.
This motivates us to learn mid-level human attributes that are robust to
such visual appearance variations. We propose a semi-supervised attribute
learning framework that progressively boosts the accuracy of attributes
using only a limited number of labeled samples. Specifically, this framework involves a
three-stage training. A deep Convolutional Neural Network (dCNN) is first
trained on an independent dataset labeled with attributes. Then it is
fine-tuned on another dataset only labeled with person IDs using our defined
triplet loss. Finally, the updated dCNN predicts attribute labels for the
target dataset, which is combined with the independent dataset for the final
round of fine-tuning. The predicted attributes, namely \emph{deep attributes},
exhibit superior generalization ability across different datasets. By directly
using the deep attributes with a simple Cosine distance, we obtain
surprisingly good accuracy on four person ReID datasets. Experiments also show
that a simple metric learning module further boosts our method, making it
significantly outperform many recent works. Comment: Person Re-identification; 17 pages; 5 figures; In IEEE ECCV 201
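As a companion to the pipeline described above, here is a minimal sketch of the two reusable ingredients the abstract names: a triplet loss for ID-only fine-tuning and cosine-distance retrieval on predicted attribute vectors. The margin, dimensions and random data are illustrative placeholders, not the authors' actual settings.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Standard triplet loss: pull same-ID embeddings together, push different-ID
    embeddings apart. The margin value is an assumed placeholder."""
    d_pos = (anchor - positive).pow(2).sum(dim=1)
    d_neg = (anchor - negative).pow(2).sum(dim=1)
    return F.relu(d_pos - d_neg + margin).mean()

def rank_by_cosine(query_attr, gallery_attr):
    """Rank gallery images for each query by cosine distance between attribute vectors."""
    q = F.normalize(query_attr, dim=1)
    g = F.normalize(gallery_attr, dim=1)
    cos_dist = 1.0 - q @ g.t()          # smaller = more similar
    return cos_dist.argsort(dim=1)      # gallery indices, best match first

# toy usage with random "deep attribute" vectors (dimension is illustrative)
q = torch.randn(4, 105)    # 4 queries
g = torch.randn(20, 105)   # 20 gallery images
print(rank_by_cosine(q, g)[:, :5])  # top-5 gallery indices per query
```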
Attribute-Aware Attention Model for Fine-grained Representation Learning
How to learn a discriminative fine-grained representation is a key point in
many computer vision applications, such as person re-identification,
fine-grained classification, fine-grained image retrieval, etc. Most of the
previous methods focus on learning metrics or ensembles to derive a better global
representation, which usually lacks local information. Based on the
considerations above, we propose a novel Attribute-Aware Attention Model
(A$^3$M), which can learn local attribute representation and global category
representation simultaneously in an end-to-end manner. The proposed model
contains two attention modules: an attribute-guided attention module uses attribute
information to help select category features in different regions, while a
category-guided attention module selects local features of different
attributes with the help of category cues. Through this attribute-category
reciprocal process, local and global features benefit from each other. Finally,
the resulting feature contains more intrinsic information for image recognition
rather than noisy and irrelevant features. Extensive experiments conducted
on Market-1501, CompCars, CUB-200-2011 and CARS196 demonstrate the
effectiveness of our A$^3$M. Code is available at
https://github.com/iamhankai/attribute-aware-attention. Comment: Accepted by ACM Multimedia 2018 (Oral). Code is available at
https://github.com/iamhankai/attribute-aware-attention
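To make the attribute-guided direction of the interplay concrete, below is an illustrative sketch of using an attribute embedding to weight spatial locations of a convolutional feature map. The module name, shapes and projection are assumptions for illustration, not the published code.

```python
import torch
import torch.nn as nn

class AttributeGuidedAttention(nn.Module):
    """Illustrative sketch: use an attribute embedding to weight spatial locations
    of a convolutional feature map (one direction of the attribute-category interplay)."""
    def __init__(self, feat_dim, attr_dim):
        super().__init__()
        self.proj = nn.Linear(attr_dim, feat_dim)  # map attribute cue into feature space

    def forward(self, feat_map, attr_emb):
        # feat_map: (B, C, H, W), attr_emb: (B, A)
        B, C, H, W = feat_map.shape
        query = self.proj(attr_emb)                               # (B, C)
        scores = torch.einsum('bchw,bc->bhw', feat_map, query)    # per-location relevance
        attn = torch.softmax(scores.view(B, -1), dim=1).view(B, 1, H, W)
        return (feat_map * attn).sum(dim=(2, 3))                  # attended global feature (B, C)

x = torch.randn(2, 256, 12, 4)   # toy feature map
a = torch.randn(2, 64)           # toy attribute embedding
print(AttributeGuidedAttention(256, 64)(x, a).shape)  # torch.Size([2, 256])
```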
Virtual CNN Branching: Efficient Feature Ensemble for Person Re-Identification
In this paper we introduce an ensemble method for convolutional neural
networks (CNNs), called "virtual branching," which can be implemented with nearly
no additional parameters and computation on top of standard CNNs. We propose
our method in the context of person re-identification (re-ID). Our CNN model
consists of shared bottom layers, followed by "virtual" branches, where neurons
from a block of regular convolutional and fully-connected layers are
partitioned into multiple sets. Each virtual branch is trained with different
data to specialize in different aspects, e.g., a specific body region or pose
orientation. In this way, robust ensemble representations are obtained against
human body misalignment, deformations, or variations in viewing angles, at
nearly no additional cost. The proposed method achieves competitive
performance on multiple person re-ID benchmark datasets, including Market-1501,
CUHK03, and DukeMTMC-reID.
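A minimal sketch of the parameter-free partitioning idea follows: the output neurons of a single shared layer are split into "virtual" branches, each of which can be trained on different data and concatenated at inference. Layer sizes, the number of branches, and the routing policy are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class VirtualBranches(nn.Module):
    """Illustrative sketch: one fully-connected block whose output neurons are
    partitioned into K 'virtual' branches; no parameters are added beyond the
    single shared layer. Branch specialization/routing logic is omitted."""
    def __init__(self, in_dim=2048, out_dim=1024, num_branches=4):
        super().__init__()
        assert out_dim % num_branches == 0
        self.fc = nn.Linear(in_dim, out_dim)     # shared parameters
        self.num_branches = num_branches

    def forward(self, x, branch=None):
        y = self.fc(x)                                    # (B, out_dim)
        chunks = y.chunk(self.num_branches, dim=1)        # partition neurons into branches
        if branch is not None:                            # training: one specialized branch
            return chunks[branch]
        return torch.cat(chunks, dim=1)                   # inference: ensemble of all branches

feat = torch.randn(8, 2048)
vb = VirtualBranches()
print(vb(feat, branch=2).shape, vb(feat).shape)   # torch.Size([8, 256]) torch.Size([8, 1024])
```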
Cross-domain attribute representation based on convolutional neural network
In the problem of domain transfer learning, we learn a model for the
prediction in a target domain from the data of both some source domains and
the target domain, where the target domain lacks labels while the
source domains have sufficient labels. Besides the instances of the data,
the attributes of data shared across domains have also recently been explored and
proven to be very helpful for leveraging the information of different domains. In
this paper, we propose a novel learning framework for domain-transfer learning
based on both instances and attributes. We propose to embed the attributes of
different domains by a shared convolutional neural network (CNN), to learn a
domain-independent CNN model that represents the information shared by different
domains by matching across domains, and to learn a domain-specific CNN model that
represents the information of each domain. The concatenation of the three CNN
model outputs is used to predict the class label. An iterative algorithm based
on the gradient descent method is developed to learn the parameters of the model.
The experiments over benchmark datasets show the advantage of the proposed
model. Comment: arXiv admin note: substantial text overlap with arXiv:1803.0973
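The three-representation layout described above (shared attribute encoder, domain-independent encoder, domain-specific encoder, concatenated for classification) can be sketched as follows. The tiny stand-in encoders, dimensions, and class count are placeholders, since the paper does not specify architectures here.

```python
import torch
import torch.nn as nn

def small_cnn(out_dim=128):
    """Tiny stand-in encoder; the real architectures are not specified here."""
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, out_dim),
    )

class ThreeStreamModel(nn.Module):
    """Sketch of the three streams: shared attribute encoder, domain-independent
    encoder, domain-specific encoder; their outputs are concatenated for prediction."""
    def __init__(self, num_classes=10, dim=128):
        super().__init__()
        self.attr_net = small_cnn(dim)     # shared across domains (attribute stream)
        self.indep_net = small_cnn(dim)    # domain-independent (matched across domains)
        self.spec_net = small_cnn(dim)     # domain-specific
        self.classifier = nn.Linear(3 * dim, num_classes)

    def forward(self, x):
        z = torch.cat([self.attr_net(x), self.indep_net(x), self.spec_net(x)], dim=1)
        return self.classifier(z)

print(ThreeStreamModel()(torch.randn(2, 3, 64, 64)).shape)  # torch.Size([2, 10])
```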
Factorized Distillation: Training Holistic Person Re-identification Model by Distilling an Ensemble of Partial ReID Models
Person re-identification (ReID) aims at identifying the same person
across videos captured from different cameras. Observing that networks
extracting global features with ordinary architectures struggle to extract
local features due to their weak attention mechanisms, researchers have
proposed many elaborately designed ReID networks; while these greatly
improve accuracy, the model size and feature extraction latency also soar.
We argue that a relatively compact ordinary network extracting globally pooled
features has the capability to extract discriminative local features and can
achieve state-of-the-art precision if the model's parameters are properly
learned. To reduce the difficulty of learning from hard identity labels, we
propose a novel knowledge distillation method, Factorized Distillation, which
factorizes both the feature maps and retrieval features of a holistic ReID
network to mimic the representations of multiple partial ReID models, thus
transferring knowledge from the partial ReID models to the holistic network.
Experiments show that a model trained with the proposed method can outperform
the state of the art with relatively few network parameters. Comment: 10 pages, 5 figures
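As a rough illustration of this kind of distillation objective, the sketch below splits the holistic student feature map into horizontal parts and makes each part mimic a frozen partial-model (teacher) map. The split scheme, MSE loss, and tensor shapes are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def factorized_distillation_loss(student_map, teacher_part_maps):
    """Illustrative distillation term: split the holistic (student) feature map into
    horizontal parts and make each part mimic the corresponding partial (teacher) map."""
    parts = student_map.chunk(len(teacher_part_maps), dim=2)   # split along height
    loss = 0.0
    for s, t in zip(parts, teacher_part_maps):
        loss = loss + F.mse_loss(s, t.detach())                # teachers are frozen
    return loss / len(teacher_part_maps)

student = torch.randn(4, 512, 24, 8, requires_grad=True)       # holistic feature map
teachers = [torch.randn(4, 512, 8, 8) for _ in range(3)]       # three partial-model maps
print(factorized_distillation_loss(student, teachers).item())
```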
A Pose-Sensitive Embedding for Person Re-Identification with Expanded Cross Neighborhood Re-Ranking
Person re-identification is a challenging retrieval task that requires
matching a person's acquired image across non-overlapping camera views. In this
paper we propose an effective approach that incorporates both the fine and
coarse pose information of the person to learn a discriminative embedding. In
contrast to the recent direction of explicitly modeling body parts or
correcting for misalignment based on these, we show that a rather
straightforward inclusion of acquired camera view and/or the detected joint
locations into a convolutional neural network helps to learn a very effective
representation. To increase retrieval performance, re-ranking techniques based
on computed distances have recently gained much attention. We propose a new
unsupervised and automatic re-ranking framework that achieves state-of-the-art
re-ranking performance. We show that, in contrast to the current
state-of-the-art re-ranking methods, our approach does not require computing
new rank lists for each image pair (e.g., based on reciprocal neighbors) and
performs well by using a simple direct rank-list-based comparison or even just
the already computed Euclidean distances between the images. We show that
both our learned representation and our re-ranking method achieve
state-of-the-art performance on a number of challenging surveillance image and
video datasets.
The code is available online at:
https://github.com/pse-ecn/pose-sensitive-embedding. Comment: CVPR 2018: v2 (fixes, added new results on PRW dataset)
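The claim that re-ranking can work from already computed Euclidean distances can be illustrated with the reduced sketch below: the refined query-gallery distance averages the query's distances to each gallery item's nearest gallery neighbors. This is a simplified neighborhood-based illustration under assumed shapes, not the paper's full expanded-cross-neighborhood formulation.

```python
import numpy as np

def neighborhood_rerank(q_feats, g_feats, k=5):
    """Simplified sketch of neighborhood-based re-ranking using only Euclidean
    distances: the refined distance between query q and gallery item g averages
    q's distances to g's k nearest gallery neighbors (plus g itself)."""
    qg = np.linalg.norm(q_feats[:, None, :] - g_feats[None, :, :], axis=2)   # (Q, G)
    gg = np.linalg.norm(g_feats[:, None, :] - g_feats[None, :, :], axis=2)   # (G, G)
    neigh = np.argsort(gg, axis=1)[:, :k + 1]      # each gallery item's neighbors (incl. itself)
    refined = qg[:, neigh].mean(axis=2)            # average over the expanded neighborhood
    return refined.argsort(axis=1)                 # re-ranked gallery indices per query

q = np.random.randn(3, 128)
g = np.random.randn(50, 128)
print(neighborhood_rerank(q, g)[:, :5])            # top-5 gallery indices after re-ranking
```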
Enhancing Person Re-identification in a Self-trained Subspace
Despite the promising progress made in recent years, person re-identification
(re-ID) remains a challenging task due to the complex variations in human
appearances from different camera views. For this challenging problem, a large
variety of algorithms have been developed in the fully-supervised setting,
requiring access to a large amount of labeled training data. However, the main
bottleneck for fully-supervised re-ID is the limited availability of labeled
training samples. To address this problem, in this paper, we propose a
self-trained subspace learning paradigm for person re-ID which effectively
utilizes both labeled and unlabeled data to learn a discriminative subspace
where person images across disjoint camera views can be easily matched. The
proposed approach first constructs pseudo pairwise relationships among
unlabeled persons using the k-nearest neighbors algorithm. Then, with the
pseudo pairwise relationships, the unlabeled samples can be easily combined
with the labeled samples to learn a discriminative projection by solving an
eigenvalue problem. In addition, we refine the pseudo pairwise relationships
iteratively, which further improves the learning performance. A multi-kernel
embedding strategy is also incorporated into the proposed approach to cope with
the non-linearity in a person's appearance and exploit the complementarity of
multiple kernels. In this way, the performance of person re-ID can be greatly
enhanced when training data are insufficient. Experimental results on six
widely-used datasets demonstrate the effectiveness of our approach; its
performance is comparable to the reported results of most state-of-the-art
fully-supervised methods while using far less labeled data. Comment: Accepted by ACM Transactions on Multimedia Computing, Communications,
and Applications (TOMM)
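The overall recipe (k-NN pseudo positive pairs among unlabeled samples, pooled with labeled pairs, then a projection from a generalized eigenvalue problem) can be sketched as below. The specific scatter matrices, regularization, and dimensions are assumptions; the paper's exact objective and multi-kernel embedding are not reproduced here.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import NearestNeighbors

def pseudo_pair_projection(X_lab, y_lab, X_unlab, k=3, dim=32):
    """Illustrative sketch: build pseudo positive pairs among unlabeled samples with
    k-NN, pool them with labeled same-ID pairs, and solve a generalized eigenvalue
    problem for a projection with small within-pair scatter relative to total scatter."""
    # pseudo pairwise relationships among unlabeled samples
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_unlab)
    _, idx = nn.kneighbors(X_unlab)
    pairs = [(X_unlab[i], X_unlab[j]) for i in range(len(X_unlab)) for j in idx[i, 1:]]
    # labeled positive pairs (same identity)
    for c in np.unique(y_lab):
        Xc = X_lab[y_lab == c]
        pairs += [(Xc[i], Xc[j]) for i in range(len(Xc)) for j in range(i + 1, len(Xc))]
    diffs = np.array([a - b for a, b in pairs])
    S_w = diffs.T @ diffs / len(diffs)                                  # within-pair scatter
    X_all = np.vstack([X_lab, X_unlab])
    S_t = np.cov(X_all, rowvar=False) + 1e-6 * np.eye(X_all.shape[1])   # total scatter
    # maximize total scatter relative to within-pair scatter
    w, V = eigh(S_t, S_w + 1e-6 * np.eye(S_w.shape[0]))
    return V[:, -dim:]                                                  # top eigenvectors

X_l, y_l = np.random.randn(40, 64), np.repeat(np.arange(10), 4)
X_u = np.random.randn(100, 64)
print(pseudo_pair_projection(X_l, y_l, X_u).shape)                      # (64, 32)
```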
Cross Domain Knowledge Learning with Dual-branch Adversarial Network for Vehicle Re-identification
The widespread popularization of vehicles has facilitated people's lives
during the last decades. However, the emergence of a large number of vehicles
poses the critical but challenging problem of vehicle re-identification (reID).
To date, for most vehicle reID algorithms, both the training and testing
processes are conducted on the same annotated datasets under supervision.
However, even a well-trained model will still suffer a severe performance drop
due to the domain bias between the training dataset and real-world
scenes.
To address this problem, this paper proposes a domain adaptation framework
for vehicle reID (DAVR), which narrows the cross-domain bias by fully
exploiting the labeled data from the source domain to adapt to the target domain.
DAVR develops an image-to-image translation network named Dual-branch
Adversarial Network (DAN), which translates images from the (well-labeled)
source domain into the style of the (unlabeled) target domain without any
annotation while preserving identity information from the source domain. The
translated images, which carry more realistic target-domain styles, are then
employed to train the vehicle reID model through a proposed attention-based
feature learning model. Through the proposed framework, the well-trained reID
model has better domain adaptation ability for various scenes in real-world
situations. Comprehensive experimental results demonstrate that our proposed
DAVR achieves excellent performance on both the VehicleID and VeRi-776
datasets. Comment: arXiv admin note: substantial text overlap with arXiv:1903.0786
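The two constraints the abstract describes, pushing translated source images toward the target style while preserving source identity, can be expressed as a loss composition like the sketch below. The tiny generator, discriminator, and identity encoder are generic placeholders, not the DAN architecture, and the loss weights are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# tiny stand-in modules; the real generator/discriminator are not specified here
G = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1))                   # "translator" placeholder
D = nn.Sequential(nn.Conv2d(3, 8, 4, 2, 1), nn.ReLU(),
                  nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1))
E = nn.Sequential(nn.Conv2d(3, 8, 4, 2, 1), nn.ReLU(),
                  nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 16))

src = torch.randn(4, 3, 64, 64)                                    # labeled source images
fake_tgt = G(src)                                                  # source image in target style
adv_loss = F.binary_cross_entropy_with_logits(                     # fool the target-style discriminator
    D(fake_tgt), torch.ones(4, 1))
id_loss = F.mse_loss(E(fake_tgt), E(src).detach())                 # preserve source identity features
total = adv_loss + id_loss
print(total.item())
```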
Gated Siamese Convolutional Neural Network Architecture for Human Re-Identification
Matching pedestrians across multiple camera views, known as human
re-identification, is a challenging research problem that has numerous
applications in visual surveillance. With the resurgence of Convolutional
Neural Networks (CNNs), several end-to-end deep Siamese CNN architectures have
been proposed for human re-identification with the objective of projecting the
images of similar pairs (i.e. same identity) to be closer to each other and
those of dissimilar pairs to be distant from each other. However, current
networks extract a fixed representation for each image regardless of the other
image it is paired with, and the comparison with other images is done
only at the final level. In this setting, the network is at risk of failing to
extract finer local patterns that may be essential to distinguish positive
pairs from hard negative pairs. In this paper, we propose a gating function to
selectively emphasize such fine common local patterns by comparing the
mid-level features across pairs of images. This produces flexible
representations for the same image according to the image it is paired
with. We conduct experiments on the CUHK03, Market-1501 and VIPeR datasets and
demonstrate improved performance compared to a baseline Siamese CNN
architecture. Comment: Accepted to ECCV201
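A simplified sketch of such a gate follows: the mid-level feature maps of a pair are compared per location, and locations where the two are locally similar are amplified so that fine common patterns survive to later layers. The Gaussian-style gating and shapes here are a reduced illustration, not the exact published formulation.

```python
import torch
import torch.nn as nn

class MatchingGate(nn.Module):
    """Illustrative gate: compare mid-level feature maps of an image pair and boost
    locations where the two are locally similar."""
    def __init__(self, channels):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(1, channels, 1, 1))   # learned sensitivity

    def forward(self, f1, f2):
        # per-location similarity of the pair (Gaussian of the feature difference)
        sim = torch.exp(-((f1 - f2) ** 2) * self.scale.abs())
        return f1 * (1 + sim), f2 * (1 + sim)                      # boosted pair features

a, b = torch.randn(2, 64, 18, 9), torch.randn(2, 64, 18, 9)
g1, g2 = MatchingGate(64)(a, b)
print(g1.shape)   # torch.Size([2, 64, 18, 9])
```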
Person Re-identification Using Visual Attention
Despite recent attempts at solving the person re-identification problem, it
remains a challenging task since a person's appearance can vary significantly
when large variations in view angle, human pose, and illumination are involved.
In this paper, we propose a novel approach based on using a gradient-based
attention mechanism in deep convolution neural network for solving the person
re-identification problem. Our model learns to focus selectively on the parts of
the input image to which the network's output is most sensitive and
processes them at high resolution while perceiving the surrounding image at
low resolution. Extensive comparative evaluations demonstrate that the proposed
method outperforms state-of-the-art approaches on the challenging CUHK01,
CUHK03, and Market-1501 datasets. Comment: Published at IEEE International Conference on Image Processing 201
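The core gradient-based attention idea, measuring how sensitive the network's output is to each input pixel, can be sketched as below; how the resulting map is then used to select high-resolution crops is omitted, and the stand-in network is a placeholder.

```python
import torch
import torch.nn as nn

def gradient_attention_map(model, image):
    """Illustrative gradient-based attention: the saliency of each input pixel is the
    magnitude of the gradient of the model's output norm with respect to that pixel."""
    image = image.clone().requires_grad_(True)
    out = model(image)
    score = out.norm()                                  # scalar the input can influence
    grad, = torch.autograd.grad(score, image)
    return grad.abs().sum(dim=1, keepdim=True)          # (B, 1, H, W) saliency map

net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 128))
attn = gradient_attention_map(net, torch.randn(2, 3, 128, 64))
print(attn.shape)   # torch.Size([2, 1, 128, 64])
```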