PRISM: Person Re-Identification via Structured Matching
Person re-identification (re-id), an emerging problem in visual surveillance,
deals with maintaining identities of individuals whilst they traverse various
locations surveilled by a camera network. From a visual perspective, re-id is
challenging due to significant changes in the visual appearance of individuals
across cameras with different pose, illumination and calibration. Globally the
challenge arises from the need to maintain structurally consistent matches
among all the individual entities across different camera views. We propose
PRISM, a structured matching method to jointly account for these challenges. We
view the global problem as a weighted graph matching problem and estimate edge
weights by learning to predict them based on the co-occurrences of visual
patterns in the training examples. These co-occurrence based scores in turn
account for appearance changes by inferring likely and unlikely visual
co-occurrences appearing in training instances. We implement PRISM on single
shot and multi-shot scenarios. PRISM uniformly outperforms the state of the art
in terms of matching rate while remaining computationally efficient.
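To make the structured-matching idea concrete, here is a minimal sketch (not the authors' implementation) that casts cross-view re-id as maximum-weight bipartite matching; the cooccurrence_score function is a hypothetical stand-in for PRISM's learned co-occurrence model.

```python
# Sketch of structured matching for re-id: weighted bipartite assignment.
import numpy as np
from scipy.optimize import linear_sum_assignment

def cooccurrence_score(feat_a: np.ndarray, feat_b: np.ndarray) -> float:
    """Placeholder for a learned co-occurrence-based similarity."""
    return float(feat_a @ feat_b)  # e.g., inner product of visual-word histograms

def match_across_views(feats_a: np.ndarray, feats_b: np.ndarray):
    """Jointly match all identities in view A to those in view B."""
    # Edge-weight matrix of the bipartite graph between the two views.
    W = np.array([[cooccurrence_score(a, b) for b in feats_b] for a in feats_a])
    # The Hungarian algorithm maximizes total weight, giving structurally
    # consistent one-to-one matches (each identity used at most once per view).
    rows, cols = linear_sum_assignment(-W)
    return list(zip(rows, cols))
```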
PaMM: Pose-aware Multi-shot Matching for Improving Person Re-identification
Person re-identification is the problem of recognizing people across
different images or videos with non-overlapping views. Although there has been
much progress in person re-identification over the last decade, it remains a
challenging task because appearances of people can seem extremely different
across diverse camera viewpoints and person poses. In this paper, we propose a
novel framework for person re-identification by analyzing camera viewpoints and
person poses in a so-called Pose-aware Multi-shot Matching (PaMM), which
robustly estimates people's poses and efficiently conducts multi-shot matching
based on pose information. Experimental results using public person
re-identification datasets show that the proposed methods outperform
state-of-the-art methods and are promising for person re-identification under
diverse viewpoints and pose variations.
Comment: 12 pages, 12 figures, 4 tables
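A rough sketch of how pose-aware multi-shot matching can work in practice (our illustrative reading of the abstract, with assumed pose bins and per-bin averaging; not the PaMM algorithm itself):

```python
# Bucket each person's frames by estimated pose, then compare like with like.
import numpy as np
from collections import defaultdict

POSE_BINS = ("front", "back", "left", "right")  # assumed discretization

def bin_features(frames):
    """frames: list of (pose_label, feature_vector) for one person."""
    bins = defaultdict(list)
    for pose, feat in frames:
        bins[pose].append(feat)
    # One averaged descriptor per observed pose bin.
    return {p: np.mean(v, axis=0) for p, v in bins.items()}

def pose_aware_distance(frames_a, frames_b):
    bins_a, bins_b = bin_features(frames_a), bin_features(frames_b)
    shared = set(bins_a) & set(bins_b)
    if shared:  # compare front vs. front, profile vs. profile, etc.
        return float(np.mean([np.linalg.norm(bins_a[p] - bins_b[p])
                              for p in shared]))
    # No common pose observed: fall back to the best cross-pose pair.
    return float(min(np.linalg.norm(bins_a[p] - bins_b[q])
                     for p in bins_a for q in bins_b))
```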
Orientation Driven Bag of Appearances for Person Re-identification
Person re-identification (re-id) consists of associating individuals across a
camera network, which is valuable for intelligent video surveillance and has
drawn wide attention. Although person re-identification research is making
progress, it still faces some challenges such as varying poses, illumination
and viewpoints. For feature representation in re-identification, existing works
usually use low-level descriptors which do not take full advantage of body
structure information, resulting in low representation ability.
To solve this problem, this paper proposes the mid-level
body-structure based feature representation (BSFR) which introduces body
structure pyramid for codebook learning and feature pooling in the vertical
direction of the human body. Besides, varying viewpoints in the horizontal
direction of the human body usually cause a data-missing problem, i.e., the
appearances obtained in different orientations of the same person can
vary significantly. To address this problem, the orientation driven bag of
appearances (ODBoA) is proposed to utilize person orientation information
extracted by an orientation estimation technique. To properly evaluate the proposed
approach, we introduce a new re-identification dataset (Market-1203) based on
the Market-1501 dataset, and a second new dataset (PKU-Reid).
Both datasets contain multiple images captured in different body orientations
for each person. Experimental results on three public datasets and two proposed
datasets demonstrate the superiority of the proposed approach, indicating the
effectiveness of body structure and orientation information for improving
re-identification performance.
Comment: 13 pages, 15 figures, 3 tables, submitted to IEEE Transactions on Circuits and Systems for Video Technology
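The vertical body-structure pyramid can be illustrated with a short pooling sketch (the band counts, codebook size, and normalization below are our assumptions, not the paper's):

```python
# Pool bag-of-visual-words histograms over progressively finer horizontal
# bands (whole body -> coarse parts -> finer parts) along the vertical axis.
import numpy as np

def pyramid_pool(codes: np.ndarray, ys: np.ndarray, height: int,
                 levels=(1, 3, 6), codebook_size: int = 256) -> np.ndarray:
    """codes: visual-word index per local descriptor; ys: its y coordinate."""
    feats = []
    for n_bands in levels:
        edges = np.linspace(0, height, n_bands + 1)
        for lo, hi in zip(edges[:-1], edges[1:]):
            mask = (ys >= lo) & (ys < hi)
            hist = np.bincount(codes[mask], minlength=codebook_size).astype(float)
            hist /= max(hist.sum(), 1.0)  # L1-normalize each band's histogram
            feats.append(hist)
    return np.concatenate(feats)  # mid-level body-structure representation
```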
A Systematic Evaluation and Benchmark for Person Re-Identification: Features, Metrics, and Datasets
Person re-identification (re-id) is a critical problem in video analytics
applications such as security and surveillance. The public release of several
datasets and code for vision algorithms has facilitated rapid progress in this
area over the last few years. However, directly comparing re-id algorithms
reported in the literature has become difficult since a wide variety of
features, experimental protocols, and evaluation metrics are employed. To
address this problem, we present an extensive review and performance evaluation
of single- and multi-shot re-id algorithms. The experimental protocol
incorporates the most recent advances in both feature extraction and metric
learning. To ensure a fair comparison, all of the approaches were implemented
using a unified code library that includes 11 feature extraction algorithms and
22 metric learning and ranking techniques. All approaches were evaluated using
a new large-scale dataset that closely mimics a real-world problem setting, in
addition to 16 other publicly available datasets: VIPeR, GRID, CAVIAR,
DukeMTMC4ReID, 3DPeS, PRID, V47, WARD, SAIVT-SoftBio, CUHK01, CUHK02, CUHK03,
RAiD, iLIDS-VID, HDA+ and Market1501. The evaluation codebase and results will
be made publicly available for community use.
Comment: Preliminary work on person Re-Id benchmark. S. Karanam and M. Gou contributed equally. 14 pages, 6 figures, 4 tables. For supplementary material, see http://robustsystems.coe.neu.edu/sites/robustsystems.coe.neu.edu/files/systems/supmat/ReID_benchmark_supp.zip
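Since the benchmark's comparisons rest on a shared evaluation protocol, a minimal sketch of the standard cumulative match characteristic (CMC) computation used in such evaluations may help (this is generic re-id practice, not this codebase's API):

```python
# Standard single-shot CMC: for each probe, find the rank of its true match.
import numpy as np

def cmc(dist: np.ndarray, probe_ids, gallery_ids, top_k: int = 20):
    """dist[i, j]: learned-metric distance from probe i to gallery item j.
    Assumes each probe identity appears in the gallery."""
    gallery_ids = np.asarray(gallery_ids)
    match_at_rank = np.zeros(top_k)
    for i, pid in enumerate(probe_ids):
        ranked = gallery_ids[np.argsort(dist[i])]     # gallery, nearest first
        first_hit = np.flatnonzero(ranked == pid)[0]  # rank of the true match
        if first_hit < top_k:
            match_at_rank[first_hit] += 1
    # cmc_curve[k-1] = fraction of probes matched within the top k.
    return np.cumsum(match_at_rank) / len(probe_ids)
```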
Enhancing Person Re-identification in a Self-trained Subspace
Despite the promising progress made in recent years, person re-identification
(re-ID) remains a challenging task due to the complex variations in human
appearances from different camera views. For this challenging problem, a large
variety of algorithms have been developed in the fully-supervised setting,
requiring access to a large amount of labeled training data. However, the main
bottleneck for fully-supervised re-ID is the limited availability of labeled
training samples. To address this problem, in this paper, we propose a
self-trained subspace learning paradigm for person re-ID which effectively
utilizes both labeled and unlabeled data to learn a discriminative subspace
where person images across disjoint camera views can be easily matched. The
proposed approach first constructs pseudo pairwise relationships among
unlabeled persons using the k-nearest neighbors algorithm. Then, with the
pseudo pairwise relationships, the unlabeled samples can be easily combined
with the labeled samples to learn a discriminative projection by solving an
eigenvalue problem. In addition, we refine the pseudo pairwise relationships
iteratively, which further improves the learning performance. A multi-kernel
embedding strategy is also incorporated into the proposed approach to cope with
the non-linearity in people's appearance and to exploit the complementarity of
multiple kernels. In this way, the performance of person re-ID can be greatly
enhanced when training data are insufficient. Experimental results on six
widely-used datasets demonstrate the effectiveness of our approach, whose
performance is comparable to the reported results of most state-of-the-art
fully-supervised methods while using far fewer labeled data.
Comment: Accepted by ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)
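A compact sketch of the self-training loop as we read it (function names, the mutual-kNN pairing, and the scatter-matrix formulation are our assumptions, not the authors' code):

```python
# Pseudo-positive pairs from mutual k-nearest neighbors, then a
# discriminative projection from a generalized eigenvalue problem.
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import NearestNeighbors

def pseudo_pairs(X_unlabeled, k=5):
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_unlabeled)
    _, idx = nn.kneighbors(X_unlabeled)
    pairs = set()
    for i, neigh in enumerate(idx[:, 1:]):   # skip the self-match
        for j in neigh:
            if i in idx[j, 1:]:              # keep mutual neighbors only
                pairs.add((min(i, j), max(i, j)))
    return sorted(pairs)

def learn_projection(X, pos_pairs, neg_pairs, dim=40, reg=1e-3):
    """pos_pairs: pseudo + labeled positives; neg_pairs: labeled negatives."""
    d = X.shape[1]
    Sw = sum(np.outer(X[i] - X[j], X[i] - X[j]) for i, j in pos_pairs)
    Sb = sum(np.outer(X[i] - X[j], X[i] - X[j]) for i, j in neg_pairs)
    # Maximize between-pair scatter relative to within-pair scatter.
    vals, vecs = eigh(Sb, Sw + reg * np.eye(d))
    return vecs[:, np.argsort(vals)[::-1][:dim]]  # top generalized eigenvectors
```

Iterating this, reprojecting the data, recomputing pseudo pairs, and re-solving, refines the pseudo pairwise relationships as the abstract describes.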
Recognizing Partial Biometric Patterns
Biometric recognition on partially captured targets is challenging, as only
a few partial observations of an object are available for matching. In this
area, deep learning based methods are widely applied in person
re-identification and partial face recognition to match partially captured
objects arising from occlusion, posture variation, or the subject being
partially out of view. However, most current methods either cannot identify an
individual when some parts of the object are unavailable, or are specialized to
certain constrained scenarios. To this end, we propose a robust general
framework for arbitrary biometric matching scenarios without the limitations of
alignment as well as the size of inputs. In this work, we introduce a feature
post-processing step to handle the feature maps from a fully convolutional
network (FCN), and a dictionary learning based Spatial Feature Reconstruction
(SFR) to match feature maps of different sizes. Moreover, the batch hard
triplet loss function is applied to
optimize the model. The applicability and effectiveness of the proposed method
are demonstrated by the results from experiments on three person
re-identification datasets (Market1501, CUHK03, DukeMTMC-reID), two partial
person datasets (Partial REID and Partial iLIDS) and two partial face datasets
(CASIA-NIR-Distance and Partial LFW), on which our method achieves
state-of-the-art performance in comparison with several competing approaches.
The code is released online at
https://github.com/lingxiao-he/Partial-Person-ReID.
Comment: 13 pages, 11 figures
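The batch hard triplet loss the paper applies is a standard published objective (Hermans et al.); a compact PyTorch version (ours, not the released code) looks like this:

```python
# Batch-hard triplet loss: hardest positive and hardest negative per anchor.
import torch

def batch_hard_triplet_loss(feats, labels, margin=0.3):
    """feats: (N, D) embeddings; labels: (N,) identity labels."""
    dist = torch.cdist(feats, feats)                 # pairwise L2 distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    # Hardest positive: farthest same-identity sample in the batch.
    hardest_pos = (dist * same.float()).max(dim=1).values
    # Hardest negative: closest different-identity sample.
    inf = torch.full_like(dist, float("inf"))
    hardest_neg = torch.where(same, inf, dist).min(dim=1).values
    return torch.relu(hardest_pos - hardest_neg + margin).mean()
```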
Robust Depth-based Person Re-identification
Person re-identification (re-id) aims to match people across non-overlapping
camera views. So far, RGB-based appearance has been widely used in most existing
works. However, when people appear under extreme illumination or change
clothes, RGB appearance-based re-id methods tend to fail. To overcome
this problem, we propose to exploit depth information to provide more invariant
body shape and skeleton information regardless of illumination and color
change. More specifically, we exploit a depth voxel covariance descriptor and
further propose a locally rotation invariant depth shape descriptor called
Eigen-depth feature to describe pedestrian body shape. We prove that the
distance between any two covariance matrices on the Riemannian manifold is
equivalent to the Euclidean distance between the corresponding Eigen-depth
features. Furthermore, we propose a kernelized implicit feature transfer scheme
to estimate Eigen-depth feature implicitly from RGB image when depth
information is not available. We find that combining the estimated depth
features with RGB-based appearance features can sometimes help to better reduce
visual ambiguities of appearance features caused by illumination and similar
clothes. The effectiveness of our models is validated on publicly available
depth pedestrian datasets in comparison with related person re-identification
methods.
Comment: IEEE Transactions on Image Processing Early Access
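The claimed equivalence between manifold and Euclidean distances holds, for instance, under the log-Euclidean metric on SPD matrices, which we assume here purely for illustration; a small numerical check:

```python
# Vectorizing the matrix logarithm of an SPD covariance gives features whose
# Euclidean distance equals the log-Euclidean manifold distance.
import numpy as np
from scipy.linalg import logm

def log_euclidean_dist(C1, C2):
    return np.linalg.norm(logm(C1) - logm(C2), ord="fro")

def log_vec_feature(C):
    return logm(C).ravel()  # sketch of an "Eigen-depth-like" vectorization

rng = np.random.default_rng(0)
A, B = rng.normal(size=(2, 5, 5))
C1, C2 = A @ A.T + np.eye(5), B @ B.T + np.eye(5)   # two SPD covariances
assert np.isclose(log_euclidean_dist(C1, C2),
                  np.linalg.norm(log_vec_feature(C1) - log_vec_feature(C2)))
```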
Adversarial Open-World Person Re-Identification
In a typical real-world application of re-id, a watch-list (gallery set) of a
handful of target people (e.g., suspects) must be tracked among a large volume
of non-target people across camera views; this is called open-world person
re-id. Different from conventional (closed-world) person re-id, a large portion
of probe samples are not from target people in the open-world setting.
Moreover, it often happens that a non-target person looks similar to a target
one and therefore seriously challenges a re-id system.
In this work, we introduce a deep open-world group-based person re-id model
based on adversarial learning to alleviate the attack problem caused by similar
non-target people. The main idea is to learn to attack the feature extractor on
the target people by using a GAN to generate very target-like images
(imposters), while simultaneously making the feature extractor learn to
tolerate the attack through discriminative learning so as to realize
group-based verification.
The framework we proposed is called the adversarial open-world person
re-identification, and this is realized by our Adversarial PersonNet (APN) that
jointly learns a generator, a person discriminator, a target discriminator and
a feature extractor, where the feature extractor and target discriminator share
the same weights so that the feature extractor learns to tolerate the
attack by imposters for better group-based verification. While open-world
person re-id is challenging, we show for the first time that the
adversarial-based approach helps stabilize a person re-id system under imposter
attack more effectively.
Comment: 17 pages, 3 figures, Accepted by European Conference on Computer Vision 2018
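A heavily simplified sketch of the adversarial training loop as we read the abstract (all modules, losses, and the optimizer split are placeholders, not APN's actual code):

```python
# One training step: G forges target-like imposters; the feature extractor F
# and target discriminator D_target (weight-shared in APN) learn to tolerate them.
import torch

def apn_style_step(G, F, D_person, D_target, opt_G, opt_FD, x_target, z):
    # Generator step: imposters should fool both discriminators.
    fake = G(z)
    loss_G = -(torch.log(D_person(fake) + 1e-8).mean() +
               torch.log(D_target(F(fake)) + 1e-8).mean())
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()

    # Feature/target-discriminator step: real targets score high,
    # detached imposters score low (tolerating the attack).
    real_score = D_target(F(x_target))
    fake_score = D_target(F(G(z).detach()))
    loss_FD = -(torch.log(real_score + 1e-8).mean() +
                torch.log(1.0 - fake_score + 1e-8).mean())
    opt_FD.zero_grad()
    loss_FD.backward()
    opt_FD.step()
```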
Joint Person Re-identification and Camera Network Topology Inference in Multiple Cameras
Person re-identification is the task of recognizing or identifying a person
across multiple views in multi-camera networks. Although there has been much
progress in person re-identification, it remains a challenging task in
large-scale multi-camera networks because of the large spatio-temporal
uncertainty and the high complexity arising from large numbers of cameras and
people. To handle these difficulties, additional information such as the camera
network topology should be provided, which is unfortunately itself difficult to
estimate automatically. In this study, we propose a unified
framework which jointly solves both person re-identification and camera network
topology inference problems with minimal prior knowledge about the
environments. The proposed framework takes general multi-camera network
environments into account and can be applied to online person re-identification
in large-scale multi-camera networks. In addition, to effectively show the
superiority of the proposed framework, we provide a new person
re-identification dataset with full annotations, named SLP, captured in a
multi-camera network consisting of nine non-overlapping cameras. Experimental
results on our dataset and public datasets show that the proposed methods are
promising for both person re-identification and camera topology inference
tasks.
Comment: 14 pages, 14 figures, 6 tables
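One common way such joint frameworks couple the two problems is by estimating camera transition-time distributions from confident matches; the sketch below illustrates that general idea only (the thresholds and peak test are our assumptions, not the paper's algorithm):

```python
# Confident cross-camera matches yield transition-time samples; camera pairs
# whose transition-time histogram shows a dominant mode are inferred adjacent.
import numpy as np
from collections import defaultdict

def infer_topology(matches, min_samples=20, peak_ratio=3.0):
    """matches: list of (cam_a, cam_b, t_exit_a, t_enter_b) confident pairs."""
    deltas = defaultdict(list)
    for ca, cb, ta, tb in matches:
        deltas[(ca, cb)].append(tb - ta)
    links = {}
    for pair, dts in deltas.items():
        if len(dts) < min_samples:
            continue
        hist, edges = np.histogram(dts, bins=20)
        # A connected pair shows a clear transition-time peak.
        if hist.max() > peak_ratio * np.median(hist + 1):
            links[pair] = (edges[hist.argmax()], edges[hist.argmax() + 1])
    return links  # adjacent camera pairs with their typical delay window
```

The inferred delay windows can then reweight future re-id scores, which is what makes the two inference problems mutually reinforcing.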
Tracking Persons-of-Interest via Unsupervised Representation Adaptation
Multi-face tracking in unconstrained videos is a challenging problem as faces
of one person often appear drastically different in multiple shots due to
significant variations in scale, pose, expression, illumination, and make-up.
Existing multi-target tracking methods often use low-level features which are
not sufficiently discriminative for identifying faces with such large
appearance variations. In this paper, we tackle this problem by learning
discriminative, video-specific face representations using convolutional neural
networks (CNNs). Unlike existing CNN-based approaches which are only trained on
large-scale face image datasets offline, we use the contextual constraints to
generate a large number of training samples for a given video, and further
adapt the pre-trained face CNN to specific videos using discovered training
samples. Using these training samples, we optimize the embedding space so that
the Euclidean distances correspond to a measure of semantic face similarity via
minimizing a triplet loss function. With the learned discriminative features,
we apply the hierarchical clustering algorithm to link tracklets across
multiple shots to generate trajectories. We extensively evaluate the proposed
algorithm on two sets of TV sitcoms and YouTube music videos, analyze the
contribution of each component, and demonstrate significant performance
improvement over existing techniques.
Comment: Project page: http://vllab1.ucmerced.edu/~szhang/FaceTracking
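The final linking stage, agglomerative clustering of tracklet embeddings into trajectories, can be sketched with SciPy (the threshold and linkage choice below are illustrative, not the paper's settings):

```python
# Link tracklets across shots by hierarchical clustering of their embeddings.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def link_tracklets(tracklet_feats: np.ndarray, distance_threshold=0.7):
    """tracklet_feats: (N, D), one adapted-CNN embedding per tracklet."""
    Z = linkage(tracklet_feats, method="average", metric="euclidean")
    # Tracklets falling in the same cluster are linked into one trajectory.
    labels = fcluster(Z, t=distance_threshold, criterion="distance")
    return labels  # labels[i] = trajectory id of tracklet i
```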