Hierarchical and Efficient Learning for Person Re-Identification
Recent works on the person re-identification task mainly focus on model
accuracy while ignoring factors related to efficiency, e.g. model size and
latency, which are critical for practical applications. In this paper, we
propose a novel Hierarchical and Efficient Network (HENet) that learns an
ensemble of hierarchical global, partial, and recovery features under the
supervision of multiple loss combinations. To further improve robustness
against irregular occlusion, we propose a new dataset augmentation approach,
dubbed Random Polygon Erasing (RPE), which randomly erases irregular areas of
the input image to imitate missing body parts. We also propose an Efficiency
Score (ES) metric to evaluate model efficiency. Extensive experiments on the
Market1501, DukeMTMC-ReID, and CUHK03 datasets show the efficiency and
superiority of our approach compared with epoch-making methods.
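The abstract does not spell out how RPE is implemented. A minimal sketch, assuming the polygon is sampled around a random center and the region is filled with random pixel values; the function name and all parameters here are hypothetical, not from the paper:

```python
import numpy as np

def random_polygon_erasing(image, num_vertices=(3, 8), max_area_ratio=0.2, rng=None):
    """Erase a random polygonal region of an (H, W, C) uint8 image,
    imitating irregular occlusion of body parts."""
    rng = np.random.default_rng(rng)
    h, w = image.shape[:2]
    # Random polygon center and a radius bounding the erased area.
    cx, cy = rng.integers(0, w), rng.integers(0, h)
    radius = max(2, int(np.sqrt(max_area_ratio * h * w / np.pi)))
    # Sample vertices at sorted random angles and random distances.
    k = int(rng.integers(num_vertices[0], num_vertices[1] + 1))
    angles = np.sort(rng.uniform(0.0, 2.0 * np.pi, k))
    dists = rng.uniform(0.3 * radius, radius, k)
    xs = np.clip(cx + dists * np.cos(angles), 0, w - 1)
    ys = np.clip(cy + dists * np.sin(angles), 0, h - 1)
    # Rasterize with the even-odd (ray casting) point-in-polygon test.
    yy, xx = np.mgrid[0:h, 0:w]
    inside = np.zeros((h, w), dtype=bool)
    j = k - 1
    for i in range(k):
        x1, y1, x2, y2 = xs[i], ys[i], xs[j], ys[j]
        crosses = ((y1 > yy) != (y2 > yy)) & (
            xx < (x2 - x1) * (yy - y1) / (y2 - y1 + 1e-12) + x1)
        inside ^= crosses
        j = i
    out = image.copy()
    out[inside] = rng.integers(0, 256, size=(int(inside.sum()), image.shape[2]),
                               dtype=image.dtype)
    return out
```

Filling with random noise rather than a constant color is one common choice in erasing-style augmentation; the paper may use a different fill.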
An Introduction to Person Re-identification with Generative Adversarial Networks
Person re-identification is a basic subject in the field of computer vision.
Traditional methods have several limitations in handling problems such as
illumination change, occlusion, pose variation, and feature variation under
complex backgrounds. Fortunately, the deep learning paradigm has opened new
ways for person re-identification research and has become a hot spot in this
field. Generative Adversarial Nets (GANs) have attracted a great deal of
attention in the past few years for solving these problems. This paper reviews
GAN-based methods for person re-identification, focusing on the related papers
about different GAN-based frameworks and discussing their advantages and
disadvantages. Finally, it proposes directions for future research, especially
the prospects of person re-identification methods based on GANs.
Who did What at Where and When: Simultaneous Multi-Person Tracking and Activity Recognition
We present a bootstrapping framework to simultaneously improve multi-person
tracking and activity recognition at individual, interaction and social group
activity levels. The inference consists of identifying trajectories of all
pedestrian actors, individual activities, pairwise interactions, and collective
activities, given the observed pedestrian detections. Our method uses a
graphical model to represent and solve the joint tracking and recognition
problems in three stages: (1) activity-aware tracking, (2) joint interaction
recognition and occlusion recovery, and (3) collective activity recognition. We
solve the where and when problem with visual tracking, as well as the who and
what problem with recognition. High-order correlations among the visible and
occluded individuals, pairwise interactions, groups, and activities are then
solved using a hypergraph formulation within the Bayesian framework.
Experiments on several benchmarks show the advantages of our approach over
state-of-the-art methods.
Robust and Low-Rank Representation for Fast Face Identification with Occlusions
In this paper we propose an iterative method to address the face
identification problem with block occlusions. Our approach utilizes a robust
representation based on two characteristics in order to model contiguous errors
(e.g., block occlusion) effectively. The first models the errors with a
distribution described by a tailored loss function. The second describes the
error image as having a specific structure (resulting in low rank relative to
the image size). We show that this joint characterization is effective for
describing errors with spatial continuity. Our approach is computationally
efficient due to the utilization of the Alternating Direction Method of
Multipliers (ADMM). A special case of our fast iterative algorithm leads to the
robust representation method which is normally used to handle non-contiguous
errors (e.g., pixel corruption). Extensive results on representative face
databases (in constrained and unconstrained environments) document the
effectiveness of our method over existing robust representation methods with
respect to both identification rates and computational time.
Code is available at GitHub, where you can find implementations of the
F-LR-IRNNLS and F-IRNNLS (fast version of the RRC):
https://github.com/miliadis/FIRC
Comment: IEEE Transactions on Image Processing (TIP), 201
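The low-rank error term in models of this kind is typically handled inside ADMM by singular value thresholding, the proximal operator of the nuclear norm. A minimal sketch of that operator (an illustration of the general technique, not the authors' F-LR-IRNNLS code; the function name is ours):

```python
import numpy as np

def svt(matrix, tau):
    """Singular value thresholding: shrink each singular value by tau
    and discard those that fall below zero, yielding a low-rank estimate.
    This is the proximal operator of tau * nuclear norm."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (u * s_shrunk) @ vt
```

In an ADMM loop for an occlusion model, a step like this would update the low-rank error estimate while other steps update the representation coefficients and the dual variables.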
Relevance Subject Machine: A Novel Person Re-identification Framework
We propose a novel method called the Relevance Subject Machine (RSM) to solve
the person re-identification (re-id) problem. RSM falls under the category of
Bayesian sparse recovery algorithms and uses the sparse representation of the
input video under a pre-defined dictionary to identify the subject in the
video. Our approach focuses on the multi-shot re-id problem, which is the
prevalent problem in many video analytics applications. RSM captures the
essence of the multi-shot re-id problem by constraining the support of the
sparse codes for each input video frame to be the same. Our proposed approach
is also robust enough to deal with time-varying outliers and occlusions by
introducing a sparse, non-stationary noise term in the model error. We provide
a novel Variational Bayesian based inference procedure along with an intuitive
interpretation of the proposed update rules. We evaluate our approach over
several commonly used re-id datasets and show superior performance over current
state-of-the-art algorithms. Specifically, for iLIDS-VID, a recent large-scale
re-id dataset, RSM shows significant improvement over all published approaches,
achieving an 11.5% (absolute) improvement in rank-1 accuracy over the closest
competing algorithm considered.
Comment: Submitted to IEEE Transactions on Pattern Analysis and Machine
Intelligence
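The shared-support constraint described above (all frames of one video must use the same dictionary atoms) is the row-sparsity idea behind simultaneous sparse recovery. As an illustration only, here is a greedy simultaneous-OMP sketch of that constraint, not the paper's Variational Bayesian procedure:

```python
import numpy as np

def simultaneous_omp(D, Y, n_atoms):
    """Row-sparse coding: every column of Y (a video frame) is coded
    over dictionary D using one shared support of size n_atoms."""
    support = []
    coeffs = np.zeros((0, Y.shape[1]))
    residual = Y.astype(float).copy()
    for _ in range(n_atoms):
        # Pick the atom most correlated with the residual across all frames.
        scores = np.abs(D.T @ residual).sum(axis=1)
        scores[support] = -np.inf          # never re-pick a chosen atom
        support.append(int(np.argmax(scores)))
        # Re-fit all frames jointly on the shared support by least squares.
        Ds = D[:, support]
        coeffs, *_ = np.linalg.lstsq(Ds, Y, rcond=None)
        residual = Y - Ds @ coeffs
    X = np.zeros((D.shape[1], Y.shape[1]))
    X[support, :] = coeffs
    return X, sorted(support)
```

With a dictionary built from gallery subjects, the shared support then points to the subject that best explains the whole video rather than any single frame.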
cvpaper.challenge in 2015 - A review of CVPR2015 and DeepSurvey
The "cvpaper.challenge" is a group composed of members from AIST, Tokyo Denki
Univ. (TDU), and Univ. of Tsukuba that aims to systematically summarize papers
on computer vision, pattern recognition, and related fields. For this
particular review, we focused on reading all 602 conference papers presented
at CVPR2015, the premier annual computer vision event held in June 2015, in
order to grasp the trends in the field. Further, we propose "DeepSurvey" as a
mechanism embodying the entire process, from reading all the papers, through
the generation of ideas, to the writing of a paper.
Comment: Survey Paper
Robust Face Recognition via Adaptive Sparse Representation
Sparse Representation (or coding) based Classification (SRC) has gained great
success in face recognition in recent years. However, SRC emphasizes the
sparsity too much and overlooks the correlation information which has been
demonstrated to be critical in real-world face recognition problems. Besides,
some work considers the correlation but overlooks the discriminative ability of
sparsity. Different from these existing techniques, in this paper, we propose a
framework called Adaptive Sparse Representation based Classification (ASRC) in
which sparsity and correlation are jointly considered. Specifically, when the
samples are of low correlation, ASRC selects the most discriminative samples
for representation, like SRC; when the training samples are highly correlated,
ASRC selects most of the correlated and discriminative samples for
representation, rather than choosing some related samples randomly. In general,
the representation model is adaptive to the correlation structure, benefiting
from both the ℓ1-norm and the ℓ2-norm.
Extensive experiments conducted on publicly available data sets verify the
effectiveness and robustness of the proposed algorithm by comparing it with
state-of-the-art methods.
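The trade-off ASRC exploits, between sparsity (ℓ1) and grouping of correlated samples (ℓ2), is closely related to the elastic-net objective. A coordinate-descent sketch of that objective for illustration; the function and parameter names are ours, and this is not the ASRC algorithm itself:

```python
import numpy as np

def elastic_net_code(D, y, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Minimize 0.5*||y - D x||^2 + alpha*(l1_ratio*||x||_1
    + 0.5*(1 - l1_ratio)*||x||_2^2) by cyclic coordinate descent.
    l1_ratio=1 recovers the sparse (lasso) extreme; l1_ratio=0 the
    correlation-friendly ridge extreme."""
    n_atoms = D.shape[1]
    x = np.zeros(n_atoms)
    col_sq = (D ** 2).sum(axis=0)
    l1, l2 = alpha * l1_ratio, alpha * (1.0 - l1_ratio)
    for _ in range(n_iter):
        for j in range(n_atoms):
            # Correlation of atom j with the partial residual
            # (excluding atom j's own current contribution).
            rho = D[:, j] @ (y - D @ x) + col_sq[j] * x[j]
            # Soft-threshold (l1) then shrink (l2).
            x[j] = np.sign(rho) * max(abs(rho) - l1, 0.0) / (col_sq[j] + l2)
    return x
```

In a representation-based classifier, one would code a probe face over the training dictionary this way and assign the class whose samples yield the smallest reconstruction residual.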
Attention Driven Person Re-identification
Person re-identification (ReID) is a challenging task due to arbitrary human
pose variations, background clutters, etc. It has been studied extensively in
recent years, but the multifarious local and global features are still not
fully exploited by either ignoring the interplay between whole-body images and
body-part images or missing in-depth examination of specific body-part images.
In this paper, we propose a novel attention-driven multi-branch network that
learns robust and discriminative human representation from global whole-body
images and local body-part images simultaneously. Within each branch, an
intra-attention network is designed to search for informative and
discriminative regions within the whole-body or body-part images, where
attention is elegantly decomposed into spatial-wise attention and channel-wise
attention for effective and efficient learning. In addition, a novel
inter-attention module is designed which fuses the output of intra-attention
networks adaptively for optimal person ReID. The proposed technique has been
evaluated over three widely used datasets, CUHK03, Market-1501, and
DukeMTMC-ReID, and experiments demonstrate its superior robustness and
effectiveness as compared with the state of the art.
Comment: Accepted in Pattern Recognition (PR)
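To make the spatial-wise/channel-wise decomposition concrete, here is a toy NumPy sketch in which channel attention reweights whole feature maps and spatial attention reweights locations. This is a deliberate simplification with hand-crafted (non-learned) scores, not the paper's trained modules; its point is only that the factorized form needs C + H*W weights rather than C*H*W:

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def decomposed_attention(features):
    """Apply channel-wise, then spatial-wise attention to a (C, H, W)
    feature map, using global averages as stand-in attention scores."""
    c, h, w = features.shape
    # Channel attention: one weight per channel, from its global mean.
    channel_scores = features.mean(axis=(1, 2))          # (C,)
    channel_weights = softmax(channel_scores, axis=0)
    out = features * channel_weights[:, None, None]
    # Spatial attention: one weight per location, from its channel mean.
    spatial_scores = out.mean(axis=0).reshape(-1)        # (H*W,)
    spatial_weights = softmax(spatial_scores, axis=0).reshape(h, w)
    return out * spatial_weights[None, :, :]
```

In the actual network the two score maps would be produced by small learned sub-networks rather than plain averages.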
Marrying Tracking with ELM: A Metric Constraint Guided Multiple Feature Fusion Method
Object tracking is an important problem in computer vision and surveillance
systems. Existing models mainly exploit single-view features (i.e. color,
texture, shape) to solve the problem, failing to describe objects
comprehensively. In this paper, we solve the problem from a multi-view
perspective by leveraging complementary and latent multi-view information, so
as to be robust to partial occlusion and background clutter, especially when
objects similar to the target are present, while also addressing tracking
drift. However, one big problem is that a multi-view fusion strategy can
inevitably make tracking inefficient. To this end, we propose to marry ELM
(Extreme Learning Machine) to multi-view fusion to train the global hidden
output weights, effectively exploiting the local information from each view.
Following this principle, we propose a novel method to obtain the optimal
sample as the target object, which avoids tracking drift resulting from noisy
samples. Our method is evaluated over 12 challenging image sequences with
different attributes including illumination, occlusion, deformation, etc., and
demonstrates better performance than several state-of-the-art methods in terms
of effectiveness and robustness.
Comment: arXiv admin note: substantial text overlap with arXiv:1807.1021
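ELM itself is standard: hidden-layer weights are drawn at random and never trained, and only the output weights are solved in closed form by least squares. A minimal single-view sketch (the paper's multi-view fusion and metric constraint are omitted; function names are ours):

```python
import numpy as np

def elm_train(X, Y, n_hidden=64, rng=0):
    """Extreme Learning Machine: random input weights W, b stay fixed;
    only the output weights beta are solved by least squares."""
    rng = np.random.default_rng(rng)
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)                       # random hidden features
    beta, *_ = np.linalg.lstsq(H, Y, rcond=None)  # closed-form solve
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```

The closed-form solve is what makes ELM attractive for keeping a multi-view fusion tracker efficient: updating the model is one pseudoinverse, not an iterative optimization.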
A Systematic Evaluation and Benchmark for Person Re-Identification: Features, Metrics, and Datasets
Person re-identification (re-id) is a critical problem in video analytics
applications such as security and surveillance. The public release of several
datasets and code for vision algorithms has facilitated rapid progress in this
area over the last few years. However, directly comparing re-id algorithms
reported in the literature has become difficult since a wide variety of
features, experimental protocols, and evaluation metrics are employed. In order
to address this need, we present an extensive review and performance evaluation
of single- and multi-shot re-id algorithms. The experimental protocol
incorporates the most recent advances in both feature extraction and metric
learning. To ensure a fair comparison, all of the approaches were implemented
using a unified code library that includes 11 feature extraction algorithms and
22 metric learning and ranking techniques. All approaches were evaluated using
a new large-scale dataset that closely mimics a real-world problem setting, in
addition to 16 other publicly available datasets: VIPeR, GRID, CAVIAR,
DukeMTMC4ReID, 3DPeS, PRID, V47, WARD, SAIVT-SoftBio, CUHK01, CUHK02, CUHK03,
RAiD, iLIDS-VID, HDA+ and Market1501. The evaluation codebase and results will
be made publicly available for community use.
Comment: Preliminary work on person Re-Id benchmark. S. Karanam and M. Gou
contributed equally. 14 pages, 6 figures, 4 tables. For supplementary
material, see
http://robustsystems.coe.neu.edu/sites/robustsystems.coe.neu.edu/files/systems/supmat/ReID_benchmark_supp.zi