97 research outputs found
Gated Siamese Convolutional Neural Network Architecture for Human Re-Identification
Matching pedestrians across multiple camera views, known as human
re-identification, is a challenging research problem that has numerous
applications in visual surveillance. With the resurgence of Convolutional
Neural Networks (CNNs), several end-to-end deep Siamese CNN architectures have
been proposed for human re-identification with the objective of projecting the
images of similar pairs (i.e. same identity) to be closer to each other and
those of dissimilar pairs to be distant from each other. However, current
networks extract fixed representations for each image regardless of other
images which are paired with it and the comparison with other images is done
only at the final level. In this setting, the network is at risk of failing to
extract finer local patterns that may be essential to distinguish positive
pairs from hard negative pairs. In this paper, we propose a gating function to
selectively emphasize such fine common local patterns by comparing the
mid-level features across pairs of images. This produces flexible
representations for the same image according to the images it is paired
with. We conduct experiments on the CUHK03, Market-1501 and VIPeR datasets and
demonstrate improved performance compared to a baseline Siamese CNN
architecture.
Comment: Accepted to ECCV201
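The pairwise objective described in this abstract (same-identity pairs pulled together, different-identity pairs pushed apart) can be sketched with a generic contrastive loss. This is an illustrative assumption, not the loss or the gating function used in the paper; the function name `contrastive_loss`, the `margin` parameter, and the toy embeddings are hypothetical.

```python
import numpy as np

def contrastive_loss(emb_a, emb_b, same_identity, margin=1.0):
    """Generic contrastive loss over a batch of embedding pairs.

    emb_a, emb_b: arrays of shape (batch, dim), one embedding per image.
    same_identity: array of shape (batch,), 1 for a positive pair, 0 otherwise.
    margin: assumed hyperparameter; negatives farther apart than this cost nothing.
    """
    emb_a = np.asarray(emb_a, dtype=float)
    emb_b = np.asarray(emb_b, dtype=float)
    same = np.asarray(same_identity, dtype=float)

    # Euclidean distance between the two embeddings of each pair.
    dist = np.linalg.norm(emb_a - emb_b, axis=1)

    # Positive pairs are penalized by their squared distance.
    pos_term = same * dist ** 2
    # Negative pairs are penalized only when closer than the margin.
    neg_term = (1.0 - same) * np.maximum(0.0, margin - dist) ** 2

    return float(np.mean(pos_term + neg_term))

# Toy usage: the first pair is a negative pair at distance 0,
# the second is a positive pair at distance 5.
loss = contrastive_loss([[0, 0], [0, 0]], [[0, 0], [3, 4]], [0, 1])
```

In a Siamese setup both embeddings would come from the same network with shared weights; the loss itself only needs the two embedding batches and the pair labels.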
A Siamese Long Short-Term Memory Architecture for Human Re-Identification
Matching pedestrians across multiple camera views, known as human
re-identification, is a challenging problem in visual
surveillance. In existing works concentrating on feature extraction,
representations are formed locally and independent of other regions. We present
a novel siamese Long Short-Term Memory (LSTM) architecture that can process
image regions sequentially and enhance the discriminative capability of local
feature representation by leveraging contextual information. The feedback
connections and internal gating mechanism of the LSTM cells enable our model to
memorize the spatial dependencies and selectively propagate relevant contextual
information through the network. We demonstrate improved performance compared
to the baseline algorithm with no LSTM units and promising results compared to
state-of-the-art methods on Market-1501, CUHK03 and VIPeR datasets.
Visualization of the internal mechanism of LSTM cells shows meaningful patterns
can be learned by our method.
Person Re-identification Using Visual Attention
Despite recent attempts to solve the person re-identification problem, it
remains a challenging task since a person's appearance can vary significantly
when large variations in view angle, human pose, and illumination are involved.
In this paper, we propose a novel approach based on a gradient-based
attention mechanism in a deep convolutional neural network for solving the person
re-identification problem. Our model learns to focus selectively on the parts of
the input image to which the network's output is most sensitive and
processes them in high resolution while perceiving the surrounding image in
low resolution. Extensive comparative evaluations demonstrate that the proposed
method outperforms state-of-the-art approaches on the challenging CUHK01,
CUHK03, and Market-1501 datasets.
Comment: Published at IEEE International Conference on Image Processing 201
Parameterizing Region Covariance: An Efficient Way To Apply Sparse Codes On Second Order Statistics
Sparse representations have been successfully applied to signal processing,
computer vision and machine learning. Currently there is a trend to learn
sparse models directly on structured data, such as region covariance. However,
such methods when combined with region covariance often require complex
computation. We present an approach to transform a structured sparse model
learning problem to a traditional vectorized sparse modeling problem by
constructing a Euclidean space representation for region covariance matrices.
Our new representation has multiple advantages. Experiments on several vision
tasks demonstrate competitive performance with the state-of-the-art methods.
An Enhanced Deep Feature Representation for Person Re-identification
Feature representation and metric learning are two critical components in
person re-identification models. In this paper, we focus on the feature
representation and claim that hand-crafted histogram features can be
complementary to Convolutional Neural Network (CNN) features. We propose a
novel feature extraction model called Feature Fusion Net (FFN) for pedestrian
image representation. In FFN, back propagation makes CNN features constrained
by the handcrafted features. Utilizing color histogram features (RGB, HSV,
YCbCr, Lab and YIQ) and texture features (multi-scale and multi-orientation
Gabor features), we get a new deep feature representation that is more
discriminative and compact. Experiments on three challenging datasets (VIPeR,
CUHK01, PRID450s) validate the effectiveness of our proposal.
Comment: Citation for this paper: Shangxuan Wu, Ying-Cong Chen, Xiang Li,
An-Cong Wu, Jin-Jie You, and Wei-Shi Zheng. An Enhanced Deep Feature
Representation for Person Re-identification. In IEEE WACV, 201
PersonNet: Person Re-identification with Deep Convolutional Neural Networks
In this paper, we propose a deep end-to-end neural network to
simultaneously learn high-level features and a corresponding similarity metric
for person re-identification. The network takes a pair of raw RGB images as
input, and outputs a similarity value indicating whether the two input images
depict the same person. A layer that computes neighborhood range differences
across the two input images is employed to capture local relationships between
patches. This operation seeks robust features from the input images. By
increasing the depth to 10 weight layers and using very small (3×3)
convolution filters, our architecture achieves a remarkable improvement over the
prior-art configurations. Meanwhile, an adaptive Root-Mean-Square (RMSProp)
gradient descent algorithm is integrated into our architecture, which is
beneficial to deep nets. Our method consistently outperforms state-of-the-art
methods on two large datasets (CUHK03 and Market-1501) and a medium-sized dataset
(CUHK01).
Comment: 7 pages. Fixed Figure 4 (a
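The RMSProp update mentioned in this abstract can be sketched as follows. This is the standard textbook form of the rule, not the authors' training code; the function name `rmsprop_step`, the hyperparameter values, and the toy quadratic objective are assumptions chosen for illustration.

```python
import numpy as np

def rmsprop_step(params, grads, cache, lr=1e-3, decay=0.9, eps=1e-8):
    """One RMSProp update.

    cache holds a running average of squared gradients; dividing by its
    square root gives each parameter its own effective step size.
    """
    cache = decay * cache + (1.0 - decay) * grads ** 2
    params = params - lr * grads / (np.sqrt(cache) + eps)
    return params, cache

# Toy usage: minimize f(x) = x^2, whose gradient is 2x.
x = np.array([5.0])
cache = np.zeros_like(x)
for _ in range(2000):
    grad = 2.0 * x
    x, cache = rmsprop_step(x, grad, cache, lr=0.01)
```

Because the gradient is normalized by its recent magnitude, the step size stays roughly on the order of `lr` regardless of gradient scale, which is one reason the rule is often used for deep networks.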
Nonlinear Local Metric Learning for Person Re-identification
Person re-identification aims at matching pedestrians observed from
non-overlapping camera views. Feature descriptor and metric learning are two
significant problems in person re-identification. A discriminative metric
learning method should be capable of exploiting complex nonlinear
transformations due to the large variations in feature space. In this paper, we
propose a nonlinear local metric learning (NLML) method to improve the
state-of-the-art performance of person re-identification on public datasets.
Motivated by the fact that local metric learning has been introduced to handle
data that vary locally and that deep neural networks have shown outstanding
capability in exploiting the nonlinearity of samples, we utilize the merits of
both local metric learning and deep neural networks to learn multiple sets of
nonlinear transformations. By enforcing a margin between the distances of
positive pedestrian image pairs and distances of negative pairs in the
transformed feature subspace, discriminative information can be effectively
exploited in the developed neural networks. Our experiments show that the
proposed NLML method achieves state-of-the-art results on the widely used
VIPeR, GRID, and CUHK01 datasets.
Comment: Submitted to CVPR 201
Constrained Deep Metric Learning for Person Re-identification
Person re-identification aims to re-identify the probe image from a given set
of images under different camera views. It is challenging due to large
variations of pose, illumination, occlusion and camera view. Since
convolutional neural networks (CNNs) have excellent feature extraction
capability, certain deep learning methods have recently been applied to person
re-identification. However, in person re-identification, the deep networks
often suffer from the over-fitting problem. In this paper, we propose a novel
CNN-based method to learn a discriminative metric with good robustness to the
over-fitting problem in person re-identification. Firstly, a novel deep
architecture is built where the Mahalanobis metric is learned with a weight
constraint. This weight constraint is used to regularize the learning, so that
the learned metric has a better generalization ability. Secondly, we find that
the selection of intra-class sample pairs is crucial for learning but has
received little attention. To cope with the large intra-class variations in
pedestrian images, we propose a novel training strategy named moderate positive
mining to prevent the training process from over-fitting to the extreme samples
in intra-class pairs. Experiments show that our approach significantly
outperforms state-of-the-art methods on several benchmarks of person
re-identification.
Comment: 11 pages, 16 figure
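A learned Mahalanobis metric of the kind described in this abstract can be sketched by factorizing the metric matrix as M = WᵀW, so the distance is an ordinary Euclidean distance after a linear projection. The penalty below, which keeps M close to the identity, is only one plausible form of weight constraint and is an assumption here, as are the function names; the abstract does not specify the constraint.

```python
import numpy as np

def mahalanobis_distance(x, y, W):
    """Distance sqrt((x - y)^T M (x - y)) with M = W^T W.

    Factorizing M this way keeps it positive semidefinite by construction.
    """
    diff = W @ (np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    return float(np.sqrt(diff @ diff))

def identity_penalty(W):
    """Assumed regularizer: squared Frobenius distance between M and I.

    A small penalty keeps the learned metric close to plain Euclidean
    distance, which limits how far it can over-fit the training pairs.
    """
    M = W.T @ W
    return float(np.sum((M - np.eye(M.shape[0])) ** 2))

# Toy usage: with W = I the metric reduces to Euclidean distance.
d_euclid = mahalanobis_distance([3.0, 4.0], [0.0, 0.0], np.eye(2))
d_scaled = mahalanobis_distance([3.0, 4.0], [0.0, 0.0], 2.0 * np.eye(2))
```

In training, a term such as `identity_penalty(W)` would be added to the pairwise loss with a weighting coefficient, which is itself a hyperparameter not given in the abstract.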
Metric Learning in Codebook Generation of Bag-of-Words for Person Re-identification
Person re-identification is generally divided into two parts: first, how to
represent a pedestrian by discriminative visual descriptors, and second, how to
compare them by suitable distance metrics. Conventional methods isolate these
two parts, with the first usually unsupervised and the second supervised.
The Bag-of-Words (BoW) model is a widely used image descriptor in
part one. Its codebook is simply generated by clustering visual features in
Euclidean space. In this paper, we propose to use the metric learning
techniques of part two in the codebook generation phase of BoW. In particular, the proposed
codebook is clustered under a Mahalanobis distance that is learned with supervision.
Extensive experiments demonstrate that our proposed method is effective. With several
low-level features extracted on superpixels and fused together, our method
outperforms the state of the art on person re-identification benchmarks including
VIPeR, PRID450S, and Market1501.
Learning Efficient Image Representation for Person Re-Identification
Color-name-based image representation has been used successfully in person
re-identification, owing to its being compact, intuitively
understandable and robust to photometric variance. However, there
is a discrepancy between the underlying distribution of color names' RGB values
and that of image pixels' RGB values, which may lead to inaccuracy when
directly comparing them in Euclidean space. In this paper, we propose a new
method named soft Gaussian mapping (SGM) to address this problem. We model the
discrepancies between color names and pixels using a Gaussian and utilize the
inverse of the covariance matrix to bridge the gap between them. Based on SGM, an
image could be converted to several soft Gaussian maps. In each soft Gaussian
map, we further seek to establish stable and robust descriptors within a local
region through a max pooling operation. Then, a robust image representation
based on color names is obtained by concatenating the statistical descriptors
in each stripe. When labeled data are available, one discriminative subspace
projection matrix is learned to build efficient representations of an image via
cross-view coupling learning. Experiments on the public datasets VIPeR,
PRID450S and CUHK03 demonstrate the effectiveness of our method.
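The pipeline this abstract outlines (per-color-name Gaussian responses, max pooling within local regions, concatenation over stripes) can be sketched roughly as below. The exact SGM formulation is not given in the abstract, so the unnormalized Gaussian response, the function names, the two-color palette, and the covariance values are all assumptions made for illustration.

```python
import numpy as np

def soft_gaussian_maps(image, means, covs):
    """Return one response map per color name, shape (K, H, W).

    Each pixel's response to color name k is exp(-0.5 * d^2), where d is the
    Mahalanobis distance to that color's mean under its inverse covariance.
    """
    h, w, _ = image.shape
    pixels = image.reshape(-1, 3).astype(float)
    maps = []
    for mu, cov in zip(means, covs):
        diff = pixels - np.asarray(mu, dtype=float)
        inv_cov = np.linalg.inv(cov)
        # Squared Mahalanobis distance for every pixel at once.
        d2 = np.einsum("ni,ij,nj->n", diff, inv_cov, diff)
        maps.append(np.exp(-0.5 * d2).reshape(h, w))
    return np.stack(maps)

def stripe_max_pool(maps, n_stripes):
    """Max-pool each map inside horizontal stripes and concatenate."""
    stripes = np.array_split(maps, n_stripes, axis=1)
    return np.concatenate([s.max(axis=(1, 2)) for s in stripes])

# Toy usage: a 4x2 image whose top half is red and bottom half is blue.
image = np.zeros((4, 2, 3))
image[:2, :, 0] = 255.0
image[2:, :, 2] = 255.0
means = [[255.0, 0.0, 0.0], [0.0, 0.0, 255.0]]   # hypothetical "red", "blue"
covs = [np.eye(3) * 100.0 ** 2] * 2               # assumed isotropic spread
descriptor = stripe_max_pool(soft_gaussian_maps(image, means, covs), n_stripes=2)
```

The resulting descriptor has one entry per color name per stripe; the supervised cross-view projection mentioned in the abstract would be applied on top of such a vector and is not sketched here.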