Adaptive Re-ranking of Deep Feature for Person Re-identification
Typical person re-identification (re-ID) methods train a deep CNN to extract
deep features and combine them with a distance metric for the final evaluation.
In this work, we focus on exploiting the full information encoded in the deep
feature to boost the re-ID performance. First, we propose a Deep Feature Fusion
(DFF) method to exploit the diverse information embedded in a deep feature. DFF
treats each sub-feature as an information carrier and employs a diffusion
process to exchange their information. Second, we propose an Adaptive
Re-Ranking (ARR) method to exploit the contextual information encoded in the
features of neighbors. ARR utilizes the contextual information to re-rank the
retrieval results in an iterative manner. In particular, it automatically adds
more contextual information after each iteration to consider more matches. Third,
we propose a strategy that combines DFF and ARR to enhance the performance.
Extensive comparative evaluations demonstrate the superiority of the proposed
methods on three large benchmarks.
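As a rough illustration of the diffusion idea, the following minimal Python sketch splits a deep feature into sub-features and lets them exchange information over an affinity graph. The split size, affinity definition, and restart-style update rule are illustrative assumptions, not the authors' exact DFF formulation.

```python
# Minimal sketch of diffusion-based information exchange among sub-features,
# loosely following the DFF idea described above. All design choices here
# (n_parts, the affinity, the restart update) are illustrative assumptions.
import numpy as np

def diffuse_sub_features(feat, n_parts=4, alpha=0.9, iters=20):
    """feat: (d,) deep feature, split into n_parts sub-feature carriers
    that exchange information through a diffusion process."""
    parts = np.stack(np.array_split(feat, n_parts))      # (p, d/p) carriers
    sim = parts @ parts.T                                # pairwise affinities
    W = np.exp(sim - sim.max())                          # positive weights
    W /= W.sum(axis=1, keepdims=True)                    # row-stochastic transitions
    fused = parts.copy()
    for _ in range(iters):                               # iterative exchange
        fused = alpha * W @ fused + (1 - alpha) * parts  # diffusion with restart
    return fused.ravel()

query = np.random.randn(256)
print(diffuse_sub_features(query).shape)                 # (256,)
```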
PRISM: Person Re-Identification via Structured Matching
Person re-identification (re-id), an emerging problem in visual surveillance,
deals with maintaining entities of individuals whilst they traverse various
locations surveilled by a camera network. From a visual perspective, re-id is
challenging due to significant changes in the visual appearance of individuals
across cameras with different poses, illumination, and calibration. Globally, the
challenge arises from the need to maintain structurally consistent matches
among all the individual entities across different camera views. We propose
PRISM, a structured matching method to jointly account for these challenges. We
view the global problem as a weighted graph matching problem and estimate edge
weights by learning to predict them based on the co-occurrences of visual
patterns in the training examples. These co-occurrence based scores in turn
account for appearance changes by inferring likely and unlikely visual
co-occurrences appearing in training instances. We implement PRISM on single
shot and multi-shot scenarios. PRISM uniformly outperforms the state of the art
in terms of matching rate while remaining computationally efficient.
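To make the structured-matching view concrete, here is a toy sketch in which a globally consistent one-to-one assignment is recovered from pairwise match scores. A simple bipartite assignment stands in for PRISM's full weighted graph matching, and the random score matrix replaces the learned co-occurrence-based edge weights.

```python
# Toy illustration of structured matching across two camera views:
# given a score matrix between entities, recover a globally consistent
# one-to-one assignment. The weights here are random placeholders,
# not PRISM's learned co-occurrence scores.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
W = rng.random((5, 5))                  # W[i, j]: score that person i (view A) is person j (view B)
rows, cols = linear_sum_assignment(-W)  # maximize total score => consistent matches
print(list(zip(rows, cols)))
```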
Enhancing Person Re-identification in a Self-trained Subspace
Despite the promising progress made in recent years, person re-identification
(re-ID) remains a challenging task due to the complex variations in human
appearances from different camera views. For this challenging problem, a large
variety of algorithms have been developed in the fully-supervised setting,
requiring access to a large amount of labeled training data. However, the main
bottleneck for fully-supervised re-ID is the limited availability of labeled
training samples. To address this problem, in this paper, we propose a
self-trained subspace learning paradigm for person re-ID which effectively
utilizes both labeled and unlabeled data to learn a discriminative subspace
where person images across disjoint camera views can be easily matched. The
proposed approach first constructs pseudo pairwise relationships among
unlabeled persons using the k-nearest neighbors algorithm. Then, with the
pseudo pairwise relationships, the unlabeled samples can be easily combined
with the labeled samples to learn a discriminative projection by solving an
eigenvalue problem. In addition, we refine the pseudo pairwise relationships
iteratively, which further improves the learning performance. A multi-kernel
embedding strategy is also incorporated into the proposed approach to cope with
the non-linearity in a person's appearance and explore the complementarity of
multiple kernels. In this way, the performance of person re-ID can be greatly
enhanced when training data are insufficient. Experimental results on six
widely-used datasets demonstrate the effectiveness of our approach and its
performance can be comparable to the reported results of most state-of-the-art
fully-supervised methods while using much less labeled data.
Comment: Accepted by ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM).
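A minimal sketch of the self-training step described above: pseudo-positive pairs are formed among unlabeled samples via k-nearest neighbors and combined with labeled pairs to obtain a projection from a generalized eigenvalue problem. The scatter definitions and the single non-iterated, single-kernel pass are simplifying assumptions, not the paper's exact objective.

```python
# Sketch: learn a discriminative subspace from labeled positive pairs plus
# k-NN pseudo pairs among unlabeled data, via a generalized eigenproblem.
import numpy as np
from scipy.linalg import eigh

def pair_scatter(X, pairs):
    D = np.stack([X[i] - X[j] for i, j in pairs])   # pairwise differences
    return D.T @ D

def learn_subspace(X_lab, lab_pairs, X_unlab, k=3, dim=8, ridge=1e-3):
    # pseudo pairs: each unlabeled sample with its k nearest unlabeled neighbors
    d2 = ((X_unlab[:, None] - X_unlab[None]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    pseudo = [(i, j) for i in range(len(X_unlab)) for j in np.argsort(d2[i])[:k]]
    Sw = pair_scatter(X_lab, lab_pairs) + pair_scatter(X_unlab, pseudo)
    St = np.cov(np.vstack([X_lab, X_unlab]).T)
    # maximize total scatter relative to (pseudo-)positive-pair scatter
    _, V = eigh(St, Sw + ridge * np.eye(Sw.shape[0]))
    return V[:, -dim:]                               # top generalized eigenvectors

rng = np.random.default_rng(1)
P = learn_subspace(rng.normal(size=(20, 16)), [(0, 1), (2, 3)],
                   rng.normal(size=(50, 16)))
print(P.shape)                                       # (16, 8) projection
```

In the full approach the pseudo relationships would be refined iteratively and a multi-kernel embedding applied before this step; the sketch shows only one linear pass.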
Multi-feature Fusion for Image Retrieval Using Constrained Dominant Sets
Aggregating different image features for image retrieval has recently shown
its effectiveness. Nevertheless, the question of how to amplify the impact of
the features best suited to a specific query image remains an open computer
vision problem. In this paper, we propose a computationally efficient approach
to fuse several hand-crafted and deep features, based on the probability
distribution of the membership scores of a constrained cluster, in an
unsupervised manner. First, we introduce an incremental nearest neighbor
(NN) selection method, whereby we dynamically select the k nearest neighbors of the query. We then
build several graphs from the obtained NN sets and employ constrained dominant
sets (CDS) on each graph G to assign edge weights which consider the intrinsic
manifold structure of the graph, and detect false matches to the query.
Finally, we compute a positive-impact weight (PIW) for each feature based on
how dispersed its membership scores are. To this end, we exploit the entropy of
the cluster membership-score distribution. In addition, the final NN set is
passed through a heuristic voting scheme. Experiments on several
retrieval benchmark datasets show that our method can improve the
state-of-the-art results.
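The entropy-based weighting can be illustrated compactly: for each feature, the membership scores of the query's constrained cluster are normalized into a distribution, and features with concentrated (low-entropy) scores receive larger fusion weights. The exact normalization below is an assumption for illustration.

```python
# Sketch of the entropy-based positive-impact weight (PIW): a feature
# whose cluster membership scores are concentrated (low entropy) is
# trusted more in the fusion; dispersed scores are down-weighted.
import numpy as np

def positive_impact_weights(membership_scores):
    """membership_scores: list of 1-D arrays, one per feature,
    holding the cluster membership scores of the query's NN set."""
    weights = []
    for s in membership_scores:
        p = s / s.sum()                          # membership distribution
        H = -(p * np.log(p + 1e-12)).sum()       # Shannon entropy
        weights.append(1.0 / (H + 1e-12))        # high dispersion => low impact
    w = np.asarray(weights)
    return w / w.sum()

scores = [np.array([0.9, 0.05, 0.05]),           # peaked: feature trusted
          np.array([0.34, 0.33, 0.33])]          # flat: feature down-weighted
print(positive_impact_weights(scores))
```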
PVSS: A Progressive Vehicle Search System for Video Surveillance Networks
This paper is focused on the task of searching for a specific vehicle that
appeared in the surveillance networks. Existing methods usually assume the
vehicle images are well cropped from the surveillance videos, then use visual
attributes, like colors and types, or license plate numbers to match the target
vehicle in the image set. However, a complete vehicle search system should
consider the problems of vehicle detection, representation, indexing, storage,
matching, and so on. Besides, attribute-based search cannot accurately find the
same vehicle due to intra-instance appearance changes across different cameras
and the highly uncertain environment. Moreover, license plates may be
misrecognized in surveillance scenes due to low resolution and noise. In this
paper, a Progressive Vehicle Search System, named PVSS, is designed to solve
the above problems. PVSS consists of three modules: the crawler,
the indexer, and the searcher. The vehicle crawler aims to detect and track
vehicles in surveillance videos and transfer the captured vehicle images,
metadata and contextual information to the server or cloud. Then multi-grained
attributes, such as the visual features and license plate fingerprints, are
extracted and indexed by the vehicle indexer. Finally, a query triplet with an
input vehicle image, the time range, and the spatial scope is taken as the
input by the vehicle searcher. The target vehicle will be searched in the
database by a progressive process. Extensive experiments on the public dataset
from a real surveillance network validate the effectiveness of PVSS.
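A hypothetical sketch of the progressive process: candidates are first filtered by the query triplet's time range and spatial scope, then ranked by visual similarity, with plate-fingerprint verification left as a final stage. The record layout and similarity function below are invented for illustration and are not PVSS's actual interface.

```python
# Illustrative progressive search over indexed vehicle records:
# coarse spatio-temporal filtering, then appearance ranking.
import numpy as np

def progressive_search(records, query_feat, t_range, region, top_k=5):
    # stage 1: keep only captures inside the query's time range and spatial scope
    cand = [r for r in records
            if t_range[0] <= r["time"] <= t_range[1] and r["cam"] in region]
    # stage 2: rank candidates by visual-feature similarity
    cand.sort(key=lambda r: -float(query_feat @ r["feat"]))
    # stage 3: a plate-fingerprint check could further verify the top matches
    return cand[:top_k]

rng = np.random.default_rng(2)
records = [{"time": t, "cam": f"cam{t % 3}", "feat": rng.normal(size=64)}
           for t in range(100)]
hits = progressive_search(records, rng.normal(size=64), (10, 50), {"cam0", "cam1"})
print([h["time"] for h in hits])
```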
cvpaper.challenge in 2015 - A review of CVPR2015 and DeepSurvey
The "cvpaper.challenge" is a group composed of members from AIST, Tokyo Denki
Univ. (TDU), and Univ. of Tsukuba that aims to systematically summarize papers
on computer vision, pattern recognition, and related fields. For this
particular review, we focused on reading all 602 conference papers presented
at CVPR 2015, the premier annual computer vision event held in
June 2015, in order to grasp the trends in the field. Further, we are proposing
"DeepSurvey" as a mechanism embodying the entire process from the reading
through all the papers, the generation of ideas, and to the writing of paper.Comment: Survey Pape
Deep Co-attention based Comparators For Relative Representation Learning in Person Re-identification
Person re-identification (re-ID) requires rapid, flexible, yet discriminative
representations to quickly generalize to unseen observations on-the-fly and
recognize the same identity across disjoint camera views. Recent effective
methods are developed within a pair-wise similarity learning system that
detects a fixed set of features from distinct regions, which are mapped to
vector embeddings for distance measurement. However, the most relevant and
crucial parts of each image are detected independently, without reference to
their dependency on one another. Also, these region-based methods rely on
spatial manipulation to align local features for a comparable similarity
measurement. To combat these limitations, in this paper we introduce
the Deep Co-attention based Comparators (DCCs) that fuse the co-dependent
representations of the paired images so as to focus on the relevant parts of
both images and produce their \textit{relative representations}. Given a pair
of pedestrian images to be compared, the proposed model mimics the foveation of
human eyes to detect distinct regions concurrently in both images, namely
co-dependent features, and alternately attend to relevant regions to fuse
them into the similarity learning. Our comparator is capable of producing
dynamic representations relative to a particular sample every time, and thus
well-suited to the case of re-identifying pedestrians on-the-fly. We perform
extensive experiments to provide the insights and demonstrate the effectiveness
of the proposed DCCs in person re-ID. Moreover, our approach has achieved the
state-of-the-art performance on three benchmark data sets: DukeMTMC-reID
\cite{DukeMTMC}, CUHK03 \cite{FPNN}, and Market-1501 \cite{Market1501}.
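The co-attention step can be sketched as two cross-attention maps computed between the regional features of the paired images, whose attended responses are fused into relative representations. The shapes and the concatenation-based fusion below are illustrative assumptions rather than the DCCs' exact architecture.

```python
# Bare-bones co-attention: each image's regions attend over the other
# image's regions; attended responses are fused into relative features.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attend(Fa, Fb):
    """Fa, Fb: (n_regions, d) regional features of the two images."""
    A = softmax(Fa @ Fb.T, axis=1)            # regions of A attending over B
    B = softmax(Fb @ Fa.T, axis=1)            # and vice versa
    Fa_rel = np.concatenate([Fa, A @ Fb], 1)  # A fused with attended B
    Fb_rel = np.concatenate([Fb, B @ Fa], 1)
    return Fa_rel, Fb_rel                     # relative representations of the pair

ra, rb = co_attend(np.random.randn(6, 32), np.random.randn(6, 32))
print(ra.shape, rb.shape)                     # (6, 64) (6, 64)
```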
Person Re-Identification by Camera Correlation Aware Feature Augmentation
The challenge of person re-identification (re-id) is to match individual
images of the same person captured by different non-overlapping camera views
against significant and unknown cross-view feature distortion. While a large
number of distance metric/subspace learning models have been developed for
re-id, the cross-view transformations they learned are view-generic and thus
potentially less effective in quantifying the feature distortion inherent to
each camera view. Learning view-specific feature transformations for re-id
(i.e., view-specific re-id), an under-studied approach, offers an alternative
solution to this problem. In this work, we formulate a novel view-specific
person re-identification framework from the feature augmentation point of view,
called Camera coRrelation Aware Feature augmenTation (CRAFT). Specifically,
CRAFT performs cross-view adaptation by automatically measuring camera
correlation from cross-view visual data distribution and adaptively conducting
feature augmentation to transform the original features into a new adaptive
space. Through our augmentation framework, view-generic learning algorithms can
be readily generalized to learn and optimize view-specific sub-models whilst
simultaneously modelling view-generic discrimination information. Therefore,
our framework not only inherits the strength of view-generic model learning but
also provides an effective way to take into account view specific
characteristics. Our CRAFT framework can be extended to jointly learn
view-specific feature transformations for person re-id across a large network
with more than two cameras, a largely under-investigated but realistic re-id
setting. Additionally, we present a domain-generic deep person appearance
representation, designed particularly to be view-invariant so as to facilitate
cross-view adaptation by CRAFT.
Comment: To appear in IEEE Transactions on Pattern Analysis and Machine Intelligence, 201
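In the spirit of the augmentation described above, the sketch below shows classic view-specific feature augmentation: a shared copy of the feature plus a view-specific copy, with a scalar rho crudely standing in for the adaptively measured camera correlation (an assumption, not the paper's exact mapping).

```python
# Sketch of view-specific feature augmentation: a view-generic component
# plus a view-specific component, weighted by a stand-in correlation rho.
import numpy as np

def augment(x, view, n_views=2, rho=0.5):
    """x: (d,) original feature; returns ((1 + n_views) * d,) augmented feature."""
    d = x.shape[0]
    z = np.zeros((1 + n_views) * d)
    z[:d] = x                           # view-generic (shared) component
    start = (1 + view) * d
    z[start:start + d] = rho * x        # view-specific, correlation-weighted copy
    return z

x = np.random.randn(128)
print(augment(x, view=0).shape, augment(x, view=1).shape)   # (384,) (384,)
```

Any view-generic learner applied to the augmented features then implicitly fits view-specific sub-models alongside the shared one, which is the appeal of this family of mappings.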
The Devil is in the Middle: Exploiting Mid-level Representations for Cross-Domain Instance Matching
Many vision problems require matching images of object instances across
different domains. These include fine-grained sketch-based image retrieval
(FG-SBIR) and Person Re-identification (person ReID). Existing approaches
attempt to learn a joint embedding space where images from different domains
can be directly compared. In most cases, this space is defined by the output of
the final layer of a deep neural network (DNN), which primarily contains
features of a high semantic level. In this paper, we argue that both high and
mid-level features are relevant for cross-domain instance matching (CDIM).
Importantly, mid-level features already exist in earlier layers of the DNN.
They just need to be extracted, represented, and fused properly with the final
layer. Based on this simple but powerful idea, we propose a unified framework
for CDIM. Instantiating our framework for FG-SBIR and ReID, we show that our
simple models can easily beat the state-of-the-art models, which are often
equipped with much more elaborate architectures.
Comment: Reference update
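The core idea admits a very small sketch: tap an earlier layer of the network for mid-level features, normalize them, and fuse them with the final-layer features into a single embedding. The toy MLP below stands in for a real DNN backbone; the choice of layer and the L2-normalized concatenation are assumptions.

```python
# Sketch: extract mid-level and final-layer activations from a toy
# network and fuse them into one embedding for cross-domain matching.
import numpy as np

def forward_with_taps(x, weights):
    """Runs a toy MLP and returns the activations of every layer."""
    acts = []
    for W in weights:
        x = np.maximum(x @ W, 0)            # ReLU layer
        acts.append(x)
    return acts

rng = np.random.default_rng(3)
weights = [rng.normal(size=(64, 64)) * 0.1 for _ in range(4)]
acts = forward_with_taps(rng.normal(size=64), weights)

mid, high = acts[1], acts[-1]               # mid-level and final-layer features
embed = np.concatenate([mid / (np.linalg.norm(mid) + 1e-12),
                        high / (np.linalg.norm(high) + 1e-12)])
print(embed.shape)                          # (128,): fused embedding
```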
Exploring Spatial Significance via Hybrid Pyramidal Graph Network for Vehicle Re-identification
Existing vehicle re-identification methods commonly use spatial pooling
operations to aggregate feature maps extracted via off-the-shelf backbone
networks. They ignore exploring the spatial significance of feature maps,
eventually degrading the vehicle re-identification performance. In this paper,
firstly, an innovative spatial graph network (SGN) is proposed to thoroughly
explore the spatial significance of feature maps. The SGN stacks multiple
spatial graphs (SGs). Each SG treats the feature map's elements as nodes and
utilizes spatial neighborhood relationships to determine the edges among nodes.
During the SGN's propagation, each node and its spatial neighbors on an SG are
aggregated to the next SG. On the next SG, each aggregated node is re-weighted
with a learnable parameter to find the significance at the corresponding
location. Secondly, a novel pyramidal graph network (PGN) is designed to
comprehensively explore the spatial significance of feature maps at multiple
scales. The PGN organizes multiple SGNs in a pyramidal manner and makes each
SGN handle feature maps of a specific scale. Finally, a hybrid pyramidal graph
network (HPGN) is developed by embedding the PGN behind a ResNet-50 based
backbone network. Extensive experiments on three large-scale vehicle databases
(i.e., VeRi776, VehicleID, and VeRi-Wild) demonstrate that the proposed HPGN is
superior to state-of-the-art vehicle re-identification approaches.
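One SG propagation step can be sketched as follows: feature-map elements act as nodes, edges come from the 4-connected spatial neighborhood, and the aggregated nodes are re-weighted by a per-location learnable parameter. The aggregation rule and the all-ones initialization are illustrative assumptions.

```python
# Compact sketch of one spatial graph (SG) step: aggregate each node
# with its 4-connected spatial neighbors, then re-weight per location.
import numpy as np

def sg_step(fmap, weight):
    """fmap: (H, W, C) feature map; weight: (H, W, 1) learnable re-weighting."""
    agg = fmap.copy()
    agg[1:] += fmap[:-1]                  # neighbor above
    agg[:-1] += fmap[1:]                  # neighbor below
    agg[:, 1:] += fmap[:, :-1]            # neighbor to the left
    agg[:, :-1] += fmap[:, 1:]            # neighbor to the right
    return weight * agg                   # learned spatial significance per location

fmap = np.random.randn(8, 8, 16)
weight = np.ones((8, 8, 1))               # would be learned during training
out = sg_step(fmap, weight)
print(out.shape)                          # (8, 8, 16): input to the next SG
```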