Re-Identification with Consistent Attentive Siamese Networks
We propose a new deep architecture for person re-identification (re-id).
While re-id has seen much recent progress, spatial localization and
view-invariant representation learning for robust cross-view matching remain
key, unsolved problems. We address these questions by means of a new
attention-driven Siamese learning architecture, called the Consistent Attentive
Siamese Network. Our key innovations compared to existing, competing methods
include (a) a flexible framework design that produces attention with only
identity labels as supervision, (b) explicit mechanisms to enforce attention
consistency among images of the same person, and (c) a new Siamese framework
that integrates attention and attention consistency, producing principled
supervisory signals as well as the first mechanism that can explain the
reasoning behind the Siamese framework's predictions. We conduct extensive
evaluations on the CUHK03-NP, DukeMTMC-ReID, and Market-1501 datasets and
report competitive performance.
Comment: 10 pages, 8 figures, 3 tables, to appear in CVPR 201
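Innovation (b), attention consistency across images of the same person, can be illustrated with a simple penalty between spatial attention maps. This is a minimal sketch under assumed inputs, not the paper's actual objective; the function name and normalization are hypothetical:

```python
import numpy as np

def attention_consistency_loss(att_a, att_b):
    """Hypothetical sketch: penalize disagreement between the spatial
    attention maps of two images of the same identity."""
    # Normalize each map into a spatial distribution so the penalty
    # compares where attention goes rather than its overall magnitude.
    pa = att_a / att_a.sum()
    pb = att_b / att_b.sum()
    # Mean squared difference between the two attention distributions.
    return float(np.mean((pa - pb) ** 2))
```

Identical maps incur zero loss while diverging attention is penalized, which is the consistency signal the abstract describes.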
Adaptive Affinity Fields for Semantic Segmentation
Semantic segmentation has made much progress with increasingly powerful
pixel-wise classifiers and by incorporating structural priors via Conditional
Random Fields (CRFs) or Generative Adversarial Networks (GANs). We propose a
simpler alternative that learns to verify the spatial structure of segmentation
during training only. Unlike existing approaches that enforce semantic labels
on individual pixels and match labels between neighbouring pixels, we propose
the concept of Adaptive Affinity Fields (AAF) to capture and match the semantic
relations between neighbouring pixels in the label space. We use adversarial
learning to select the optimal affinity field size for each semantic category.
This is formulated as a minimax problem that optimizes our segmentation neural
network in a best worst-case learning scenario. AAF is versatile in
representing structures as a collection of pixel-centric relations, is easier to
train than a GAN, and is more efficient than a CRF since it requires no
run-time inference. Our
extensive evaluations on PASCAL VOC 2012, Cityscapes, and GTA5 datasets
demonstrate its above-par segmentation performance and robust generalization
across domains.
Comment: To appear in European Conference on Computer Vision (ECCV) 201
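The label-space affinity idea can be sketched with a toy loss: compare the predicted probability that two neighbouring pixels share a label against the ground-truth indicator. This is a simplified illustration with a single fixed neighbour offset and a squared-error penalty, not the paper's adversarially selected multi-size fields:

```python
import numpy as np

def affinity_field_loss(prob, label):
    """Toy affinity loss (hypothetical simplification). prob: (H, W, C)
    per-pixel class scores summing to 1; label: (H, W) integer labels."""
    # Predicted probability that each pixel and its right neighbour
    # carry the same label: dot product of their class distributions.
    pred_aff = np.sum(prob[:, :-1, :] * prob[:, 1:, :], axis=-1)
    # Ground-truth affinity: 1 where horizontal neighbours share a label.
    gt_aff = (label[:, :-1] == label[:, 1:]).astype(float)
    return float(np.mean((pred_aff - gt_aff) ** 2))
```

A perfect one-hot prediction yields zero loss, since its pairwise agreements match the ground-truth affinities exactly.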
Video Salient Object Detection Using Spatiotemporal Deep Features
This paper presents a method for detecting salient objects in videos where
temporal information in addition to spatial information is fully taken into
account. Following recent reports on the advantage of deep features over
conventional hand-crafted features, we propose a new set of SpatioTemporal Deep
(STD) features that utilize local and global contexts over frames. We also
propose a new SpatioTemporal Conditional Random Field (STCRF) to compute saliency
from STD features. STCRF is our extension of CRF to the temporal domain and
describes the relationships among neighboring regions both in a frame and over
frames. STCRF leads to temporally consistent saliency maps over frames,
contributing to the accurate detection of salient objects' boundaries and noise
reduction during detection. Our proposed method first segments an input video
into multiple scales and then computes a saliency map at each scale level using
STD features with STCRF. The final saliency map is computed by fusing saliency
maps at different scale levels. Our experiments, using publicly available
benchmark datasets, confirm that the proposed method significantly outperforms
state-of-the-art methods. We also applied our saliency computation to the video
object segmentation task, showing that our method outperforms existing video
object segmentation methods.
Comment: accepted at TI
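The temporal-consistency effect attributed to STCRF can be approximated, very loosely, by recursive blending of per-frame saliency maps. This is only a stand-in to show the idea of propagating saliency across frames; the alpha weight and function name are made up, and the actual STCRF performs joint CRF inference:

```python
import numpy as np

def smooth_saliency(frames, alpha=0.5):
    """Blend each frame's saliency map with the previous smoothed map
    (exponential smoothing; a crude proxy for temporal consistency)."""
    out = [frames[0]]
    for f in frames[1:]:
        out.append(alpha * out[-1] + (1 - alpha) * f)
    return out
```

A static sequence passes through unchanged, while an abrupt per-frame change is damped, giving temporally smoother maps.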
Ordered or Orderless: A Revisit for Video based Person Re-Identification
Is a recurrent network really necessary for learning a good visual
representation for video-based person re-identification (VPRe-id)? In this
paper, we first show that the common practice of employing recurrent neural
networks (RNNs) to aggregate temporal and spatial features may not be optimal.
Specifically, with a diagnostic analysis, we show that the recurrent structure
may be less effective at learning temporal dependencies than expected and
implicitly yields an orderless representation. Based on this observation, we
then present a simple yet surprisingly powerful approach for VPRe-id, where we
treat VPRe-id as an efficient orderless ensemble of image-based person
re-identification problems. More specifically, we divide videos into individual
images and re-identify each person with an ensemble of image-based rankers.
Under the i.i.d. assumption, we provide an error bound that sheds light on how we could
improve VPRe-id. Our work also presents a promising way to bridge the gap
between video and image based person re-identification. Comprehensive
experimental evaluations demonstrate that the proposed solution achieves
state-of-the-art performances on multiple widely used datasets (iLIDS-VID, PRID
2011, and MARS).
Comment: Under Minor Revision in IEEE TPAM
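The orderless ensemble can be sketched directly: score each query frame independently with an image-based ranker, then average the per-frame distances. The setup below (distance-matrix shapes, averaging as the combiner) is an assumed minimal instantiation, not the paper's exact pipeline:

```python
import numpy as np

def orderless_rank(frame_dists):
    """frame_dists: (T, G) array of distances from each of T query
    frames to G gallery identities, produced by any image-based ranker.
    The ensemble ignores frame order: average over frames, then rank."""
    agg = frame_dists.mean(axis=0)
    return np.argsort(agg)  # gallery indices, best match first
```

Because the mean is permutation-invariant over frames, shuffling the video's frames leaves the ranking unchanged, matching the "orderless" observation.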
Frame-wise Motion and Appearance for Real-time Multiple Object Tracking
The main challenge of Multiple Object Tracking (MOT) is the efficiency of
associating an indefinite number of objects between video frames. Standard motion
estimators used in tracking, e.g., Long Short-Term Memory (LSTM), deal only
with a single object, while Re-IDentification (Re-ID) based approaches
exhaustively compare object appearances. Both approaches are computationally
costly when scaled to a large number of objects, making real-time MOT very
difficult. To address these problems, we propose a highly
efficient Deep Neural Network (DNN) that simultaneously models association
among an indefinite number of objects. The inference computation of the DNN does
not increase with the number of objects. Our approach, Frame-wise Motion and
Appearance (FMA), computes the Frame-wise Motion Fields (FMF) between two
frames, which leads to very fast and reliable matching among a large number of
object bounding boxes. Frame-wise Appearance Features (FAF) are learned in
parallel with FMFs and used as auxiliary information to fix uncertain matches.
Extensive experiments on the MOT17 benchmark show that our method
achieves real-time MOT with results competitive with state-of-the-art
approaches.
Comment: 13 pages, 4 figures, 4 table
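A heavily simplified view of motion-field matching: propagate each box centre from frame t by a predicted motion vector, then assign it to the nearest centre in frame t+1. All names and the nearest-neighbour assignment are illustrative assumptions; the actual FMF is a dense learned field:

```python
import numpy as np

def match_boxes(centers_t, motion, centers_t1):
    """centers_t: (N, 2) box centres in frame t; motion: (N, 2)
    predicted per-box displacement; centers_t1: (M, 2) centres in
    frame t+1. Returns the best next-frame match index per box."""
    pred = centers_t + motion  # propagated centres in frame t+1
    # Pairwise distances between propagated and observed centres.
    d = np.linalg.norm(pred[:, None, :] - centers_t1[None, :, :], axis=-1)
    return np.argmin(d, axis=1)
```

The cost of this step is one (N, M) distance matrix rather than exhaustive appearance comparison, which hints at why frame-wise matching can be fast.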
cvpaper.challenge in 2015 - A review of CVPR2015 and DeepSurvey
The "cvpaper.challenge" is a group composed of members from AIST, Tokyo Denki
Univ. (TDU), and Univ. of Tsukuba that aims to systematically summarize papers
on computer vision, pattern recognition, and related fields. For this
particular review, we focused on reading all 602 conference papers
presented at CVPR2015, the premier annual computer vision event held in
June 2015, in order to grasp the trends in the field. Further, we propose
"DeepSurvey" as a mechanism embodying the entire process, from reading
all the papers, through the generation of ideas, to the writing of papers.
Comment: Survey Pape
Learning Context Graph for Person Search
Person re-identification has achieved great progress with deep convolutional
neural networks. However, most previous methods focus on learning individual
appearance feature embeddings, and it is hard for such models to handle difficult
situations with varying illumination, large pose variation, and occlusion. In
this work, we take a step further and consider employing context information
for person search. For a probe-gallery pair, we first propose a contextual
instance expansion module, which employs a relative attention module to search
and filter useful context information in the scene. We also build a graph
learning framework to effectively employ context pairs to update target
similarity. These two modules are built on top of a joint detection and
instance feature learning framework, which improves the discriminativeness of
the learned features. The proposed framework achieves state-of-the-art
performance on two widely used person search datasets.
Comment: To appear in CVPR 201
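The similarity-update idea can be caricatured as mixing the direct probe-gallery similarity with the similarities of co-occurring context pairs. This is a hand-written simplification with a fixed mixing weight; the paper instead learns the update within a graph learning framework:

```python
import numpy as np

def context_refined_similarity(pair_sim, context_sims, weight=0.3):
    """Blend the direct probe-gallery similarity with the mean
    similarity of matched context pairs; `weight` is a made-up constant."""
    if len(context_sims) == 0:
        return pair_sim  # no context available: keep the direct score
    return (1 - weight) * pair_sim + weight * float(np.mean(context_sims))
```

When co-travellers of the probe also match well in the gallery scene, the target similarity is pulled up, which is the intuition behind using scene context.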
Probabilistic Semantic Retrieval for Surveillance Videos with Activity Graphs
We present a novel framework for finding complex activities matching
user-described queries in cluttered surveillance videos. The wide diversity of
queries coupled with unavailability of annotated activity data limits our
ability to train activity models. To bridge the semantic gap we propose to let
users describe an activity as a semantic graph with object attributes and
inter-object relationships associated with nodes and edges, respectively. We
learn node/edge-level visual predictors during training and, at test-time,
propose to retrieve activity by identifying likely locations that match the
semantic graph. We formulate a novel CRF based probabilistic activity
localization objective that accounts for mis-detections, mis-classifications
and track-losses, and outputs a likelihood score for a candidate grounded
location of the query in the video. We seek groundings that maximize overall
precision and recall. To handle the combinatorial search over all
high-probability groundings, we propose a highest precision subgraph matching
algorithm. Our method outperforms existing retrieval methods on benchmarked
datasets.
Comment: 1520-9210 (c) 2018 IEEE. This paper has been accepted by IEEE
Transactions on Multimedia. Print ISSN: 1520-9210. Online ISSN: 1941-0077.
Preprint link is https://ieeexplore.ieee.org/document/8438958
An Introduction to Person Re-identification with Generative Adversarial Networks
Person re-identification is a basic subject in the field of computer vision.
Traditional methods have several limitations in solving problems such as
occlusion, illumination, pose variation, and feature variation under
complex backgrounds. Fortunately, the deep learning paradigm has opened new
ways for person re-identification research and has become a hot spot in this
field. Generative Adversarial Nets (GANs) have attracted much attention in
the past few years for solving these problems. This paper reviews GAN-based
methods for person re-identification, focuses on the related papers about
different GAN-based frameworks, and discusses their advantages and
disadvantages. Finally, it proposes directions for future research, especially
the prospects of person re-identification methods based on GANs.
Pyramidal Person Re-IDentification via Multi-Loss Dynamic Training
Most existing Re-IDentification (Re-ID) methods are highly dependent on
precise bounding boxes that enable images to be aligned with each other.
However, due to the challenging practical scenarios, current detection models
often produce inaccurate bounding boxes, which inevitably degenerate the
performance of existing Re-ID algorithms. In this paper, we propose a novel
coarse-to-fine pyramid model to relax the need for bounding boxes, which not
only incorporates local and global information, but also integrates the gradual
cues between them. The pyramid model is able to match at different scales and
then search for the correct image of the same identity, even when the image
pairs are not aligned. In addition, in order to learn discriminative identity
representation, we explore a dynamic training scheme to seamlessly unify two
losses and extract appropriate shared information between them. Experimental
results clearly demonstrate that the proposed method achieves
state-of-the-art results on three datasets. In particular, our approach exceeds
the current best method by 9.5% on the most challenging CUHK03 dataset.
Comment: Accepted by 2019 Conference on Computer Vision and Pattern Recognition
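The coarse-to-fine pyramid can be sketched as multi-scale horizontal part pooling: split the feature map into basic stripes, then form a branch for every run of consecutive stripes, from single stripes up to the whole map. Shapes and the stripe count below are illustrative assumptions:

```python
import numpy as np

def pyramid_descriptors(feat, n_parts=4):
    """feat: (H, W, C) feature map with H divisible by n_parts.
    Returns (n_parts*(n_parts+1)//2, C) descriptors, one per branch."""
    H, W, C = feat.shape
    # Basic stripes: average-pool each horizontal band to one C-vector.
    stripes = feat.reshape(n_parts, H // n_parts, W, C).mean(axis=(1, 2))
    branches = []
    for k in range(1, n_parts + 1):        # branch spans k stripes
        for s in range(n_parts - k + 1):   # sliding start position
            branches.append(stripes[s:s + k].mean(axis=0))
    return np.stack(branches)
```

The overlapping multi-scale branches give the gradual cues between local and global information that the abstract mentions, without requiring tightly aligned boxes.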