179 research outputs found
PaMM: Pose-aware Multi-shot Matching for Improving Person Re-identification
Person re-identification is the problem of recognizing people across
different images or videos with non-overlapping views. Although there has been
much progress in person re-identification over the last decade, it remains a
challenging task because the appearance of a person can differ dramatically
across camera viewpoints and poses. In this paper, we propose a
novel framework for person re-identification that analyzes camera viewpoints and
person poses, called Pose-aware Multi-shot Matching (PaMM), which
robustly estimates people's poses and efficiently conducts multi-shot matching
based on pose information. Experimental results on public person
re-identification datasets show that the proposed methods outperform
state-of-the-art methods and are promising for person re-identification under
viewpoint and pose variations.Comment: 12 pages, 12 figures, 4 tables
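The abstract does not spell out the matching procedure, but the core idea of pose-aware multi-shot matching can be sketched roughly as follows: group the shots of each track by a coarse pose bin and compare tracks bin by bin. The pose labels ("front", "back", "side"), the per-bin averaging, and the fallback rule below are illustrative assumptions, not the authors' exact formulation.

    # Illustrative sketch of pose-aware multi-shot matching (not the authors' code).
    # Each shot is assumed to come with a feature vector and a coarse pose label.
    import numpy as np
    from collections import defaultdict

    def group_by_pose(features, poses):
        """Average the shot features of a track within each pose bin."""
        bins = defaultdict(list)
        for f, p in zip(features, poses):
            bins[p].append(f)
        return {p: np.mean(np.stack(fs), axis=0) for p, fs in bins.items()}

    def pose_aware_distance(track_a, track_b):
        """Compare two tracks pose-bin by pose-bin; fall back to the minimum
        cross-bin distance when the tracks share no pose bin."""
        bins_a = group_by_pose(*track_a)
        bins_b = group_by_pose(*track_b)
        shared = set(bins_a) & set(bins_b)
        if shared:
            return np.mean([np.linalg.norm(bins_a[p] - bins_b[p]) for p in shared])
        return min(np.linalg.norm(fa - fb)
                   for fa in bins_a.values() for fb in bins_b.values())

    # Usage: a track is (list_of_feature_vectors, list_of_pose_labels)
    track_a = ([np.random.rand(128) for _ in range(5)],
               ["front", "front", "side", "side", "back"])
    track_b = ([np.random.rand(128) for _ in range(3)], ["side", "back", "back"])
    print(pose_aware_distance(track_a, track_b))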
Component-based Attention for Large-scale Trademark Retrieval
The demand for large-scale trademark retrieval (TR) systems has significantly
increased to combat the rise in international trademark infringement.
Unfortunately, the ranking accuracy of current approaches using either
hand-crafted or pre-trained deep convolutional neural network (DCNN) features is
inadequate for large-scale deployments. We show in this paper that the ranking
accuracy of TR systems can be significantly improved by incorporating hard and
soft attention mechanisms, which direct attention to critical information such
as figurative elements and reduce attention given to distracting and
uninformative elements such as text and background. Our proposed approach
achieves state-of-the-art results on a challenging large-scale trademark
dataset.Comment: Fix typos related to authors' information
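As a rough illustration of the soft-attention idea, a learned spatial attention map can reweight a DCNN feature map before pooling, so that figurative regions contribute more to the retrieval descriptor than text or background. The SoftAttentionPool module, its layer sizes, and the feature-map shape below are assumptions for illustration, not the paper's architecture.

    # Minimal soft spatial-attention pooling over a DCNN feature map (sketch only).
    import torch
    import torch.nn as nn

    class SoftAttentionPool(nn.Module):
        def __init__(self, channels):
            super().__init__()
            # 1x1 conv produces one attention logit per spatial location
            self.score = nn.Conv2d(channels, 1, kernel_size=1)

        def forward(self, feat):                      # feat: (B, C, H, W)
            logits = self.score(feat)                 # (B, 1, H, W)
            attn = torch.softmax(logits.flatten(2), dim=-1).view_as(logits)
            weighted = feat * attn                    # emphasize informative regions
            return weighted.sum(dim=(2, 3))           # (B, C) retrieval descriptor

    feat = torch.randn(2, 512, 14, 14)                # e.g. a conv5 feature map
    pool = SoftAttentionPool(512)
    print(pool(feat).shape)                           # torch.Size([2, 512])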
A Flow-Guided Mutual Attention Network for Video-Based Person Re-Identification
Person Re-Identification (ReID) is a challenging problem in many video
analytics and surveillance applications, where a person's identity must be
associated across a distributed non-overlapping network of cameras. Video-based
person ReID has recently gained much interest because it allows capturing
discriminant spatio-temporal information from video clips that is unavailable
for image-based ReID. Despite recent advances, deep learning (DL) models for
video ReID often fail to leverage this information to improve the robustness of
feature representations. In this paper, the motion pattern of a person is
explored as an additional cue for ReID. In particular, a flow-guided Mutual
Attention network is proposed for fusion of image and optical flow sequences
using any 2D-CNN backbone, allowing temporal information to be encoded along with
spatial appearance information. Our Mutual Attention network relies on the
joint spatial attention between image and optical flow feature maps to
activate a common set of salient features across them. In addition to
flow-guided attention, we introduce a method to aggregate features from longer
input streams for better video sequence-level representation. Our extensive
experiments on three challenging video ReID datasets indicate that using the
proposed Mutual Attention network considerably improves recognition accuracy
over conventional gated-attention networks and state-of-the-art methods for
video-based person ReID.
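The joint spatial attention idea can be sketched as a small module that derives a single attention map from both streams' feature maps and applies it to each, so only locations salient in both appearance and motion stay active. The concatenation-plus-sigmoid fusion below is an assumption for illustration, not the paper's exact design.

    # Rough sketch of mutual (joint) spatial attention between an image stream and
    # an optical-flow stream from a 2D-CNN backbone.
    import torch
    import torch.nn as nn

    class MutualAttention(nn.Module):
        def __init__(self, channels):
            super().__init__()
            # joint attention map computed from the concatenated streams
            self.joint = nn.Sequential(
                nn.Conv2d(2 * channels, channels, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, 1, kernel_size=1),
                nn.Sigmoid(),
            )

        def forward(self, img_feat, flow_feat):       # each: (B, C, H, W)
            attn = self.joint(torch.cat([img_feat, flow_feat], dim=1))  # (B,1,H,W)
            # the same map gates both streams, keeping features salient in both
            return img_feat * attn, flow_feat * attn

    img_feat = torch.randn(4, 256, 16, 8)             # appearance features
    flow_feat = torch.randn(4, 256, 16, 8)            # optical-flow features
    att_img, att_flow = MutualAttention(256)(img_feat, flow_feat)
    print(att_img.shape, att_flow.shape)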
Group Re-Identification with Multi-grained Matching and Integration
The task of re-identifying groups of people under different camera views is an
important yet less-studied problem. Group re-identification (Re-ID) is a very
challenging task since it is not only adversely affected by common issues in
traditional single-object Re-ID, such as viewpoint and human pose
variations, but it also suffers from changes in group layout and group
membership. In this paper, we propose a novel concept of group granularity by
characterizing a group image by multi-grained objects: individual persons and
sub-groups of two and three people within a group. To achieve robust group
Re-ID, we first introduce multi-grained representations which can be extracted
via two separate schemes, i.e. one with hand-crafted
descriptors and another with deep neural networks. The proposed representation
seeks to characterize both appearance and spatial relations of multi-grained
objects, and is further equipped with importance weights which capture
variations in intra-group dynamics. Optimal group-wise matching is facilitated
by a multi-order matching process which, in turn, dynamically updates the
importance weights in an iterative fashion. We evaluate our approach on three
multi-camera group datasets containing complex scenarios and large dynamics,
with experimental results demonstrating its effectiveness. The
published dataset can be found in
\url{http://min.sjtu.edu.cn/lwydemo/GroupReID.html}Comment: 14 pages, 10 figures, to appear in IEEE Transactions on Cybernetics
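A toy illustration of the weighting idea: if each granularity (individuals, two-person sub-groups, three-person sub-groups) yields its own distance between two group images, the distances can be combined with importance weights that are iteratively sharpened towards the more reliable granularities. The update rule below is a loose assumption made for illustration and is much simpler than the paper's multi-order matching.

    # Toy combination of multi-grained distances with iteratively updated weights.
    import numpy as np

    def group_distance(dist_per_grain, n_iters=5, temperature=1.0):
        """dist_per_grain: per-granularity distances between two group images."""
        d = np.asarray(dist_per_grain, dtype=float)
        w = np.ones_like(d) / len(d)                  # start with uniform importance
        for _ in range(n_iters):
            w = w * np.exp(-d / temperature)          # favor reliable granularities
            w /= w.sum()
        return float(np.dot(w, d)), w

    # e.g. distances at single-person / pair / triple granularity
    dist, weights = group_distance([0.8, 0.5, 1.2])
    print(dist, weights)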
cvpaper.challenge in 2016: Futuristic Computer Vision through 1,600 Papers Survey
This paper presents the futuristic challenges discussed in the cvpaper.challenge. In
2015 and 2016, we thoroughly studied 1,600+ papers from several
conferences and journals, including CVPR/ICCV/ECCV/NIPS/PAMI/IJCV.
Person Search via A Mask-Guided Two-Stream CNN Model
In this work, we tackle the problem of person search, a challenging
task consisting of pedestrian detection and person re-identification~(re-ID).
Instead of sharing representations in a single joint model, we find that
separating the detector and re-ID feature extraction yields better performance. In
order to extract more representative features for each identity, we segment out
the foreground person from the original image patch. We propose a simple yet
effective re-ID method, which models the foreground person and the original image
patch individually, and obtains enriched representations from two separate
CNN streams. In experiments on two standard person search benchmarks,
CUHK-SYSU and PRW, our method surpasses the state of the art by a large margin (more than 5pp in mAP).Comment: accepted as poster to ECCV 2018
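The two-stream idea can be sketched as one CNN branch that sees the original patch and a second branch that sees the foreground-only patch obtained by masking out the background, with the two embeddings concatenated into the re-ID descriptor. The resnet18 backbone, embedding size, and element-wise masking below are stand-in assumptions, not the paper's exact model.

    # Sketch of a mask-guided two-stream re-ID descriptor (illustrative only).
    import torch
    import torch.nn as nn
    from torchvision import models

    class TwoStreamReID(nn.Module):
        def __init__(self, embed_dim=256):
            super().__init__()
            def backbone():
                net = models.resnet18(weights=None)   # any 2D-CNN would do here
                net.fc = nn.Linear(net.fc.in_features, embed_dim)
                return net
            self.full_stream = backbone()             # original image patch
            self.fg_stream = backbone()               # foreground (masked) patch

        def forward(self, patch, mask):               # patch: (B,3,H,W), mask: (B,1,H,W)
            fg_patch = patch * mask                   # zero out background pixels
            feat = torch.cat([self.full_stream(patch),
                              self.fg_stream(fg_patch)], dim=1)
            return nn.functional.normalize(feat, dim=1)   # enriched descriptor

    model = TwoStreamReID()
    patch = torch.randn(2, 3, 256, 128)
    mask = (torch.rand(2, 1, 256, 128) > 0.5).float()
    print(model(patch, mask).shape)                   # torch.Size([2, 512])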
Person Re-Identification using Deep Learning Networks: A Systematic Review
Person re-identification has received a lot of attention from the research
community in recent times. Due to its vital role in security-based
applications, person re-identification lies at the heart of research relevant
to tracking robberies, preventing terrorist attacks and other security-critical
events. While the last decade has seen tremendous growth in re-id approaches,
very little review literature exists to comprehend and summarize this progress.
This review deals with the latest state-of-the-art deep learning based
approaches for person re-identification. While the few existing re-id review
works have analysed re-id techniques from a singular aspect, this review
evaluates numerous re-id techniques from multiple deep learning aspects such as
deep architecture types, common Re-Id challenges (variation in pose, lighting,
view, scale, partial or complete occlusion, background clutter), multi-modal
Re-Id, cross-domain Re-Id challenges, metric learning approaches and video
Re-Id contributions. This review also includes several re-id benchmarks
collected over the years, describing their characteristics, specifications and
top re-id results obtained on them. The inclusion of the latest deep re-id
works makes this a significant contribution to the re-id literature. Lastly,
the conclusion and future directions are included.Comment: 34 pages, 15 figures
Large Margin Learning in Set to Set Similarity Comparison for Person Re-identification
Person re-identification (Re-ID) aims at matching images of the same person
across disjoint camera views, which is a challenging problem in multimedia
analysis, multimedia editing and content-based media retrieval communities. The
major challenge lies in how to preserve similarity of the same person across
video footages with large appearance variations, while discriminating different
individuals. To address this problem, conventional methods usually consider the
pairwise similarity between persons by only measuring the point to point (P2P)
distance. In this paper, we propose to use deep learning techniques to model a
novel set to set (S2S) distance, in which the underlying objective focuses on
preserving the compactness of intra-class samples for each camera view, while
maximizing the margin between the intra-class set and inter-class set. The S2S
distance metric consists of three terms, namely the class-identity term,
the relative distance term and the regularization term. The class-identity term
keeps the intra-class samples within each camera view gathered together, the
relative distance term maximizes the distance between the intra-class set
and the inter-class set across different camera views, and the regularization term
smooths the parameters of the deep convolutional neural network (CNN). As a
result, the final learned deep model can effectively find the correct match for
the probe among various candidates in the video gallery by
learning discriminative and stable feature representations. Using the CUHK01,
CUHK03, PRID2011 and Market1501 benchmark datasets, we conducted extensive
comparative evaluations to demonstrate the advantages of our method over
state-of-the-art approaches.Comment: Accepted by IEEE Transactions on Multimedia
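The three named loss terms can be rendered roughly in code: a class-identity term that pulls same-identity samples of each view together, a relative (margin) term between the intra-class and inter-class sets across views, and a weight-decay style regularizer on the CNN parameters. The choice of mean pairwise Euclidean distance as the set distance, and the margin and weight-decay values, are assumptions for illustration rather than the paper's exact objective.

    # Rough rendering of a set-to-set style loss (sketch, not the authors' objective).
    import torch

    def set_distance(a, b):
        """Mean pairwise Euclidean distance between two sets of embeddings."""
        return torch.cdist(a, b).mean()

    def s2s_loss(intra_view1, intra_view2, inter_set, model,
                 margin=1.0, weight_decay=5e-4):
        # class-identity term: keep same-identity samples within each view compact
        identity = (set_distance(intra_view1, intra_view1)
                    + set_distance(intra_view2, intra_view2))
        # relative distance term: intra-class set vs. inter-class set across views
        intra = set_distance(intra_view1, intra_view2)
        inter = set_distance(torch.cat([intra_view1, intra_view2]), inter_set)
        relative = torch.clamp(margin + intra - inter, min=0.0)
        # regularization term: smooth (penalize) the network parameters
        reg = sum((p ** 2).sum() for p in model.parameters())
        return identity + relative + weight_decay * reg

    # Usage with dummy embeddings from a tiny stand-in model:
    model = torch.nn.Linear(128, 64)
    emb = lambda n: model(torch.randn(n, 128))
    loss = s2s_loss(emb(4), emb(4), emb(8), model)
    loss.backward()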
Recent Advance in Content-based Image Retrieval: A Literature Survey
The explosive increase and ubiquitous accessibility of visual data on the Web
have led to the prosperity of research activity in image search or retrieval.
Because they ignore visual content as a ranking cue, text-based search
techniques for visual retrieval may suffer from inconsistency between the
text words and the visual content. Content-based image retrieval (CBIR), which
makes use of the representation of visual content to identify relevant images,
has attracted sustained attention over the past two decades. Such a problem is
challenging due to the intention gap and the semantic gap problems. Numerous
techniques have been developed for content-based image retrieval in the last
decade. The purpose of this paper is to categorize and evaluate those
algorithms proposed during the period of 2003 to 2016. We conclude with several
promising directions for future research.Comment: 22 pages
Salient Objects in Clutter
This paper identifies and addresses a serious design bias of existing salient
object detection (SOD) datasets, which unrealistically assume that each image
should contain at least one clear and uncluttered salient object. This design
bias has led to a saturation in performance for state-of-the-art SOD models
when evaluated on existing datasets. However, these models are still far from
satisfactory when applied to real-world scenes. Based on our analyses, we
propose a new high-quality dataset and update the previous saliency benchmark.
Specifically, our dataset, called Salient Objects in Clutter~\textbf{(SOC)},
includes images with both salient and non-salient objects from several common
object categories. In addition to object category annotations, each salient
image is accompanied by attributes that reflect common challenges in common
scenes, which can help provide deeper insight into the SOD problem. Further,
with a given saliency encoder, e.g., the backbone network, existing saliency
models are designed to achieve mapping from the training image set to the
training ground-truth set. We, therefore, argue that improving the dataset can
yield higher performance gains than focusing only on the decoder design. With
this in mind, we investigate several dataset-enhancement strategies, including
label smoothing to implicitly emphasize salient boundaries, random image
augmentation to adapt saliency models to various scenarios, and self-supervised
learning as a regularization strategy to learn from small datasets. Our
extensive results demonstrate the effectiveness of these tricks. We also
provide a comprehensive benchmark for SOD, which can be found in our
repository: https://github.com/DengPingFan/SODBenchmark.Comment: 349 references, 20 pages, survey 201 models, benchmark 100 models.
Online benchmark: https://github.com/DengPingFan/SODBenchmark
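Of the dataset-enhancement strategies listed, label smoothing to implicitly emphasize salient boundaries can be illustrated with a simple mask-softening step: blur the binary ground truth so the soft targets transition gradually at object boundaries, plus a small uniform smoothing floor. The Gaussian-blur formulation and parameter values below are assumptions, not necessarily the authors' recipe.

    # Simple illustration of boundary-aware label smoothing for a saliency mask.
    import numpy as np
    from scipy.ndimage import gaussian_filter

    def smooth_saliency_labels(mask, sigma=2.0, eps=0.1):
        """mask: binary {0,1} ground-truth saliency map -> soft targets whose
        softest values concentrate along object boundaries."""
        soft = gaussian_filter(mask.astype(float), sigma=sigma)  # soften edges
        return (1.0 - eps) * soft + eps * 0.5                    # smoothing floor

    gt = np.zeros((64, 64)); gt[16:48, 16:48] = 1.0              # toy square object
    targets = smooth_saliency_labels(gt)
    print(targets.min(), targets.max())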