Deeply-Learned Part-Aligned Representations for Person Re-Identification
In this paper, we address the problem of person re-identification, which
refers to associating the persons captured from different cameras. We propose a
simple yet effective human part-aligned representation for handling the body
part misalignment problem. Our approach decomposes the human body into regions
(parts) which are discriminative for person matching, accordingly computes the
representations over the regions, and aggregates the similarities computed
between the corresponding regions of a pair of probe and gallery images as the
overall matching score. Our formulation, inspired by attention models, is a
deep neural network modeling the three steps together, which is learnt through
minimizing the triplet loss function without requiring body part labeling
information. Unlike most existing deep learning algorithms that learn a global
or spatial partition-based local representation, our approach performs human
body partition, and thus is more robust to pose changes and various human
spatial distributions in the person bounding box. Our approach shows
state-of-the-art results over standard datasets, Market-, CUHK,
CUHK and VIPeR.
Comment: Accepted by ICCV 201
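The three-step pipeline above (part decomposition, per-part representation, similarity aggregation, trained with a triplet loss) can be sketched in a toy form. The cosine similarity, sum aggregation, and all shapes below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def part_scores(x, y):
    """Sum of per-part cosine similarities between two images'
    part features, each of shape (K parts, D dims)."""
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    yn = y / np.linalg.norm(y, axis=1, keepdims=True)
    return float(np.sum(xn * yn))  # sum of per-part dot products

def triplet_loss(anchor, pos, neg, margin=0.3):
    """Hinge on the aggregated matching scores: the positive pair
    should score higher than the negative pair by the margin."""
    return max(0.0, margin - part_scores(anchor, pos) + part_scores(anchor, neg))

rng = np.random.default_rng(0)
a = rng.normal(size=(4, 8))                 # K=4 parts, D=8 dims (toy sizes)
p = a + 0.05 * rng.normal(size=(4, 8))      # near-duplicate: same identity
n = rng.normal(size=(4, 8))                 # unrelated identity
loss = triplet_loss(a, p, n)                # well-separated triplet -> 0
```

In the full model the part features come from an attention-weighted CNN; here they are random arrays just to exercise the loss.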
Person Search via A Mask-Guided Two-Stream CNN Model
In this work, we tackle the problem of person search, which is a challenging
task consisting of pedestrian detection and person re-identification (re-ID).
Instead of sharing representations in a single joint model, we find that
separating detector and re-ID feature extraction yields better performance. In
order to extract more representative features for each identity, we segment out
the foreground person from the original image patch. We propose a simple yet
effective re-ID method, which models foreground person and original image
patches individually, and obtains enriched representations from two separate
CNN streams. From the experiments on two standard person search benchmarks of
CUHK-SYSU and PRW, we achieve mAP of and respectively,
surpassing the state of the art by a large margin (more than 5pp).
Comment: accepted as poster to ECCV 201
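A minimal sketch of the two-stream idea described above, assuming a shared toy embedding (global average pooling stands in for a CNN stream) and a given binary foreground mask:

```python
import numpy as np

def two_stream_descriptor(patch, fg_mask, embed):
    """Hypothetical sketch: embed the original patch and the
    foreground-masked patch separately, then concatenate."""
    fg = patch * fg_mask[..., None]          # zero out background pixels
    return np.concatenate([embed(patch), embed(fg)])

# Toy "CNN stream": global average pooling per channel.
embed = lambda img: img.mean(axis=(0, 1))

rng = np.random.default_rng(1)
patch = rng.random((6, 4, 3))                # H x W x C image patch
mask = np.zeros((6, 4)); mask[1:5, 1:3] = 1  # crude person silhouette
desc = two_stream_descriptor(patch, mask, embed)
```

Keeping the two streams separate, as the abstract argues, lets the masked stream specialize on the person while the original stream retains context.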
The Devil is in the Middle: Exploiting Mid-level Representations for Cross-Domain Instance Matching
Many vision problems require matching images of object instances across
different domains. These include fine-grained sketch-based image retrieval
(FG-SBIR) and Person Re-identification (person ReID). Existing approaches
attempt to learn a joint embedding space where images from different domains
can be directly compared. In most cases, this space is defined by the output of
the final layer of a deep neural network (DNN), which primarily contains
features of a high semantic level. In this paper, we argue that both high and
mid-level features are relevant for cross-domain instance matching (CDIM).
Importantly, mid-level features already exist in earlier layers of the DNN.
They just need to be extracted, represented, and fused properly with the final
layer. Based on this simple but powerful idea, we propose a unified framework
for CDIM. Instantiating our framework for FG-SBIR and ReID, we show that our
simple models can easily beat the state-of-the-art models, which are often
equipped with much more elaborate architectures.
Comment: Reference update
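The extract-represent-fuse idea can be illustrated with a small sketch; pooling the mid-layer map and L2-normalizing each level separately before concatenation are assumptions about how the fusion might be balanced, not the paper's exact scheme:

```python
import numpy as np

def fuse(mid_fmap, final_vec):
    """Pool a mid-layer feature map to a vector, L2-normalize each
    level separately, then concatenate with the final-layer vector."""
    mid = mid_fmap.mean(axis=(0, 1))         # GAP over H, W
    mid = mid / np.linalg.norm(mid)
    fin = final_vec / np.linalg.norm(final_vec)
    return np.concatenate([mid, fin])

rng = np.random.default_rng(2)
fused = fuse(rng.random((7, 7, 16)),         # mid-level map: H x W x C
             rng.random(32))                 # final-layer embedding
```

Per-level normalization keeps either level from dominating the joint embedding distance.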
A Survey of Model Compression and Acceleration for Deep Neural Networks
Deep neural networks (DNNs) have recently achieved great success in many
visual recognition tasks. However, existing deep neural network models are
computationally expensive and memory intensive, hindering their deployment in
devices with low memory resources or in applications with strict latency
requirements. Therefore, a natural thought is to perform model compression and
acceleration in deep networks without significantly decreasing the model
performance. During the past five years, tremendous progress has been made in
this area. In this paper, we review the recent techniques for compacting and
accelerating DNN models. In general, these techniques are divided into four
categories: parameter pruning and quantization, low-rank factorization,
transferred/compact convolutional filters, and knowledge distillation. Methods
of parameter pruning and quantization are described first; after that, the
other techniques are introduced. For each category, we also provide insightful
analysis about the performance, related applications, advantages, and
drawbacks. Then we go through some very recent successful methods, for example,
dynamic capacity networks and stochastic depth networks. After that, we survey
the evaluation metrics, the main datasets used for evaluating model
performance, and recent benchmark efforts. Finally, we conclude this paper and
discuss the remaining challenges and possible directions for future work.
Comment: Published in IEEE Signal Processing Magazine, updated version
including more recent work
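Two of the surveyed categories, parameter pruning and quantization, can be sketched in a few lines; the magnitude criterion and the uniform symmetric quantizer below are the simplest representatives of each family:

```python
import numpy as np

def prune(w, sparsity=0.5):
    """Magnitude pruning: zero out the smallest-|w| fraction of weights."""
    thresh = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) >= thresh, w, 0.0)

def quantize(w, bits=8):
    """Uniform symmetric quantization to signed integers and back."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

rng = np.random.default_rng(3)
w = rng.normal(size=(64, 64))                # toy weight matrix
w_small = quantize(prune(w, 0.5), bits=8)    # half the weights zeroed, 8-bit grid
```

Real pipelines typically fine-tune after each step to recover accuracy; this sketch only shows the compression operators themselves.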
Sharp Attention Network via Adaptive Sampling for Person Re-identification
In this paper, we present novel sharp attention networks by adaptively
sampling feature maps from convolutional neural networks (CNNs) for person
re-identification (re-ID) problem. Due to the introduction of sampling-based
attention models, the proposed approach can adaptively generate sharper
attention-aware feature masks. This greatly differs from the gating-based
attention mechanism that relies on soft gating functions to select the relevant
features for person re-ID. In contrast, the proposed sampling-based attention
mechanism allows us to effectively trim irrelevant features by enforcing the
resultant feature masks to focus on the most discriminative features. It can
produce sharper attentions that are more assertive in localizing subtle
features relevant to re-identifying people across cameras. For this purpose, a
differentiable Gumbel-Softmax sampler is employed to approximate the Bernoulli
sampling to train the sharp attention networks. Extensive experimental
evaluations demonstrate the superiority of this new sharp attention model for
person re-ID over the other state-of-the-art methods on three challenging
benchmarks including CUHK03, Market-1501, and DukeMTMC-reID.
Comment: accepted by IEEE Transactions on Circuits and Systems for Video
Technology (T-CSVT
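In the binary-mask case, the Gumbel-Softmax relaxation of Bernoulli sampling reduces to a "Gumbel-sigmoid": add logistic noise to the mask logits and squash with a temperature-scaled sigmoid, so low temperatures yield near-binary (sharp) masks. A toy sketch, with illustrative shapes and temperatures:

```python
import numpy as np

def gumbel_sigmoid(logits, tau, rng):
    """Relaxed Bernoulli sample: logistic noise added to the logits,
    then a temperature-scaled sigmoid. Low tau -> near-binary mask."""
    u = rng.uniform(1e-6, 1 - 1e-6, size=logits.shape)
    noise = np.log(u) - np.log(1 - u)        # Logistic(0, 1) noise
    return 1.0 / (1.0 + np.exp(-(logits + noise) / tau))

rng = np.random.default_rng(4)
logits = rng.normal(size=(8, 8))             # one logit per spatial location
soft = gumbel_sigmoid(logits, tau=1.0, rng=rng)
sharp = gumbel_sigmoid(logits, tau=0.1, rng=rng)
```

Because the relaxation is differentiable in the logits, the mask predictor can be trained end-to-end, while annealing tau pushes the masks toward the hard Bernoulli samples the abstract describes.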
Multi-Scale Body-Part Mask Guided Attention for Person Re-identification
Person re-identification has become an increasingly important task due to its
wide applications. In practice, it remains challenging due to variations in
person pose, lighting, occlusion, misalignment, background clutter, etc. In
this paper, we propose a multi-scale
body-part mask guided attention network (MMGA), which jointly learns whole-body
and part body attention to help extract global and local features
simultaneously. In MMGA, body-part masks are used to guide the training of
corresponding attention. Experiments show that our proposed method can reduce
the negative influence of pose variation, misalignment and background clutter.
Our method achieves rank-1/mAP of 95.0%/87.2% on the Market1501 dataset and
89.5%/78.1% on the DukeMTMC-reID dataset, outperforming current
state-of-the-art methods.
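Mask-guided pooling of global and part features might look like the following sketch; the hand-drawn rectangular masks stand in for the predicted body-part masks that guide the attention in the abstract:

```python
import numpy as np

def mask_guided_features(fmap, part_masks):
    """Global feature: average over all locations. Part features:
    average pooled under each (hypothetical) body-part mask."""
    g = fmap.mean(axis=(0, 1))
    parts = [(fmap * m[..., None]).sum(axis=(0, 1)) / m.sum()
             for m in part_masks]
    return g, parts

rng = np.random.default_rng(5)
fmap = rng.random((8, 4, 16))                 # H x W x C feature map
head = np.zeros((8, 4));  head[0:2] = 1       # toy masks: head / torso / legs
torso = np.zeros((8, 4)); torso[2:5] = 1
legs = np.zeros((8, 4));  legs[5:8] = 1
g, parts = mask_guided_features(fmap, [head, torso, legs])
```

Concatenating the global vector with the per-part vectors gives one descriptor that carries both whole-body and local cues.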
Recent Advances in Efficient Computation of Deep Convolutional Neural Networks
Deep neural networks have evolved remarkably over the past few years and they
are currently the fundamental tools of many intelligent systems. At the same
time, the computational complexity and resource consumption of these networks
also continue to increase. This will pose a significant challenge to the
deployment of such networks, especially in real-time applications or on
resource-limited devices. Thus, network acceleration has become a hot topic
within the deep learning community. As for hardware implementation of deep
neural networks, a batch of accelerators based on FPGA/ASIC have been proposed
in recent years. In this paper, we provide a comprehensive survey of recent
advances in network acceleration, compression and accelerator design from both
algorithm and hardware points of view. Specifically, we provide a thorough
analysis of each of the following topics: network pruning, low-rank
approximation, network quantization, teacher-student networks, compact network
design and hardware accelerators. Finally, we will introduce and discuss a few
possible future directions.
Comment: 14 pages, 3 figure
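Of the listed techniques, low-rank approximation has a particularly compact sketch: truncated SVD factorizes a weight matrix into two thin factors, trading a small reconstruction error for fewer parameters:

```python
import numpy as np

def low_rank(w, rank):
    """Truncated SVD factorization: replace W (m x n) by the product of
    an (m x r) and an (r x n) factor; r << min(m, n) cuts parameters."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    a = u[:, :rank] * s[:rank]               # m x r factor
    b = vt[:rank]                            # r x n factor
    return a, b

rng = np.random.default_rng(6)
w = rng.normal(size=(20, 4)) @ rng.normal(size=(4, 30))  # true rank <= 4
a, b = low_rank(w, 4)
err = np.linalg.norm(w - a @ b)              # tiny: w really is rank 4
```

In a network, the dense layer `x @ w` becomes `(x @ a) @ b`, dropping from `m*n` to `r*(m+n)` parameters (here 200 instead of 600).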
Weighted Bilinear Coding over Salient Body Parts for Person Re-identification
Deep convolutional neural networks (CNNs) have demonstrated dominant
performance in person re-identification (Re-ID). Existing CNN based methods
utilize global average pooling (GAP) to aggregate intermediate convolutional
features for Re-ID. However, this strategy only considers the first-order
statistics of local features and treats local features at different locations
equally important, leading to sub-optimal feature representation. To deal with
these issues, we propose a novel weighted bilinear coding (WBC) framework for
local feature aggregation in CNNs to pursue more representative and
discriminative feature representations, which can be combined with other
state-of-the-art methods to improve their performance. Specifically, bilinear
coding is used to encode the channel-wise feature correlations to capture
richer feature interactions. Meanwhile, a weighting scheme is applied on the
bilinear coding to adaptively adjust the weights of local features at different
locations based on their importance in recognition, further improving the
discriminability of feature aggregation. To handle the spatial misalignment
issue, we use a salient part net (spatial attention module) to derive salient
body parts, and apply the WBC model on each part. The final representation,
formed by concatenating the WBC encoded features of each part, is both
discriminative and resistant to spatial misalignment. Experiments on three
benchmarks including Market-1501, DukeMTMC-reID and CUHK03 evidence the
favorable performance of our method against other outstanding methods.
Comment: 22 page
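The weighted bilinear coding step can be sketched as weighted second-order pooling; the signed-square-root and L2 normalization are common bilinear-coding conventions and an assumption here, not necessarily the paper's exact pipeline:

```python
import numpy as np

def weighted_bilinear(fmap, weights):
    """Weighted second-order pooling: per-location outer products of
    channel features, combined with per-location weights."""
    h, w, c = fmap.shape
    x = fmap.reshape(-1, c)                  # N locations x C channels
    a = weights.reshape(-1)                  # one weight per location
    m = (x * a[:, None]).T @ x               # C x C weighted correlation
    m = np.sign(m) * np.sqrt(np.abs(m))      # signed square root
    return (m / np.linalg.norm(m)).reshape(-1)

rng = np.random.default_rng(7)
code = weighted_bilinear(rng.random((6, 6, 8)),   # toy part feature map
                         rng.random((6, 6)))      # toy location weights
```

Unlike GAP, the C x C code captures channel-wise correlations, and the weights let discriminative locations dominate the aggregation.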
Let Features Decide for Themselves: Feature Mask Network for Person Re-identification
Person re-identification aims at establishing the identity of a pedestrian
from a gallery that contains images of multiple people obtained from a
multi-camera system. Many challenges such as occlusions, drastic lighting and
pose variations across the camera views, indiscriminate visual appearances,
cluttered backgrounds, imperfect detections, motion blur, and noise make this
task highly challenging. While most approaches focus on learning features and
metrics to derive better representations, we hypothesize that both local and
global contextual cues are crucial for an accurate identity matching. To this
end, we propose a Feature Mask Network (FMN) that takes advantage of ResNet
high-level features to predict a feature map mask and then imposes it on the
low-level features to dynamically reweight different object parts for a locally
aware feature representation. This serves as an effective attention mechanism
by allowing the network to focus on local details selectively. Given the
resemblance of person re-identification with classification and retrieval
tasks, we frame the network training as a multi-task objective optimization,
which further improves the learned feature descriptions. We conduct experiments
on Market-1501, DukeMTMC-reID and CUHK03 datasets, where the proposed approach
respectively achieves significant improvements of , and
in mAP relative to the state of the art.
Comment: 10 pages, 4 figure
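The mask-then-reweight mechanism might be sketched as follows; the linear projection from the high-level vector to per-location scores is a hypothetical stand-in for the paper's learned mask predictor:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def feature_mask(low_fmap, high_vec, proj):
    """Hypothetical sketch: a projection of the high-level vector scores
    each spatial location of the low-level map; a softmax over locations
    gives a mask that reweights the low-level features."""
    h, w, c = low_fmap.shape
    scores = low_fmap.reshape(-1, c) @ (proj @ high_vec)  # score per location
    mask = softmax(scores).reshape(h, w, 1)
    return (low_fmap * mask).sum(axis=(0, 1))             # masked pooling

rng = np.random.default_rng(8)
low = rng.random((8, 8, 16))                 # low-level feature map
high = rng.random(32)                        # high-level ResNet vector (toy)
proj = rng.normal(size=(16, 32))             # assumed learned projection
feat = feature_mask(low, high, proj)
```

Because the mask is a convex combination over locations, the pooled feature stays within the range of the local features while emphasizing the selected parts.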
MagnifierNet: Towards Semantic Adversary and Fusion for Person Re-identification
Although person re-identification (ReID) has achieved significant improvement
recently by enforcing part alignment, it is still a challenging task when it
comes to distinguishing visually similar identities or identifying occluded
persons. In these scenarios, magnifying details in each part's features and
selectively fusing them together may provide a feasible solution. In this work,
we propose MagnifierNet, a triple-branch network which accurately mines details
from whole to parts. Firstly, the holistic salient features are encoded by a
global branch. Secondly, to enhance detailed representation for each semantic
region, the "Semantic Adversarial Branch" is designed to learn from dynamically
generated semantic-occluded samples during training. Meanwhile, we introduce
"Semantic Fusion Branch" to filter out irrelevant noises by selectively fusing
semantic region information sequentially. To further improve feature diversity,
we introduce a novel loss function "Semantic Diversity Loss" to remove
redundant overlaps across learned semantic representations. State-of-the-art
performance has been achieved on three benchmarks by large margins.
Specifically, the mAP score is improved by 6% and 5% on the most challenging
CUHK03-L and CUHK03-D benchmarks.
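A diversity loss that removes redundant overlap across part representations can be sketched as a penalty on pairwise cosine similarity; this is a generic formulation, not necessarily the paper's exact Semantic Diversity Loss:

```python
import numpy as np

def diversity_loss(parts):
    """Penalize redundant overlap: mean squared cosine similarity
    over every ordered pair of distinct part embeddings."""
    p = parts / np.linalg.norm(parts, axis=1, keepdims=True)
    sim = p @ p.T                            # pairwise cosine similarities
    k = len(parts)
    off = sim[~np.eye(k, dtype=bool)]        # drop the self-similarities
    return float(np.mean(off ** 2))

rng = np.random.default_rng(9)
redundant = np.tile(rng.random(16), (4, 1))  # four identical part features
diverse = np.eye(4, 16)                      # mutually orthogonal features
```

The loss is maximal (1.0) when all parts collapse onto one direction and zero when they are orthogonal, pushing the semantic branches toward complementary information.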