Pose-Normalized Image Generation for Person Re-identification
Person Re-identification (re-id) faces two major challenges: the lack of
cross-view paired training data and learning discriminative identity-sensitive
and view-invariant features in the presence of large pose variations. In this
work, we address both problems by proposing a novel deep person image
generation model for synthesizing realistic person images conditional on the
pose. The model is based on a generative adversarial network (GAN) designed
specifically for pose normalization in re-id, thus termed pose-normalization
GAN (PN-GAN). With the synthesized images, we can learn a new type of deep
re-id feature free of the influence of pose variations. We show that this
feature is strong on its own and complementary to features learned with the
original images. Importantly, under the transfer learning setting, we show that
our model generalizes well to any new re-id dataset without the need for
collecting any training data for model fine-tuning. The model thus has the
potential to make re-id models truly scalable.
Comment: 10 pages, 5 figures
A Discriminatively Learned CNN Embedding for Person Re-identification
We revisit two popular convolutional neural networks (CNN) in person
re-identification (re-ID), i.e., verification and classification models. The two
models have their respective advantages and limitations due to different loss
functions. In this paper, we shed light on how to combine the two models to
learn more discriminative pedestrian descriptors. Specifically, we propose a
new siamese network that simultaneously computes identification loss and
verification loss. Given a pair of training images, the network predicts the
identities of the two images and whether they belong to the same identity. Our
network learns a discriminative embedding and a similarity measurement at the
same time, thus making full use of the annotations. Albeit simple, the
learned embedding improves the state-of-the-art performance on two public
person re-ID benchmarks. Further, we show our architecture can also be applied
in image retrieval.
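A minimal sketch of how an identification loss and a verification loss might be combined for one training pair. The abstract does not give the exact formulation, so the softmax cross-entropy form, the `weight` parameter, the convention that verification class 1 means "same identity", and all function names are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability of the target class."""
    return -math.log(softmax(logits)[target])


def combined_loss(id_logits_a, id_logits_b, verif_logits, id_a, id_b, weight=1.0):
    """Two identification terms plus one weighted verification term.

    verif_logits has two entries: index 0 = different identity,
    index 1 = same identity (an assumed convention).
    """
    same = 1 if id_a == id_b else 0
    identification = cross_entropy(id_logits_a, id_a) + cross_entropy(id_logits_b, id_b)
    verification = cross_entropy(verif_logits, same)
    return identification + weight * verification


# Hypothetical logits for a pair of images that share identity 0.
loss = combined_loss([2.0, 0.1, 0.1], [1.9, 0.2, 0.0], [0.1, 2.0], id_a=0, id_b=0)
```

In a real siamese network the three logit vectors would come from shared-weight branches and a pair-comparison head; here they are plain lists so the arithmetic stays visible.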
Learning to count with deep object features
Learning to count is a learning strategy that has been recently proposed in
the literature for dealing with problems where estimating the number of object
instances in a scene is the final objective. In this framework, the task of
learning to detect and localize individual object instances is seen as a harder
task that can be evaded by casting the problem as that of computing a
regression value from hand-crafted image features. In this paper we explore the
features that are learned when training a counting convolutional neural network
in order to understand their underlying representation. To this end we define a
counting problem for MNIST data and show that the internal representation of
the network is able to classify digits in spite of the fact that no direct
supervision was provided for them during training. We also present preliminary
results about a deep network that is able to count the number of pedestrians in
a scene.
Comment: This paper has been accepted at the Deep Vision Workshop at CVPR 201
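A rough sketch of what a counting target on digit data could look like. The abstract does not say which digits are counted or how images are composed, so the "count the even digits" rule, the function names, and the error metric are all hypothetical choices for illustration.

```python
def counting_target(digit_labels, is_counted=lambda digit: digit % 2 == 0):
    """Regression target: how many digits in one composite image satisfy the rule.

    Only this count would be shown to the network during training;
    the individual digit labels would not.
    """
    return sum(1 for digit in digit_labels if is_counted(digit))


def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and true counts."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical composite image containing five digits.
target = counting_target([0, 3, 4, 7, 8])
```

The point of the setup is that the supervision signal is a single number per image, so any digit-discriminating structure in the learned features has to emerge indirectly.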
Pedestrian Attribute Recognition: A Survey
Recognizing pedestrian attributes is an important task in the computer vision
community because it plays an important role in video surveillance. Many
algorithms have been proposed to handle this task. The goal of this paper is to
review existing works using traditional methods or based on deep learning
networks. Firstly, we introduce the background of pedestrian attribute
recognition (PAR, for short), including the fundamental concepts of pedestrian
attributes and the corresponding challenges. Secondly, we introduce existing
benchmarks, including popular datasets and evaluation criteria. Thirdly, we
analyse the concepts of multi-task learning and multi-label learning, and also
explain the relations between these two learning algorithms and pedestrian
attribute recognition. We also review some popular network architectures which
have been widely applied in the deep learning community. Fourthly, we analyse
popular solutions for this task, such as attribute-group-based and part-based
methods, etc. Fifthly, we show some applications which take pedestrian
attributes into consideration and achieve better performance. Finally, we
summarize this paper and give several possible research directions for
pedestrian attribute recognition. The project page of this paper can be found
at the following website:
https://sites.google.com/view/ahu-pedestrianattributes/.
Comment: Check our project page for a high-resolution version of this survey:
https://sites.google.com/view/ahu-pedestrianattributes
Simple yet efficient real-time pose-based action recognition
Recognizing human actions is a core challenge for autonomous systems as they
directly share the same space with humans. Systems must be able to recognize
and assess human actions in real-time. In order to train corresponding
data-driven algorithms, a significant amount of annotated training data is
required. We demonstrate a pipeline to detect humans, estimate their pose,
track them over time and recognize their actions in real-time with standard
monocular camera sensors. For action recognition, we encode the human pose into
a new data format called Encoded Human Pose Image (EHPI) that can then be
classified using standard methods from the computer vision community. With this
simple procedure we achieve competitive state-of-the-art performance in
pose-based action detection and can ensure real-time performance. In addition,
we show a use case in the context of autonomous driving to demonstrate how such
a system can be trained to recognize human actions using simulation data.
Comment: Submitted to IEEE Intelligent Transportation Systems Conference
(ITSC) 2019. Code will be available soon at
https://github.com/noboevbo/ehpi_action_recognitio
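A minimal sketch of how a pose sequence could be packed into an image-like array, in the spirit of the Encoded Human Pose Image described above. The abstract does not specify the layout, so everything here is an assumption: one row per joint, one column per frame, normalized x and y stored in the red and green channels, blue left at zero, and normalization by frame width and height. The function name is hypothetical.

```python
def encode_pose_sequence(frames, width, height):
    """Pack a pose sequence into a joints-by-frames grid of RGB tuples.

    frames: list of frames, each a list of (x, y) joint positions in pixels.
    Assumed layout: image[joint][frame] = (x scaled to 0..255, y scaled to 0..255, 0).
    """
    if not frames:
        raise ValueError("at least one frame is required")
    num_joints = len(frames[0])
    image = [[(0, 0, 0)] * len(frames) for _ in range(num_joints)]
    for t, joints in enumerate(frames):
        if len(joints) != num_joints:
            raise ValueError("every frame must contain the same number of joints")
        for j, (x, y) in enumerate(joints):
            # Clamp to [0, 1] so joints detected slightly outside the frame stay valid.
            red = round(255 * min(max(x / width, 0.0), 1.0))
            green = round(255 * min(max(y / height, 0.0), 1.0))
            image[j][t] = (red, green, 0)
    return image


# Hypothetical two-joint pose tracked over two frames of a 640x480 video.
sequence = [[(0, 0), (320, 240)], [(640, 480), (160, 120)]]
image = encode_pose_sequence(sequence, width=640, height=480)
```

Once the sequence has a fixed image shape, a standard image classifier can be trained on it, which is the property the abstract highlights.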