Pedestrian Attribute Recognition: A Survey
Recognizing pedestrian attributes is an important task in the computer vision
community because it plays an important role in video surveillance. Many
algorithms have been proposed to handle this task. The goal of this paper is to
review existing works using traditional methods or based on deep learning
networks. Firstly, we introduce the background of pedestrian attribute
recognition (PAR, for short), including the fundamental concepts of pedestrian
attributes and the corresponding challenges. Secondly, we introduce existing
benchmarks, including popular datasets and evaluation criteria. Thirdly, we
analyse the concepts of multi-task learning and multi-label learning, and also
explain the relations between these two learning algorithms and pedestrian
attribute recognition. We also review some popular network architectures which
have been widely applied in the deep learning community. Fourthly, we analyse
popular solutions for this task, such as attribute-group and part-based
methods. Fifthly, we show some applications which take pedestrian
attributes into consideration and achieve better performance. Finally, we
summarize this paper and give several possible research directions for
pedestrian attribute recognition. The project page of this paper can be found
at the following website:
https://sites.google.com/view/ahu-pedestrianattributes/
Comment: Check our project page for a high-resolution version of this survey:
https://sites.google.com/view/ahu-pedestrianattributes
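The survey abstract above relates pedestrian attribute recognition to multi-label learning. As a minimal sketch of that framing (not the survey's own method), each attribute can be treated as an independent binary label scored by a sigmoid and trained with binary cross-entropy. The attribute names, logits, and the 0.5 threshold below are hypothetical values chosen only for illustration.

```python
import math


def sigmoid(z):
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def multilabel_bce(logits, targets):
    """Mean binary cross-entropy over independent attribute outputs."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        total += -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    return total / len(logits)


def predict_attributes(logits, names, threshold=0.5):
    """Return every attribute whose probability reaches the threshold."""
    return [n for z, n in zip(logits, names) if sigmoid(z) >= threshold]


# Hypothetical attribute vocabulary and model outputs for one pedestrian image.
ATTRIBUTES = ["backpack", "hat", "long_hair"]
logits = [2.0, -1.5, 0.3]
targets = [1, 0, 1]

loss = multilabel_bce(logits, targets)
predicted = predict_attributes(logits, ATTRIBUTES)
print(predicted, round(loss, 4))
```

Unlike single-label classification with a softmax, any number of attributes can be predicted at once here, which is why the multi-label view fits attribute recognition.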
Structure propagation for zero-shot learning
The key to zero-shot learning (ZSL) is finding an information transfer
model that bridges the gap between images and semantic information (texts or
attributes). Existing ZSL methods usually construct the compatibility function
between images and class labels by considering the relatedness among the
semantic classes (the manifold structure of semantic classes). However, the
relationship among image classes (the manifold structure of image classes) is also
very important for constructing the compatibility model. It is difficult to
capture the relationship among image classes because of unseen classes, so the
manifold structure of image classes is often ignored in ZSL. To let the manifold
structure of image classes and that of semantic classes complement each
other, we propose structure propagation (SP) for improving the
performance of ZSL for classification. SP jointly considers the manifold
structure of image classes and that of semantic classes to approximate
the intrinsic structure of object classes. Moreover, SP can describe the
constraint condition between the compatibility function and these manifold
structures for balancing the influence of the structure propagation iteration.
The SP solution provides not only unseen class labels but also the relationship
of the two manifold structures that encode the positive transfer in structure
propagation. Experimental results demonstrate that SP attains promising
results on the AwA, CUB, Dogs and SUN databases.
Multi-Label Zero-Shot Learning with Structured Knowledge Graphs
In this paper, we propose a novel deep learning architecture for multi-label
zero-shot learning (ML-ZSL), which is able to predict multiple unseen class
labels for each input instance. Inspired by the way humans utilize semantic
knowledge between objects of interest, we propose a framework that
incorporates knowledge graphs for describing the relationships between multiple
labels. Our model learns an information propagation mechanism from the semantic
label space, which can be applied to model the interdependencies between seen
and unseen class labels. With such investigation of structured knowledge graphs
for visual reasoning, we show that our model can be applied for solving
multi-label classification and ML-ZSL tasks. Compared to state-of-the-art
approaches, comparable or improved performance can be achieved by our method.
Comment: CVPR 201
Personalized Cinemagraphs using Semantic Understanding and Collaborative Learning
Cinemagraphs are a compelling way to convey dynamic aspects of a scene. In
these media, dynamic and still elements are juxtaposed to create an artistic
and narrative experience. Creating a high-quality, aesthetically pleasing
cinemagraph requires isolating objects in a semantically meaningful way and
then selecting good start times and looping periods for those objects to
minimize visual artifacts (such as tearing). To achieve this, we present a new
technique that uses object recognition and semantic segmentation as part of an
optimization method to automatically create cinemagraphs from videos that are
both visually appealing and semantically meaningful. Given a scene with
multiple objects, there are many cinemagraphs one could create. Our method
evaluates these multiple candidates and presents the best one, as determined by
a model trained to predict human preferences in a collaborative way. We
demonstrate the effectiveness of our approach with multiple results and a user
study.
Comment: To appear in ICCV 2017. Total 17 pages including the supplementary
materia
Zero Shot Learning with the Isoperimetric Loss
We introduce the isoperimetric loss as a regularization criterion for
learning the map from a visual representation to a semantic embedding, to be
used to transfer knowledge to unknown classes in a zero-shot learning setting.
We use a pre-trained deep neural network model as a visual representation of
image data, a Word2Vec embedding of class labels, and linear maps between the
visual and semantic embedding spaces. However, the spaces themselves are not
linear, and we postulate the sample embedding to be populated by noisy samples
near otherwise smooth manifolds. We exploit the graph structure defined by the
sample points to regularize the estimates of the manifolds by inferring the
graph connectivity using a generalization of the isoperimetric inequalities
from Riemannian geometry to graphs. Surprisingly, this regularization alone,
paired with the simplest baseline model, outperforms the state-of-the-art among
fully automated methods in zero-shot learning benchmarks such as AwA and CUB.
This improvement is achieved solely by learning the structure of the underlying
spaces by imposing regularity.
Comment: Accepted to AAAI-2
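The abstract above builds on a simple baseline: a linear map from visual features into a semantic embedding space, with unseen classes recognized through their label embeddings. The sketch below illustrates only that baseline idea, not the isoperimetric regularizer itself. It assumes nearest-neighbor matching by cosine similarity, and the matrix `W`, the class embeddings, and the feature vector are made-up toy values (real ones would come from a pre-trained network and Word2Vec).

```python
import math


def matvec(W, x):
    """Apply a linear map W (list of rows) to a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def cosine(a, b):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b)


def zero_shot_predict(W, visual_feature, class_embeddings):
    """Project a visual feature into semantic space and return the closest class."""
    projected = matvec(W, visual_feature)
    return max(class_embeddings,
               key=lambda name: cosine(projected, class_embeddings[name]))


# Hypothetical learned map from a 3-d visual space to a 2-d semantic space.
W = [[1.0, 0.0, 0.5],
     [0.0, 1.0, -0.5]]

# Hypothetical semantic embeddings of classes never seen during training.
UNSEEN = {"zebra": [0.9, 0.1], "whale": [0.1, 0.9]}

feature = [0.8, 0.1, 0.2]  # toy visual feature for one test image
label = zero_shot_predict(W, feature, UNSEEN)
print(label)
```

Because prediction only needs an embedding for each candidate label, classes absent from training can be added simply by extending the `UNSEEN` dictionary.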