PANDA: Pose Aligned Networks for Deep Attribute Modeling
We propose a method for inferring human attributes (such as gender, hair
style, clothes style, expression, action) from images of people under large
variation of viewpoint, pose, appearance, articulation and occlusion.
Convolutional Neural Nets (CNN) have been shown to perform very well on large
scale object recognition problems. In the context of attribute classification,
however, the signal is often subtle and it may cover only a small part of the
image, while the image is dominated by the effects of pose and viewpoint.
Discounting for pose variation would require training on very large labeled
datasets which are not presently available. Part-based models, such as poselets
and DPM have been shown to perform well for this problem but they are limited
by shallow low-level features. We propose a new method which combines
part-based models and deep learning by training pose-normalized CNNs. We show
substantial improvement vs. state-of-the-art methods on challenging attribute
classification tasks in unconstrained settings. Experiments confirm that our
method outperforms both the best part-based methods on this problem and
conventional CNNs trained on the full bounding box of the person.
Comment: 8 page
CIAGAN: Conditional Identity Anonymization Generative Adversarial Networks
The unprecedented increase in the usage of computer vision technology in
society goes hand in hand with an increased concern in data privacy. In many
real-world scenarios like people tracking or action recognition, it is
important to be able to process the data while taking careful consideration in
protecting people's identity. We propose and develop CIAGAN, a model for image
and video anonymization based on conditional generative adversarial networks.
Our model is able to remove the identifying characteristics of faces and bodies
while producing high-quality images and videos that can be used for any
computer vision task, such as detection or tracking. Unlike previous methods,
we have full control over the de-identification (anonymization) procedure,
ensuring both anonymization as well as diversity. We compare our method to
several baselines and achieve state-of-the-art results.
Comment: CVPR 202
Real-time Convolutional Neural Networks for Emotion and Gender Classification
In this paper we propose and implement a general convolutional neural network
(CNN) building framework for designing real-time CNNs. We validate our models
by creating a real-time vision system which accomplishes the tasks of face
detection, gender classification and emotion classification simultaneously in
one blended step using our proposed CNN architecture. After presenting the
details of the training procedure setup we proceed to evaluate on standard
benchmark sets. We report accuracies of 96% on the IMDB gender dataset and 66%
on the FER-2013 emotion dataset. Along with this, we also introduce the very
recent real-time enabled guided back-propagation visualization technique.
Guided back-propagation uncovers the dynamics of the weight changes and
evaluates the learned features. We argue that the careful implementation of
modern CNN architectures, the use of the current regularization methods and the
visualization of previously hidden features are necessary in order to reduce
the gap between slow performances and real-time architectures. Our system has
been validated by its deployment on a Care-O-bot 3 robot used during
RoboCup@Home competitions. All our code, demos and pre-trained architectures
have been released under an open-source license in our public repository.
Comment: Submitted to ICRA 201
Learning Residual Images for Face Attribute Manipulation
Face attributes are interesting due to their detailed description of human
faces. Unlike prior research on attribute prediction, we address an
inverse and more challenging problem called face attribute manipulation which
aims at modifying a face image according to a given attribute value. Instead of
manipulating the whole image, we propose to learn the corresponding residual
image defined as the difference between images before and after the
manipulation. In this way, the manipulation can be operated efficiently with
modest pixel modification. The framework of our approach is based on the
Generative Adversarial Network. It consists of two image transformation
networks and a discriminative network. The transformation networks are
responsible for the attribute manipulation and its dual operation and the
discriminative network is used to distinguish the generated images from real
images. We also apply dual learning to allow transformation networks to learn
from each other. Experiments show that residual images can be effectively
learned and used for attribute manipulations. The generated images retain most
of the details in attribute-irrelevant areas.
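The residual-image idea in the abstract above can be illustrated with a minimal NumPy sketch. It is not the authors' network: the residual here is computed directly from a toy before/after pair rather than predicted by a transformation network, and the helper names `residual_between` and `apply_residual` are hypothetical. Pixel values are assumed to lie in [0, 1].

```python
import numpy as np

def residual_between(before, after):
    """Residual image: the difference between the images after and before manipulation."""
    return after.astype(np.float32) - before.astype(np.float32)

def apply_residual(image, residual):
    """Add a residual to the input image and clip to the assumed [0, 1] pixel range."""
    out = image.astype(np.float32) + residual.astype(np.float32)
    return np.clip(out, 0.0, 1.0)

# Toy example: only a 2x2 patch of a 4x4 RGB image changes.
before = np.full((4, 4, 3), 0.5, dtype=np.float32)
after = before.copy()
after[1:3, 1:3, :] = 0.8

residual = residual_between(before, after)
manipulated = apply_residual(before, residual)

# Fraction of pixels the residual touches; the rest of the image is left untouched.
changed_fraction = float(np.mean(np.any(residual != 0, axis=-1)))
```

In this toy case the residual is zero everywhere outside the edited patch, which is the sense in which the manipulation needs only modest pixel modification.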
Persistent Evidence of Local Image Properties in Generic ConvNets
Supervised training of a convolutional network for object classification
should make explicit any information related to the class of objects and
disregard any auxiliary information associated with the capture of the image or
the variation within the object class. Does this happen in practice? Although
this seems to pertain to the very final layers in the network, if we look at
earlier layers we find that this is not the case. Surprisingly, strong spatial
information is implicit. This paper addresses this, in particular, exploiting
the image representation at the first fully connected layer, i.e. the global
image descriptor which has been recently shown to be most effective in a range
of visual recognition tasks. We present empirical evidence for the
finding in the contexts of four different tasks: 2d landmark detection, 2d
object keypoints prediction, estimation of the RGB values of input image, and
recovery of semantic label of each pixel. We base our investigation on a simple
framework with ridge regression shared across these tasks, and show results
which all support our insight. Such spatial information can be used for
computing correspondence of landmarks to a good accuracy, and may also
be useful for improving the training of the convolutional nets for
classification purposes.
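The ridge regression framework mentioned in the abstract above can be sketched as follows, assuming the closed-form solution W = (XᵀX + λI)⁻¹XᵀY. The abstract does not give the feature dimensionality, the regularization strength, or the target encoding, so the random features standing in for fully connected layer descriptors, the value of `lam`, the omission of a bias term, and the function names are all assumptions made for illustration.

```python
import numpy as np

def fit_ridge(features, targets, lam=1.0):
    """Closed-form ridge regression: W = (X^T X + lam * I)^-1 X^T Y."""
    dim = features.shape[1]
    gram = features.T @ features + lam * np.eye(dim)
    return np.linalg.solve(gram, features.T @ targets)

def predict(features, weights):
    """Linear prediction of spatial targets (e.g. 2D landmark coordinates)."""
    return features @ weights

# Synthetic stand-in data: 200 "image descriptors" of dimension 16,
# each paired with one 2D landmark position (hypothetical sizes).
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 16))
true_w = rng.normal(size=(16, 2))
targets = features @ true_w + 0.01 * rng.normal(size=(200, 2))

weights = fit_ridge(features, targets, lam=1e-3)
error = float(np.mean((predict(features, weights) - targets) ** 2))
```

The same `fit_ridge` call accepts any target matrix, so one regressor could be reused across tasks by changing only what the columns of `targets` encode (landmark coordinates, RGB values, or per-pixel label scores).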