Leveraging Mid-Level Deep Representations For Predicting Face Attributes in the Wild
Predicting facial attributes from faces in the wild is very challenging due
to pose and lighting variations in the real world. The key to this problem is
to build proper feature representations to cope with these unfavourable
conditions. Given the success of Convolutional Neural Networks (CNNs) in image
classification, high-level CNN features, as an intuitive and reasonable
choice, have been widely used for this problem. In this paper, however, we
consider the mid-level CNN features as an alternative to the high-level ones
for attribute prediction. This is based on the observation that face attributes
are different: some of them are locally oriented while others are globally
defined. Our investigations reveal that the mid-level deep representations
achieve higher prediction accuracy than the (fine-tuned) high-level
abstractions. We empirically demonstrate that the mid-level representations
achieve state-of-the-art prediction performance on CelebA and LFWA datasets.
Our investigations also show that by utilizing the mid-level representations
one can employ a single deep network to achieve both face recognition and
attribute prediction.
Comment: In Proceedings of the 2016 International Conference on Image Processing (ICIP)
Face Attribute Prediction Using Off-the-Shelf CNN Features
Predicting attributes from face images in the wild is a challenging computer
vision problem. To automatically describe face attributes from images
containing faces, one traditionally needs to cascade three technical blocks --- face
localization, facial descriptor construction, and attribute classification ---
in a pipeline. As a typical classification problem, face attribute prediction
has been addressed using deep learning. Current state-of-the-art performance
was achieved by using two cascaded Convolutional Neural Networks (CNNs), which
were specifically trained to learn face localization and attribute description.
In this paper, we experiment with an alternative way of employing the power of
deep representations from CNNs. In combination with conventional face localization
techniques, we use off-the-shelf architectures trained for face recognition to
build facial descriptors. Recognizing that the describable face attributes are
diverse, our face descriptors are constructed from different levels of the CNNs
for different attributes to best facilitate face attribute prediction.
Experiments on two large datasets, LFWA and CelebA, show that our approach is
entirely comparable to the state-of-the-art. Our findings not only demonstrate
an efficient face attribute prediction approach, but also raise an important
question: how to leverage the power of off-the-shelf CNN representations for
novel tasks.
Comment: In Proceedings of the 2016 International Conference on Biometrics (ICB)
Pose Induction for Novel Object Categories
We address the task of predicting pose for objects of unannotated object
categories from a small seed set of annotated object classes. We present a
generalized classifier that can reliably induce pose given a single instance of
a novel category. When a large collection of novel instances is
available, our approach jointly reasons over all instances to improve the
initial estimates. We empirically validate the various components of our
algorithm and quantitatively show that our method produces reliable pose
estimates. We also show qualitative results on a diverse set of classes and
further demonstrate the applicability of our system for learning shape models
of novel object classes.
PANDA: Pose Aligned Networks for Deep Attribute Modeling
We propose a method for inferring human attributes (such as gender, hair
style, clothes style, expression, action) from images of people under large
variation of viewpoint, pose, appearance, articulation and occlusion.
Convolutional Neural Nets (CNNs) have been shown to perform very well on
large-scale object recognition problems. In the context of attribute classification,
however, the signal is often subtle and it may cover only a small part of the
image, while the image is dominated by the effects of pose and viewpoint.
Discounting for pose variation would require training on very large labeled
datasets, which are not presently available. Part-based models, such as poselets
and DPM, have been shown to perform well for this problem, but they are limited
by shallow low-level features. We propose a new method which combines
part-based models and deep learning by training pose-normalized CNNs. We show
substantial improvement over state-of-the-art methods on challenging attribute
classification tasks in unconstrained settings. Experiments confirm that our
method outperforms both the best part-based methods on this problem and
conventional CNNs trained on the full bounding box of the person.
Comment: 8 pages