Interspecies Knowledge Transfer for Facial Keypoint Detection
We present a method for localizing facial keypoints on animals by
transferring knowledge gained from human faces. Instead of directly finetuning
a network trained to detect keypoints on human faces to animal faces (which is
sub-optimal since human and animal faces can look quite different), we propose
to first adapt the animal images to the pre-trained human detection network by
correcting for the differences in animal and human face shape. We first find
the nearest human neighbors for each animal image using an unsupervised shape
matching method. We use these matches to train a thin plate spline warping
network to warp each animal face to look more human-like. The warping network
is then jointly finetuned with a pre-trained human facial keypoint detection
network using an animal dataset. We demonstrate state-of-the-art results on
both horse and sheep facial keypoint detection, and significant improvement
over simple finetuning, especially when training data is scarce. Additionally,
we present a new dataset of 3717 images with horse face and facial keypoint
annotations.
Comment: CVPR 2017 camera ready
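The thin plate spline warp at the heart of the method above can be sketched as follows. This is a generic TPS fit between matched control points in NumPy, not the paper's trained warping network; the function names and the two-step fit/apply interface are my own illustration:

```python
import numpy as np

def tps_kernel(r2):
    # TPS radial basis U(r) = r^2 log(r^2), with U(0) defined as 0.
    return np.where(r2 == 0, 0.0, r2 * np.log(np.maximum(r2, 1e-12)))

def fit_tps(src, dst):
    # src, dst: (n, 2) matched control points (e.g. animal keypoints and
    # their nearest-human-neighbor keypoints). Solves the standard bordered
    # linear system for the TPS weights plus an affine part.
    n = src.shape[0]
    d2 = ((src[:, None, :] - src[None, :, :]) ** 2).sum(-1)
    K = tps_kernel(d2)
    P = np.hstack([np.ones((n, 1)), src])
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = K
    A[:n, n:] = P
    A[n:, :n] = P.T
    b = np.zeros((n + 3, 2))
    b[:n] = dst
    return np.linalg.solve(A, b)   # (n + 3, 2) parameters

def apply_tps(params, src, pts):
    # Warp arbitrary points (m, 2) with a fitted spline.
    d2 = ((pts[:, None, :] - src[None, :, :]) ** 2).sum(-1)
    K = tps_kernel(d2)
    P = np.hstack([np.ones((len(pts), 1)), pts])
    return np.hstack([K, P]) @ params
```

By construction the spline interpolates the control points exactly and extends smoothly in between, which is why TPS is a natural parameterization for a face-shape warping module.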
Content-Adaptive Sketch Portrait Generation by Decompositional Representation Learning
Sketch portrait generation benefits a wide range of applications such as
digital entertainment and law enforcement. Although plenty of effort has been
dedicated to this task, several issues remain unsolved in generating vivid,
detail-preserving personal sketch portraits. For example, artifacts often
appear when synthesizing hairpins and glasses, and textural details may be
lost in hair or mustache regions. Moreover, the generalization ability of
current systems is limited, since they usually require elaborately collecting
a dictionary of examples or carefully tuning features/components. In this
paper, we present a novel representation learning
framework that generates an end-to-end photo-sketch mapping through structure
and texture decomposition. In the training stage, we first decompose the input
face photo into different components according to their representational
contents (i.e., structural and textural parts) by using a pre-trained
Convolutional Neural Network (CNN). Then, we utilize a Branched Fully
Convolutional Neural Network (BFCN) for learning structural and textural
representations, respectively. In addition, we design a Sorted Matching Mean
Square Error (SM-MSE) metric to measure texture patterns in the loss function.
In the stage of sketch rendering, our approach automatically generates
structural and textural representations for the input photo and produces the
final result via a probabilistic fusion scheme. Extensive experiments on
several challenging benchmarks suggest that our approach outperforms
example-based synthesis algorithms in terms of both perceptual and objective
metrics. In addition, the proposed method also generalizes better across
datasets without additional training.
Comment: Published in TIP 201
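The Sorted Matching MSE idea (comparing sorted values within local patches, so the loss scores texture statistics rather than exact pixel alignment) might be sketched as follows. This is one plausible reading of the metric in plain NumPy, not the paper's exact formulation; the `patch` size and function name are assumptions:

```python
import numpy as np

def sorted_matching_mse(pred, target, patch=4):
    # Split both grayscale images into non-overlapping patch x patch blocks,
    # sort the pixel values inside each block, then take the MSE between the
    # sorted vectors. Permuting pixels within a block leaves the loss
    # unchanged, so it compares texture distributions, not exact layout.
    h, w = pred.shape
    ph, pw = h // patch, w // patch
    p = pred[:ph * patch, :pw * patch].reshape(ph, patch, pw, patch)
    t = target[:ph * patch, :pw * patch].reshape(ph, patch, pw, patch)
    p = np.sort(p.transpose(0, 2, 1, 3).reshape(ph, pw, -1), axis=-1)
    t = np.sort(t.transpose(0, 2, 1, 3).reshape(ph, pw, -1), axis=-1)
    return float(((p - t) ** 2).mean())
```

For example, a target that merely shuffles pixels inside a patch scores zero, while plain MSE would penalize it heavily; this matches the stated goal of measuring texture patterns such as hair strokes.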
Facial landmark detection via attention-adaptive deep network
Facial landmark detection is a key component of the face recognition pipeline, as well as of facial attribute analysis and face verification. Convolutional neural network-based face alignment methods have recently achieved significant improvements, but occlusion remains a major obstacle to good accuracy. In this paper, we introduce an attentioned distillation module into the Occlusion-adaptive Deep Network (ODN) model from our previous work to improve performance. In this model, the occlusion probability of each position in the high-level features is inferred by a distillation module, which is learned automatically while estimating the relationship between facial appearance and facial shape. The occlusion probability serves as an adaptive weight on the high-level features to reduce the impact of occlusion and obtain a clean feature representation. However, the clean feature representation cannot represent the holistic face, because semantic features at occluded positions are missing. To obtain an exhaustive and complete feature representation, we leverage a low-rank learning module to recover the lost features. Since facial geometric characteristics aid this recovery, a geometry-aware module is used to excavate the geometric relationships between different facial components, while the attentioned distillation module enriches the feature representation and models occlusion. To further improve the feature representation, we use channel-wise and spatial attention. Experimental results show that our method outperforms existing methods.
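The occlusion-adaptive weighting step described above (a per-position occlusion probability used to down-weight high-level features) can be illustrated with a minimal NumPy sketch. Here the occlusion map is simply an input, whereas in the paper it is inferred by the distillation module; the function name is mine:

```python
import numpy as np

def occlusion_adaptive_features(feat, occ_prob):
    # feat: (C, H, W) high-level feature maps.
    # occ_prob: (H, W) estimated probability that each spatial position is
    # occluded. Positions likely to be occluded are down-weighted, yielding
    # a "clean" feature representation; the zeroed-out semantics at those
    # positions are what the low-rank module must then recover.
    clean_weight = 1.0 - occ_prob             # high weight where unoccluded
    return feat * clean_weight[None, :, :]    # broadcast over channels
```

The broadcast multiply applies the same spatial weight to every channel, which is the simplest form of the adaptive-weight scheme; channel-wise attention would additionally rescale each channel independently.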