Weakly-supervised Caricature Face Parsing through Domain Adaptation
A caricature is an artistic rendering of a person's portrait in which certain
striking characteristics are abstracted or exaggerated to create a humorous or
sarcastic effect. For numerous caricature-related applications, such as
attribute recognition and caricature editing, face parsing is an essential
pre-processing step that provides a complete understanding of the facial
structure. However, current state-of-the-art face parsing methods require
large amounts of pixel-level labeled data, and such annotation is tedious and
labor-intensive for caricatures. For real photos, by contrast, numerous
labeled face parsing datasets exist. We therefore formulate caricature face
parsing as a domain adaptation
problem, where real photos play the role of the source domain, adapting to the
target caricatures. Specifically, we first leverage a spatial-transformer-based
network to handle shape-level domain shifts. A feed-forward style transfer network is
then utilized to capture texture-level domain gaps. With these two steps, we
synthesize face caricatures from real photos, and thus we can use parsing
ground truths of the original photos to learn the parsing model. Experimental
results on synthetic and real caricatures demonstrate the effectiveness of
the proposed domain adaptation algorithm. Code is available at:
https://github.com/ZJULearning/CariFaceParsing
Comment: Accepted in ICIP 2019; code and model are available at
https://github.com/ZJULearning/CariFaceParsing
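
The two-step synthesis pipeline lends itself to a compact sketch. The
following is a minimal PyTorch illustration, not the authors' released code:
an affine spatial transformer stands in for the shape-exaggeration warp, the
pretrained style_net is an assumed placeholder for the feed-forward texture
network, and the key point is that the warp (but not the style transfer) is
also applied to the parsing labels, so photo ground truth carries over to the
synthetic caricature.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AffineSTN(nn.Module):
    """Predicts a warp from the input photo. An affine transform stands in
    for the exaggeration warp; the paper's parameterization may be richer."""
    def __init__(self):
        super().__init__()
        self.loc = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 6),
        )
        # Start at the identity transform so training begins from "no warp".
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(torch.tensor([1., 0., 0., 0., 1., 0.]))

    def forward(self, photo, labels):
        # photo: (N, 3, H, W); labels: (N, 1, H, W) float class-index map.
        theta = self.loc(photo).view(-1, 2, 3)
        grid = F.affine_grid(theta, photo.size(), align_corners=False)
        warped_photo = F.grid_sample(photo, grid, align_corners=False)
        # Nearest-neighbor sampling keeps label indices valid integers.
        warped_labels = F.grid_sample(labels, grid, mode='nearest',
                                      align_corners=False)
        return warped_photo, warped_labels

def synthesize(photo, labels, stn, style_net):
    """Shape shift first, then texture shift; only the warp touches labels."""
    warped_photo, warped_labels = stn(photo, labels)
    caricature = style_net(warped_photo)  # assumed feed-forward style network
    return caricature, warped_labels
```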
Generative Face Completion
In this paper, we propose an effective face completion algorithm using a deep
generative model. Different from well-studied background completion, the face
completion task is more challenging, as it often requires generating
semantically new pixels for the missing key components (e.g., eyes and mouths),
which exhibit large appearance variations. Unlike existing nonparametric
algorithms that search for patches to synthesize, our algorithm directly
generates contents for missing regions based on a neural network. The model is
trained with a combination of a reconstruction loss, two adversarial losses,
and a semantic parsing loss, which together ensure pixel faithfulness and
local-global content consistency. Through extensive experiments, we demonstrate
qualitatively and quantitatively that our model can handle large areas of
missing pixels in arbitrary shapes and generate realistic face
completion results.
Comment: Accepted by CVPR 2017
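
The four-term training objective described above can be made concrete with a
short sketch. The loss weights, the mask_bbox interface, and the use of L1 for
reconstruction are illustrative assumptions rather than the paper's exact
choices:

```python
import torch
import torch.nn.functional as F

def completion_loss(completed, target, mask_bbox,
                    local_d, global_d, parser, target_parsing,
                    w_adv=0.3, w_sem=0.05):
    """completed/target: (N, 3, H, W); mask_bbox: hole as (y0, y1, x0, x1);
    local_d/global_d: discriminators; parser: fixed pretrained parsing net;
    target_parsing: (N, H, W) ground-truth facial part labels."""
    # Pixel reconstruction over the whole image.
    l_rec = F.l1_loss(completed, target)

    # Two adversarial terms: one discriminator judges the hole crop, the
    # other the full face (non-saturating generator loss shown here).
    y0, y1, x0, x1 = mask_bbox
    local_logits = local_d(completed[:, :, y0:y1, x0:x1])
    global_logits = global_d(completed)
    l_adv = (F.binary_cross_entropy_with_logits(
                 local_logits, torch.ones_like(local_logits))
             + F.binary_cross_entropy_with_logits(
                 global_logits, torch.ones_like(global_logits)))

    # Semantic parsing loss: the completed face should parse into the same
    # facial parts as the original, enforcing plausible component layout.
    l_sem = F.cross_entropy(parser(completed), target_parsing)

    return l_rec + w_adv * l_adv + w_sem * l_sem
```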
Error Correction for Dense Semantic Image Labeling
Pixelwise semantic image labeling is an important yet challenging task with
many applications. Typical approaches either train deep networks on vast
amounts of images to directly infer the labels or use probabilistic graphical
models to jointly model the dependencies between the input (i.e., images) and
output (i.e., labels). Yet the
former approaches do not capture the structure of the output labels, which is
crucial for the performance of dense labeling, and the latter rely on carefully
hand-designed priors that require costly parameter tuning via optimization
techniques, which in turn leads to long inference times. To alleviate these
restrictions, we explore how to arrive at dense semantic pixel labels given
both the input image and an initial estimate of the output labels. We propose a
parallel architecture that: 1) exploits the context information through a
LabelPropagation network to propagate correct labels from nearby pixels to
improve the object boundaries, 2) uses a LabelReplacement network to directly
replace possibly erroneous, initial labels with new ones, and 3) combines the
different intermediate results via a Fusion network to obtain the final
per-pixel label. We experimentally validate our approach on two different
datasets, for the semantic segmentation and face parsing tasks respectively,
and show improvements over the state of the art. We also provide a
quantitative and qualitative analysis of the generated results.
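
The data flow of the parallel architecture is easy to sketch. The stand-in
sub-networks below are far shallower than the real ones, but the wiring
follows the text: the LabelPropagation and LabelReplacement branches each see
the image together with the initial label estimate, and a Fusion network
merges their outputs into the final per-pixel prediction:

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU())

class ErrorCorrection(nn.Module):
    """Toy version of the parallel error-correction architecture."""
    def __init__(self, num_classes, width=64):
        super().__init__()
        c_in = 3 + num_classes  # RGB image + initial label probabilities
        self.propagate = nn.Sequential(conv_block(c_in, width),
                                       nn.Conv2d(width, num_classes, 1))
        self.replace = nn.Sequential(conv_block(c_in, width),
                                     nn.Conv2d(width, num_classes, 1))
        self.fuse = nn.Sequential(conv_block(2 * num_classes, width),
                                  nn.Conv2d(width, num_classes, 1))

    def forward(self, image, init_probs):
        x = torch.cat([image, init_probs], dim=1)
        prop = self.propagate(x)   # refines labels near object boundaries
        repl = self.replace(x)     # rewrites labels that look erroneous
        return self.fuse(torch.cat([prop, repl], dim=1))
```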
On Face Segmentation, Face Swapping, and Face Perception
We show that even when face images are unconstrained and arbitrarily paired,
face swapping between them is actually quite simple. To this end, we make the
following contributions. (a) Instead of tailoring systems for face
segmentation, as others previously proposed, we show that a standard fully
convolutional network (FCN) can achieve remarkably fast and accurate
segmentations, provided that it is trained on a rich enough example set. For
this purpose, we describe novel data collection and generation routines which
provide challenging segmented face examples. (b) We use our segmentations to
enable robust face swapping under unprecedented conditions. (c) Unlike previous
work, our swapping is robust enough to allow for extensive quantitative tests.
To this end, we use the Labeled Faces in the Wild (LFW) benchmark and measure
the effect of intra- and inter-subject face swapping on recognition. We show
that our intra-subject swapped faces remain as recognizable as their sources,
testifying to the effectiveness of our method. In line with well-known
perceptual studies, we show that better face swapping produces less
recognizable inter-subject results. This is the first time this effect has
been quantitatively demonstrated for machine vision systems.
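
To make the role of segmentation in the swap concrete, here is a minimal
OpenCV-based sketch; the helper interface and the use of Poisson blending are
assumptions for illustration, not the authors' pipeline. The FCN masks
restrict blending to pixels both images label as face, which is what preserves
occluders such as hair or glasses in the target:

```python
import cv2
import numpy as np

def swap_faces(src_img, dst_img, src_mask, dst_mask, warp_matrix):
    """src_mask/dst_mask: uint8 {0,255} face masks from the segmentation FCN.
    warp_matrix: 3x3 homography aligning the source face onto the target
    (e.g., estimated from facial landmarks)."""
    h, w = dst_img.shape[:2]
    # Warp the source face and its mask into the target frame.
    warped_src = cv2.warpPerspective(src_img, warp_matrix, (w, h))
    warped_mask = cv2.warpPerspective(src_mask, warp_matrix, (w, h))
    # Blend only where both segmentations agree there is visible face skin,
    # so occluders (hair, glasses, hands) in the target are preserved.
    blend_mask = cv2.bitwise_and(warped_mask, dst_mask)
    ys, xs = np.nonzero(blend_mask)
    if len(xs) == 0:          # no overlapping face region found
        return dst_img.copy()
    center = (int(xs.mean()), int(ys.mean()))
    # Poisson blending hides seams along the mask boundary.
    return cv2.seamlessClone(warped_src, dst_img, blend_mask, center,
                             cv2.NORMAL_CLONE)
```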